In an era where large language models dazzle us with fluency, confident reasoning, and near-human responses, Prof. Manish Shrivastava urges caution by pulling back the curtain on AI’s “illusion of reasoning,” and makes a compelling case for smarter data, smaller models, and a more thoughtful future for AI, especially in the Indian context.
Prof. Manish Shrivastava’s research philosophy can best be described with two ‘Rs’: “R for research and R for rabbit holes.” He explains that there are three types of research: the goal-oriented kind, which is focused and socially impactful; the opportunistic kind, which jumps into emerging gaps in a field; and the exploratory type, driven by intellectual curiosity. Most of his own work, he says, falls into the third category. It’s these rabbit holes that have led him deep into one of today’s most urgent questions: Are large language models (LLMs) actually doing what we think they are?
What Do Language Models Really Know?
Anybody using an LLM treats it as an intelligent entity. But for Prof. Shrivastava, it is “facts plus language”. Deconstructing the LLM as essentially a memory bank of facts extracted from massive data and a language engine that understands syntax and semantics, he cautions that it is “a broken memory bank, a lossy compression at best.” According to him, LLMs learn from frequency. The more often a fact appears in data, the stronger it becomes embedded. “But frequency isn’t truth. From social media, we know that’s not true,” he says, adding that it raises uncomfortable questions: are the models really reasoning, or are they just predicting what sounds statistically likely? And are they using evidence or ignoring it?
The Illusion of Reasoning
To test whether the models were indeed reasoning, Prof. Shrivastava’s team designed systematic experiments in evidence-based reasoning tasks. In one setup, models were given a table (evidence), a statement (hypothesis) and a task: to decide whether the evidence supports the statement. “Logically, if you remove relevant evidence, the model should no longer confidently decide,” he declares. But what happened? When relevant rows were deleted, in 72% of cases the model still stuck to its original answer, failing to change its decision as logic would demand. “It is relying on something from the past while not performing the task it was supposed to perform.” In other words, the model wasn’t really reading the evidence; it was leaning on its prior internal knowledge.
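The experimental logic described above can be sketched in a few lines. This is a minimal illustration, not the team’s actual code: `model_verdict` is a hypothetical stand-in for any LLM call that returns a verdict for a (table, hypothesis) pair, and the toy model below is a simple rule-based reader used only to show what a consistent model would do.

```python
# Sketch of the evidence-ablation check: if we delete the relevant rows,
# a model that truly reads the evidence should change its verdict.

def changes_verdict(model_verdict, table, hypothesis, relevant_rows):
    """Return True if removing the relevant evidence rows changes the
    model's verdict -- the behaviour logic demands."""
    full_verdict = model_verdict(table, hypothesis)
    ablated = [row for i, row in enumerate(table) if i not in relevant_rows]
    return model_verdict(ablated, hypothesis) != full_verdict

def toy_model(table, hypothesis):
    """A rule-based 'model' that actually consults the table."""
    if any(hypothesis in row for row in table):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"

table = ["Delhi is the capital of India", "Hindi is an official language"]
print(changes_verdict(toy_model, table, "capital of India", {0}))  # True
```

In the experiments the article describes, a real LLM returned the *same* verdict after ablation in 72% of cases, i.e. this check would come back `False`.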
This is where hallucinations come from. “LLMs’ performance is high. Great. It helps us. But are they getting the high performance for the right reason? Not really.” Even more troubling was that models often performed almost as well when given only the hypothesis, without any evidence at all. “That suggests something fundamental: They may be predicting plausibility, not reasoning,” says Prof. Shrivastava.
Spray and Pray: The Inefficiency Problem
Describing today’s AI training ecosystem with a blunt, “Throw data at it, throw compute at it, and pray that it works,” Prof. Shrivastava is of the view that while this approach works, it is inefficient and expensive. “It also doesn’t serve low-resource languages or communities well.” This then led to his second major research thread: Can we train better models with less data?
Smarter, Not Bigger: Data Efficiency in Machine Translation
Working on low-resource machine translation (like English–Hindi, English–Telugu, English–Odia), Prof. Shrivastava’s team explored a bold idea: What if not all data is equally useful? Instead of feeding millions of sentences blindly into a model, they removed noisy data, filtered out low-quality translations, and selected linguistically rich, structurally complex sentences. From 1.8 million English–Hindi pairs, they reduced the dataset to 800,000 high-quality samples. The result was that translation quality either remained the same or improved. In some language pairs, they achieved a 50–60% reduction in data, significant computational cost savings and equal or better performance.
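The spirit of that pipeline can be sketched as a two-stage pass: first drop noisy pairs, then rank what remains by a richness score. The thresholds and scoring below are illustrative assumptions for the sketch, not the team’s actual criteria.

```python
# Hedged sketch of data-efficient corpus filtering: noise filter,
# then prefer linguistically richer source sentences.

def keep_pair(src, tgt, min_len=4, max_ratio=3.0):
    """Basic noise filter: drop too-short or wildly length-mismatched pairs."""
    s, t = src.split(), tgt.split()
    if len(s) < min_len or len(t) < min_len:
        return False
    return max(len(s), len(t)) / min(len(s), len(t)) <= max_ratio

def richness(src):
    """Crude proxy for linguistic richness: type/token ratio times length,
    i.e. the number of distinct tokens."""
    tokens = src.lower().split()
    return len(set(tokens)) / len(tokens) * len(tokens)

pairs = [
    ("ok", "theek"),                                              # too short
    ("the cat sat on the mat", "billi chatai par baithi"),        # kept
    ("one two three four five six seven eight nine ten eleven twelve thirteen",
     "ek do teen char"),                                          # mismatched lengths
]
filtered = [p for p in pairs if keep_pair(*p)]
ranked = sorted(filtered, key=lambda p: richness(p[0]), reverse=True)
print(len(filtered))  # 1
```

A real pipeline would replace these heuristics with proper quality estimation and syntactic-complexity measures, but the shape is the same: filter aggressively, then keep the sentences that teach the model the most.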
Their key finding was that complex, linguistically rich sentences teach models more efficiently than repetitive, simple ones. Smarter data beats more data.
Building High-Quality Indian Language Resources
Beyond model analysis, the team also invested heavily in data creation for Indian languages. Between 2016 and 2020, over 500 students collaborated to build datasets for question-answering, summarization, and language generation. One of the major releases was a high-quality Telugu summarization dataset. They began with 90,000 samples but after aggressive quality filtering, only 20,000 made the cut. “We are looking at a 20% yield… but that’s what I want to leave with,” states Prof. Shrivastava. High standards matter more than volume. And today, that dataset is one of the most reliable Telugu summarization resources available.
Are LLMs Threatening Academia?
Responding to a question on whether LLMs threaten research, the professor compared them to the early stages of motor vehicles. “They are wonderful tools, but very unsafe. It’s only over time that cars became safer, more fuel efficient and more reliable.” He believes AI will follow a similar trajectory. Right now, LLMs are wasteful, overgeneralized and prone to hallucinations. Observing that the generalist outlook is a liability because generalists are more prone to hallucinations, Prof. Shrivastava argues for task-specific models, interpretable systems, and pluggable knowledge models, along with a clear separation of language and facts.
“We don’t need models that know everything. We need models that know what’s relevant.”
Bigger Isn’t Always Better
Continuing with the theme of why we ought to eschew generalism, the professor draws a parallel with humans. “Human experts aren’t universal generalists. We specialize. So why build models that try to learn the entire world?” Training an LLM, in effect, is like making a person learn everything in the world because someday that information might be useful. Instead, Prof. Shrivastava proposes a different future: one with specialized expert models, modular architectures, and systems that communicate like think tanks. “Not one giant brain. But many coordinated ones.”
Trust, But Verify
Clarifying that his talk wasn’t an attack on AI but instead a call for intellectual honesty, Prof. Shrivastava concluded by reiterating that LLMs are powerful, transformative and definitely here to stay. He, however, cautioned that high accuracy doesn’t equal reasoning, fluency doesn’t equal understanding, nor does size equal intelligence. According to him, the first question we ought to ask is whether they are doing what they should.
He signed off by reminding everyone that in chasing rabbit holes – into model behavior, linguistic structure, and data efficiency – AI research isn’t about building bigger systems. It’s about asking better questions. And sometimes, the most important question is the uncomfortable one: Are we trusting these models for the right reasons?
Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.