As software writes itself, Prof. Abhishek Kr Singh outlines a solution that blends deep theory with practical tools to detect bugs even before they can cause failures.
With AI taking on an increasing role in writing code, a new and pressing question has emerged: how can we be sure that automatically generated code is safe, reliable, and correct? “The process of developing software itself is now automated,” explains Prof. Abhishek Singh of the Software Engineering Research Center. “But in the process, there are a lot of bugs that get generated as well.” His research focuses on building automated systems that can catch these bugs early – especially in modern software that runs multiple tasks at the same time, a category known as parallel or concurrent programs.
Root Of The Problem
According to Prof. Singh, many software bugs find their roots in the transition from informal intent to formal implementation – they begin with how humans describe what they want software to do. “The problem arises because you never describe your intent clearly,” he says. “You provide inputs in natural language and then these AI agents produce code for you.” But, he explains, natural language is ambiguous. “English sentences may have multiple meanings,” Prof. Singh notes. Code, on the other hand, has no room for ambiguity. Even a small mismatch between intent and implementation can lead to errors that are hard to detect.
Correctness By Construction
Rather than fixing bugs after software is written, Prof. Singh advocates for an approach known as “correctness by construction.” “We prove safety and reliability by making the code correct during construction itself,” he explains. This involves expressing a programmer’s intent in a formal, precise way, often directly within the code itself, using specifications and assertions that computers can check automatically. “If you can specify your intention in a more formal language,” Prof. Singh says, “there is a possibility of checking whether those intentions are met or not.”
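To make the idea concrete, here is a minimal sketch – a generic illustration, not code from Prof. Singh’s project – of how intent can be written down as machine-checkable assertions in C++. The preconditions and postcondition state the programmer’s intent precisely enough for an automated checker, or even a plain runtime check, to catch violations:

    #include <cassert>

    // Intent, stated formally: a withdrawal must never overdraw the account.
    int withdraw(int balance, int amount) {
        assert(amount >= 0);        // precondition: withdrawals are non-negative
        assert(amount <= balance);  // precondition: cannot take more than is there
        int result = balance - amount;
        assert(result >= 0);        // postcondition: the intent, checkable by machine
        return result;
    }

A formal verifier can then attempt to prove that such assertions hold for every possible input, rather than merely checking them on the inputs that happen to occur at run time.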
Why Parallel Programs Are Especially Dangerous
The challenge becomes far more complex when software runs tasks simultaneously, as most modern systems do. “If automated bug detection is hard in sequential programs, it becomes even harder and non-intuitive in parallel programs,” says Prof. Singh. In such systems, tiny timing differences in how tasks interleave – often called race conditions – can lead to bugs that appear only under very specific conditions – sometimes years after the software is deployed.
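A textbook illustration – again, not an example drawn from Prof. Singh’s work – shows how such interleavings bite. Two C++ threads increment a shared counter; each ++counter is a read-modify-write, and when the two threads’ reads and writes interleave, updates are silently lost:

    #include <iostream>
    #include <thread>

    int counter = 0;  // shared state with no synchronization: a data race

    void bump() {
        for (int i = 0; i < 100000; ++i)
            ++counter;  // read, add, write: the other thread can slip in between
    }

    int main() {
        std::thread a(bump), b(bump);
        a.join();
        b.join();
        // Expected 200000; a racy run can print less, and rarely the same value twice.
        std::cout << counter << '\n';
    }

The program usually looks fine under casual testing; whether it misbehaves depends entirely on scheduling, which is exactly why such bugs can stay hidden for years.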
To address this, Prof. Singh’s team uses a technique called fuzzing, which automatically generates large numbers of inputs to test how software behaves. “Constructing test cases by hand is not that easy,” he reasons. “Industry spends a lot of time doing testing using input-output pairs, but this is not a systematic way.” Fuzzing takes a different approach. Instead of checking whether the output is exactly right, it checks whether any important rule – a property the software must always obey – is violated by any input. “If even one property is broken,” he says, “you know something went wrong.”
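As a rough sketch of the idea – simplified, and not Prof. Singh’s actual fuzzer – a property-checking loop in C++ might feed thousands of random inputs to the code under test and assert a rule that must always hold, here that a sorting routine’s output is ordered, instead of comparing against hand-written expected outputs:

    #include <algorithm>
    #include <cassert>
    #include <random>
    #include <vector>

    // Property: every element is <= its successor.
    bool ordered(const std::vector<int>& v) {
        return std::is_sorted(v.begin(), v.end());
    }

    int main() {
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> dist(-1000, 1000);
        for (int trial = 0; trial < 10000; ++trial) {
            std::vector<int> v(100);
            for (auto& x : v) x = dist(rng);    // randomly generated input
            std::sort(v.begin(), v.end());      // code under test
            assert(ordered(v));                 // check the rule, not an exact output
        }
    }

Any failing input is then a concrete witness that something went wrong – exactly the signal fuzzing looks for.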

Smarter Fuzzing
What sets Prof. Singh’s work apart is the way fuzzing is combined with a deep understanding of how parallel programs behave on real hardware. “Random fuzzing may not trigger the exact bug that is there inside your program,” he cautions. Instead, his team uses what he calls semantic-guided fuzzing – using formal knowledge of program behavior to guide which inputs are generated. This matters because existing tools often rely on oversimplified assumptions. “There is no tool right now that actually deals with fuzzing of weak-memory programs which run on modern architecture,” he says. Standard fuzzers such as AFL (American Fuzzy Lop) exist, he admits, but very few research groups in the world understand the deep semantics of parallel programs.
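The classic “store buffering” litmus test hints at why weak memory makes this hard. In the illustrative C++ sketch below (a standard textbook example, not his team’s tool), both threads use relaxed atomic operations; on x86 or ARM the outcome r1 == 0 and r2 == 0 – impossible under naive interleaving reasoning – is permitted and occasionally observed, so a fuzzer that assumes simple interleavings would never go looking for it:

    #include <atomic>
    #include <cstdio>
    #include <thread>

    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;

    void t1() { x.store(1, std::memory_order_relaxed); r1 = y.load(std::memory_order_relaxed); }
    void t2() { y.store(1, std::memory_order_relaxed); r2 = x.load(std::memory_order_relaxed); }

    int main() {
        std::thread a(t1), b(t2);
        a.join();
        b.join();
        // Under sequential consistency, at least one store happens before the
        // other thread's load, so r1 == 0 && r2 == 0 cannot occur. With relaxed
        // ordering on weak-memory hardware, store buffering makes it legal.
        std::printf("r1=%d r2=%d\n", r1, r2);
    }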
From Theory to Tools
The automated bug detection project is being carried out in collaboration with Prof. Ashish Mishra of IIT Hyderabad, with students from both institutes working on it full-time. While still in its early stages, the team is building toward a practical tool that works on real software. Rather than generating random inputs blindly, the tool uses formal knowledge of how concurrent programs behave on real hardware to explore the most error-prone scenarios. “We are targeting commonly used programming languages like C++ and modern architectures like x86 and ARM,” Prof. Singh explains.
Towards Real-World Impact
Prof. Singh describes the work as a natural extension of his earlier research, which focused more on theoretical proofs of program correctness. “My PhD work was mostly about formal verification and mathematical proofs,” he says. “Now we want to translate those theoretical results into actual tool building.” The potential industry impact is significant. Major technology and chip companies already rely heavily on fuzzing to test their systems. “This is not something purely theoretical,” Singh emphasizes. “It has a lot of impact in industry.” Future plans include extending the approach to GPUs and hardware accelerators, where correctness challenges are even more severe.
For now, the team is focused on getting the foundations right. But the goal is ambitious: to change how software is tested – and trusted – in an age where humans increasingly rely on machines to write the code that runs the world.

Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.

