IIIT-H Students Take Home Top Prize In ISEC Data Science Challenge For Bug Prediction Model

At the recently concluded Student Data Science Challenge (SDC), held on the sidelines of the Innovations in Software Engineering Conference (ISEC) 2026 and organized under the ACM India umbrella, IIIT-H team Megalodon walked away with the first prize for developing a novel machine learning system to detect faulty code.

ISEC is a premier event dedicated to exploring the latest advancements, emerging trends, and key challenges in software engineering. In keeping with that theme, the Student Data Science Challenge focused on a problem familiar to practitioners: software quality assurance teams spend enormous effort detecting and fixing bugs in large codebases. The challenge asked participants for predictive algorithms that identify patterns in quantitative features, such as static code metrics, to determine whether a code file is faulty.

Fully Automated
“Can we detect whether a snippet of code has bugs without reading a single line of code?” asks Team Megalodon enthusiastically. For teammates Vijay Aravynthan, Itikela Bhaskar and Arihant Tripathy, the answer is not just a resounding ‘Yes’ but also comes with an impressive 85% accuracy. Traditionally, identifying errors in software requires either running the code or manually reviewing it, both of which are time-consuming. The students’ approach sidesteps that entirely. Instead, their model analyzes structural metrics extracted from the codebase. “These include properties such as Cyclomatic Complexity, Lines of Code, and the number of string literals,” explains Vijay. Using around 16 such metrics, the team trained an algorithm that can predict whether code is likely to be faulty. While such metrics are commonly used in software engineering to assess code quality, the students’ work applies them differently: to predict whether a code file is likely to be faulty before it is run. “There can be code that is written very sloppily but still runs,” Vijay adds. “By combining multiple metrics and training data, we can identify patterns that indicate when the code is likely to fail.”
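To make the idea of structural metrics concrete, here is a minimal sketch of how three of the named metrics could be computed for a Python file using only the standard library. It is not the team's implementation: the function name `static_metrics`, the choice of branch node types, and the restriction to Python source are all assumptions made for illustration. In a setup like the one described, numeric rows of this kind would be the inputs to a trained classifier.

```python
import ast

# Node types counted as decision points. This is a common approximation of
# cyclomatic complexity, assumed here for illustration only.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def static_metrics(source: str) -> dict:
    """Compute three illustrative static metrics for a Python source string."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))

    # Lines of code: count non-blank lines (comments are included).
    lines_of_code = sum(1 for line in source.splitlines() if line.strip())

    # Approximate cyclomatic complexity: one path plus one per decision point.
    branches = sum(isinstance(node, BRANCH_NODES) for node in nodes)

    # String literals: constant nodes whose value is a str.
    string_literals = sum(
        isinstance(node, ast.Constant) and isinstance(node.value, str)
        for node in nodes
    )

    return {
        "lines_of_code": lines_of_code,
        "cyclomatic_complexity": branches + 1,
        "string_literals": string_literals,
    }


# Hypothetical sample input, used only to show the shape of the output.
SAMPLE = 'def sign(x):\n    if x > 0:\n        return "pos"\n    return "neg"\n'
print(static_metrics(SAMPLE))
```

The code is parsed but never executed, which matches the article's point that the prediction is made before the code is run.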

Why This Matters
The project comes at a time when the volume of code being written worldwide is rapidly increasing, driven in part by AI-assisted programming tools. According to the students, this makes manual code review increasingly impractical. “With AI-generated code, anyone can produce large amounts of code quickly,” says Bhaskar. “You can’t have someone manually checking everything. Tools like this can act as a filter before human review.” Arihant points out another practical advantage: the system works even when running the code itself may be difficult. “For very large codebases, running the code can be hard,” he explains. “But calculating these metrics and predicting whether it will run is much easier.”

A Team Forged in Hackathons
The collaboration between Vijay and Bhaskar didn’t begin with this competition. The pair has been participating in hackathons together regularly. “This is probably our fifth or sixth hackathon together,” Bhaskar says. “Hackathons really test you, especially your ability to not give up.” The fast-paced nature of hackathons pushes teams to develop ideas quickly and prototype solutions under intense time constraints. “It’s basically a race about how much you can do and how well you can do it in one or two days,” says Vijay. For them, even though hackathon projects are rarely production-ready in such a short time, the ideas often prove valuable long after the event ends.

From Hackathon Idea to Real Campus Feature
Interestingly enough, one such idea has already made its way into everyday campus life. During a previous hackathon organized by the Open Source Developers Group (OSDG), the team, mentored by Arihant, proposed a feature to estimate crowding in the campus mess without using cameras or collecting personal data. Instead, they used data from the QR code scans students already make when entering the mess. By analyzing how many scans occur within a given time window and the intervals between them, their system estimates how many people are currently in the mess hall. “We don’t use any personal information, only the number of QR scans and their timing,” Bhaskar explains. “Using those metrics, we estimate how crowded the mess is.” The idea impressed judges and won the Most Impactful Solution award at the hackathon.

More importantly, it moved beyond the competition. The feature has now been integrated into the MyIIITH app, where it helps students check real-time crowd levels in the mess before heading out for a meal. “Whenever meals are on, the app shows how many people are in the mess,” says Arihant, who is part of the OSDG committee as well as part of the MyIIITH development team. The team hopes to expand the feature further by recommending optimal meal times based on crowd patterns and class schedules. “If you have a class at 2 pm, we might recommend going before 1:30,” Bhaskar explains. “If you’re free later, you might get a different recommendation.”

Learning Beyond the Competition
Apart from the competition itself, the ISEC Data Science Challenge also gave the students an opportunity to participate in a software engineering conference, where they attended research talks and workshops related to their work. For Arihant and Vijay, who work in the software engineering research domain, the experience was particularly valuable. “It was our first time attending a software engineering conference, and we got to hear about a lot of new research in this field.” 

What’s Next?
While the competition solution is currently a prototype, the students are already thinking about how to develop it further. The main challenge is access to larger datasets for training and validation. “The limitation right now is the dataset,” Bhaskar notes. “But we can collect metrics from open-source repositories across different programming languages.” The team plans to continue developing the idea once their schedules free up. “During the summer break, we’ll look at open-source repositories and expand the dataset,” says Vijay. “If everything goes well, we hope to turn this into a research paper.”
