Engineering Agentic AI Systems: From Lab to Land

Prof. Karthik Vaidhyanathan underlines the importance of software engineering practices in building agentic AI systems and explains how some of the research undertaken by his group has been taken to production.

The Evolving Landscape of Software Engineering and AI

As software has become ubiquitous, the way we build software systems has also transformed. The initial idea was to decompose systems into reusable web services that could scale independently. However, the lack of well-defined service granularity led to ambiguity about their optimal size. This ambiguity, coupled with new demands such as continuous delivery and deployment, the prominence of domain-driven design, the need for small autonomous teams, and shorter time to market, ushered in the microservices era. The goal became building small, autonomous services that communicate using lightweight protocols. To further simplify deployment and reduce the burden on developers, serverless computing emerged, and this trend continues. Yet, despite these advancements, software systems remain susceptible to failures and crashes.

The field of AI has also evolved over the years since its inception in the 1940s. Over the last decade in particular, with the increased availability of data and compute, AI has made monumental strides, with applications across diverse domains. This is particularly true of Generative AI, kickstarted by the groundbreaking transformer architecture in 2017. The core idea behind transformers is the self-attention mechanism, which lets the model weigh the importance of every word in a sequence against every other word, in parallel, to capture long-range dependencies and context when predicting what comes next. Transformers trained on vast amounts of data, including much of the text available on the internet, resulted in what we now call Large Language Models (LLMs). These models possess an impressive ability to generate human-like text by repeatedly predicting the next word in a sequence.
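The self-attention idea described above can be illustrated with a minimal sketch. It assumes a toy three-token sequence with made-up two-dimensional embeddings, and it uses the embeddings directly as queries, keys, and values; a real transformer would first pass them through learned projection matrices and use many attention heads. The function names are illustrative only.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against every position in the sequence.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Toy sequence: three tokens, two-dimensional embeddings (assumed values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence. Because every row is computed independently of the others, all positions can be processed in parallel, which is what made training on very large corpora practical.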

Age of LLMs, Agentic AI and the Convergence of SE and AI

Today, a wide range of proprietary and open-source LLMs is available, including ChatGPT, Claude, Gemini, Llama, DeepSeek, and Mistral. As these models grew in popularity, a natural question emerged: if LLMs can generate human-like text, could they also interpret instructions and execute tasks? This marked the shift from asking questions like “How can I get to Hyderabad?” to giving instructions like “Book me a ticket to Hyderabad.” To complete such tasks, the system must understand user intent, remember preferences, and interact with external APIs to take action. An LLM alone, limited to text generation, cannot fulfill this role. However, when equipped with memory, tool usage, and contextual awareness, it becomes part of a larger construct known as an LLM agent. An LLM agent wraps the model with added capabilities: invoking tools, managing user data, and interacting with external APIs to autonomously achieve a goal.

These agents are reminiscent of microservices in that they are modular and autonomous, but agents go further by incorporating reasoning and goal-directed behavior. When multiple agents collaborate to accomplish broader objectives, the resulting system is known as an Agentic AI system. To standardize communication between agents, and between agents and tools, protocols such as Agent2Agent (A2A) and the Model Context Protocol (MCP) have emerged, much as REST is used for communication between services.

This convergence of AI and software engineering has resulted in two broad research directions: AI4SE, which uses AI to improve software engineering practices and approaches, and SE4AI, which develops better practices for engineering AI systems. Our research over the years has been at this convergence.
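To make the notion of an LLM agent more concrete, here is a minimal sketch of the loop that wraps a model with memory and tools. Everything in it is an assumption for illustration: `fake_llm` is a hard-coded stand-in for a real model call, and `search_trips` and `book_ticket` are hypothetical tools that return strings instead of calling real booking APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

def fake_llm(goal: str, history: List[str]) -> str:
    """Stand-in for a model call: picks the next action from the history."""
    if not any(h.startswith("search_trips") for h in history):
        return "search_trips"
    if not any(h.startswith("book_ticket") for h in history):
        return "book_ticket"
    return "finish"

def search_trips(memory: Dict[str, str]) -> str:
    # Hypothetical tool: a real one would query an external travel API.
    return f"found a {memory['preferred_mode']} trip to {memory['destination']}"

def book_ticket(memory: Dict[str, str]) -> str:
    # Hypothetical tool: a real one would place an actual booking.
    return f"booked {memory['preferred_mode']} ticket to {memory['destination']}"

@dataclass
class Agent:
    tools: Dict[str, Callable[[Dict[str, str]], str]]
    memory: Dict[str, str] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 5) -> List[str]:
        """Ask the model for an action, run the tool, record the result."""
        for _ in range(max_steps):
            action = fake_llm(goal, self.history)
            if action == "finish":
                break
            result = self.tools[action](self.memory)
            self.history.append(f"{action}: {result}")
        return self.history

agent = Agent(
    tools={"search_trips": search_trips, "book_ticket": book_ticket},
    memory={"destination": "Hyderabad", "preferred_mode": "train"},
)
trace = agent.run("Book me a ticket to Hyderabad")
print("\n".join(trace))
```

The model only decides what to do next; the surrounding loop supplies remembered preferences, executes tools, and feeds results back. The `max_steps` bound is one simple guard against an agent that never decides to finish.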
From the Lab: Applying Generative AI for Software Engineering

At the Software Engineering Research Center (SERC), research into the application of generative AI spans the entire software development lifecycle (AI4SE): from requirements to design, deployment, and maintenance. Within our SA4S group at SERC, initial efforts focused on leveraging LLMs to support software design activities, particularly addressing the challenge of software architecture knowledge management (AKM). One of the key activities in AKM involves capturing architectural decisions that are often implicit and scattered across codebases, issue trackers, and developer conversations.

An empirical study explored whether LLMs could assist in generating architecture design decisions¹ that document the rationale behind architectural choices. Results indicated that LLMs, especially fine-tuned smaller models deployable within enterprise environments, could effectively support human architects in documenting and reflecting on architectural knowledge. Following this, the team moved toward the interface between design and implementation by exploring LLMs’ ability to generate architectural components², such as serverless functions. Evaluations on multiple open-source repositories showed that collaboration between architects, developers, and LLMs led to promising automation outcomes.

Building on this, the research ventured into dynamic service generation: can an LLM understand a user’s new functional need at runtime, generate a service to fulfill it, deploy it, and integrate it into the system on the fly? A proof-of-concept case study implemented with the help of LLM agents demonstrated that this is feasible, opening up avenues for runtime extensibility³. In the maintenance phase, the group evaluated the use of LLMs for self-adaptation. By analyzing logs and metrics, LLMs were able to make decisions aimed at maintaining SLA compliance⁴. Their performance approached that of state-of-the-art autonomous adaptation techniques, reinforcing the potential of LLMs and agents to contribute meaningfully throughout the lifecycle of modern software systems.

While applying AI to various aspects of SE, we noticed that engineering AI systems itself comes with several challenges. This led us to a line of work on AI engineering and, in turn, to a set of best practices for engineering AI systems (SE4AI).

1 Rudra Dhar, Karthik Vaidhyanathan, Vasudeva Varma, Can LLMs Generate Architectural Design Decisions? - An Exploratory Empirical study, IEEE ICSA 2024
2 Shrikara Arun, Meghana Tedla, Karthik Vaidhyanathan, LLMs for Generation of Architectural Components: An Exploratory Empirical Study in the Serverless World, IEEE ICSA 2025
3 Bassam Adnan, Sathvika Miryala, Aneesh Sambu, Karthik Vaidhyanathan, Martina De Sanctis, Romina Spalazzese, Leveraging LLMs for Dynamic IoT Systems Generation through Mixed-Initiative Interaction, IEEE ICSA 2025 Companion
4 Raghav Donakanti, Prakhar Jain, Shubham Kulkarni, Karthik Vaidhyanathan, Reimagining self-adaptation in the age of large language models, IEEE ICSA 2024 Companion

To the Land: Engineering an Agentic AI Framework

The challenges of engineering AI systems became evident to me during my postdoctoral research in 2020, on a project that led to a paper listing the challenges in architecting ML-enabled systems⁵. That work became the starting point for a workshop with some of our collaborators on software architecture and machine learning (SAML)⁶, and we later organized a Dagstuhl seminar on the topic⁷. ML-enabled systems face several kinds of challenges, and these are even more pronounced in LLM-based systems.

This is where the land part comes in. About two years ago, we began collaborating with MontyCloud, a CloudOps startup that leverages autonomous bots to manage cloud compliance, security, and continuous operations. They had created an autonomous copilot named Marvin⁸, which functioned as a conversational AI agent that let users interact with the platform while simplifying everyday cloud operations. The copilot also performed automatic checks and generated actionable reports for cloud users. Very soon we faced some important challenges related to maintainability, extensibility, and data management.

It was then that we realised the need for an agentic approach. Our reasoning was that if multiple agents, each with its own knowledge, could collaborate to achieve a specific functionality, we could overcome a number of limitations of the existing system. Those limitations stemmed primarily from the complexity of managing systems on the cloud: diverse data sources, orchestration of multiple processes, and complex workflows for automating routine tasks. We took inspiration from the principles of Domain-Driven Design (DDD) and started to think of agents as organized around the various domains within the larger domain of CloudOps.

The next step was realizing the multi-agent system. Several frameworks already existed, such as LangGraph, CrewAI, and AutoGen. However, they did not work for us, since they were all monolithic in nature. Moreover, they did not give us the flexibility to use particular platforms to power particular agents. We wanted to go a step further and decide which agent would be built using which framework. That is how, along with the MontyCloud team, we created Meta Orchestrator of Your Agents (MOYA)⁹, in which we orchestrate agents that can each be built with different technologies. This also resulted in a research publication at CAIN 2025 (co-located with ICSE 2025), which was a candidate for the best paper award.

Although MOYA came out of our efforts to develop an agentic approach to autonomous CloudOps, the framework itself is generic and can be applied to any use case. To validate this, we conducted a hackathon at IIIT-H, Hack-IIIT, in collaboration with the Open Source Developers Group (OSDG). More than 100 students participated, with teams building agentic AI applications for a wide variety of use cases, ranging from meme generators to framework enhancements to games. This further reinforced and validated the capability of MOYA. One of the main pieces of feedback we received was about the simplicity of the framework, which was in line with our goals in building it in the first place. We also received a lot of constructive feedback, which has allowed us to further enhance MOYA.
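The idea of orchestrating agents that are each built with a different technology can be sketched as follows. This is not MOYA's actual API; it is a hypothetical illustration in which every class name, the `keywords` attribute, and the keyword-based routing rule are assumptions. The point is only that the orchestrator depends on a small shared interface, so what powers each agent internally does not matter to it.

```python
from typing import List, Protocol

class Agent(Protocol):
    """The only contract the orchestrator relies on."""
    name: str
    keywords: List[str]

    def handle(self, message: str) -> str: ...

class CostAgent:
    """Imagine this wraps an agent built with one framework."""
    name = "cost"
    keywords = ["cost", "bill", "spend"]

    def handle(self, message: str) -> str:
        return f"[cost agent] analysing: {message}"

class ComplianceAgent:
    """Imagine this wraps an agent built with a different framework."""
    name = "compliance"
    keywords = ["compliance", "policy", "audit"]

    def handle(self, message: str) -> str:
        return f"[compliance agent] checking: {message}"

class GeneralAgent:
    """Fallback used when no domain agent matches."""
    name = "general"
    keywords: List[str] = []

    def handle(self, message: str) -> str:
        return f"[general agent] answering: {message}"

class Orchestrator:
    def __init__(self, agents: List[Agent], fallback: Agent) -> None:
        self.agents = agents
        self.fallback = fallback

    def route(self, message: str) -> Agent:
        """Pick the first agent whose keywords appear in the message."""
        text = message.lower()
        for agent in self.agents:
            if any(word in text for word in agent.keywords):
                return agent
        return self.fallback

    def ask(self, message: str) -> str:
        return self.route(message).handle(message)

orchestrator = Orchestrator([CostAgent(), ComplianceAgent()], GeneralAgent())
print(orchestrator.ask("Why did my bill go up last month?"))
print(orchestrator.ask("Run an audit on our storage policy"))
```

Keyword matching is used here only to keep the sketch deterministic; a production orchestrator would more plausibly use a classifier or an LLM to select the agent. Organizing agents by domain, as in the cost and compliance examples, mirrors the Domain-Driven Design inspiration mentioned above.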

5 Henry Muccini, Karthik Vaidhyanathan, Software Architecture for ML-based Systems: What Exists and What Lies Ahead, WAIN 2021@ICSE 2021
6 https://sa-ml.github.io/saml2025/
7 https://www.dagstuhl.de/seminars/seminar-calendar/seminar-details/23302
8 https://blogs.iiit.ac.in/montycloud
9 https://github.com/montycloud/moya

Onward and Forward

Agentic AI is shifting the way we think about building software systems and services. As it continues to gain traction, a few guiding principles are emerging. The main question the community needs to think about is what an agent is and when an agent is required. It is not about converting all existing services into agents; some existing services (APIs) may eventually become tools for agents to leverage. Thinking in agents is a skill that needs to develop. We do not need agents every time. For use cases like a chatbot over documents, all we need is an LLM that uses retrieval-augmented generation (RAG) on those documents. Sometimes what we need is a well-orchestrated flow in which one agent calls a tool, and so on. The real power of agents shows in scenarios that need dynamic behavior, where agents have to communicate back and forth to achieve a task.

Rather than building centralized models or rule-based systems, developers may now begin to view software as a collaboration of intelligent, goal-driven agents, each equipped with tools, memory, and autonomy. This mindset echoes the evolution that microservices brought to system architecture but elevates it by incorporating proactive behavior, situational awareness, and dynamic learning.

This transition is especially significant for SaaS platforms. Traditional SaaS architectures are being challenged by increasing demands for autonomy, personalization, and continuous adaptation. It's not that SaaS is dead, but the way we do SaaS will probably change. Agentic AI offers a new model in which SaaS is seen more as a composition of intelligent agents than as a set of static services. Such agentic systems can better manage complexity, provide dynamic responses, and integrate seamlessly across workflows.

Looking ahead, the software engineering research and practitioner community will need to adopt new software engineering practices and adapt existing ones. These include memory management for agents, reliability and robustness, sustainability, standardized interaction protocols, observability for autonomous workflows, responsible AI guidelines, and evaluation frameworks for agentic behavior. We also need to rethink whether agents must be powered by “large” LLMs at all, or whether domain-specific Small Language Models (SLMs) would do. At the same time, AI support is emerging for existing SE practices and processes such as software design, deployment, and maintenance, and potentially even for migrating legacy systems to modern ones. That said, making legacy APIs more agent-friendly or LLM-friendly is a task in itself. We also need to take care not to reinvent the wheel: many of the best practices for building service-oriented systems can be reused or adapted for agentic AI systems. These are not just technical challenges but cultural ones as well, requiring software architects, developers, and AI practitioners to collaborate more deeply than ever before.

July 2025