Prof. Vikram Pudi expresses his views on what modern AI and the world of LLMs/LVMs portends for us while suggesting the right way forward.
Modern AI is poised to ignite the world — in all senses! Built upon different architectures of deep neural networks, Large Language Models (LLMs) and Large Vision Models (LVMs) enable computers to understand and generate text and visual content. They are not just transforming industries, they are reshaping the very fabric of how we seek and express knowledge in the ocean of data in the world, most of which cannot be known by any single human individual. As academia and industry come together to unlock their full potential, it’s crucial to examine the workings behind these models, and look at the challenges and opportunities that lie ahead.
Different Models
At the core, there really is no magic. The models are designed to recognize patterns, generate content, or make predictions by minimizing a loss function, propagating errors backwards through the many layers of a deep neural network arranged in various architectures. Though they operate on the different modalities of language and vision, LLMs (such as GPT-3, GPT-4 and BERT) and LVMs (such as CLIP, DALL·E and Vision Transformers) share these fundamental design principles. Their sheer size and scale are what set them apart from earlier models. With billions or even trillions of parameters, these models represent the culmination of decades of research, immense computational resources, and breakthroughs in both hardware and software.
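To make the "no magic" point concrete, here is a minimal sketch of that training loop: a tiny one-hidden-layer network fit by gradient descent, with errors propagated backwards through each layer. The data, sizes, and learning rate are illustrative assumptions; real LLMs and LVMs apply the same principle at vastly larger scale.

```python
import numpy as np

# Toy example: learn y = sin(x1 + x2 + x3) with a small neural network.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # 64 samples, 3 features
y = np.sin(X.sum(axis=1, keepdims=True))      # target to learn

W1 = rng.normal(scale=0.5, size=(3, 16))      # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(16, 1))      # hidden -> output weights

for step in range(500):
    h = np.tanh(X @ W1)                       # forward pass
    pred = h @ W2
    err = pred - y
    loss = float(np.mean(err ** 2))           # loss function to minimize

    # Backward pass: the chain rule pushes errors back through each layer.
    d_pred = 2 * err / len(X)                 # gradient of mean squared error
    dW2 = h.T @ d_pred
    d_h = d_pred @ W2.T * (1 - h ** 2)        # tanh derivative
    dW1 = X.T @ d_h

    W1 -= 0.1 * dW1                           # gradient-descent update
    W2 -= 0.1 * dW2

print(f"final loss: {loss:.4f}")              # loss shrinks as patterns are learned
```

Everything else, from GPT-4 to DALL·E, is elaboration on this loop: richer architectures, more data, and far more parameters.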
LLMs are based on the transformer architecture, introduced by Vaswani et al. in 2017, which allows models to efficiently process long sequences of text, enabling them to understand context, infer relationships, and generate coherent outputs. On the LVM side, similar architectural advances have enabled models to perform tasks like image generation, classification, and object detection, tasks that once required specialized systems and considerable human expertise.
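The heart of the transformer is scaled dot-product attention, which lets every token in a sequence draw on information from every other token. A minimal sketch follows; the shapes and random values are illustrative assumptions, not taken from any real model.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # how relevant each key is to each query
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                        # weighted mix of value vectors

# Illustrative inputs: 5 tokens, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

out = attention(Q, K, V)
print(out.shape)   # each token's output blends context from the whole sequence
```

This ability of each position to attend directly to every other position, rather than passing information step by step, is what lets transformers handle long-range context efficiently.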
The Backbone Of Models
Data is the lifeblood of these models, used to train them to perform their tasks. LLMs are trained on vast corpora of text from books, articles, websites, and more, while LVMs are exposed to enormous datasets of images and videos. The more data a model is exposed to, the better its ability to generalize and adapt to unseen scenarios.
Challenges
This dependence on data raises challenges of data quality, bias, and ethical considerations such as fairness, accountability, and transparency. Both LLMs and LVMs inherit biases present in the data they are trained on, which can lead to unintended consequences when deployed in real-world scenarios. These concerns become critical in high-stakes applications like healthcare, finance, and law enforcement.
Training Costs
Training these models requires immense computational power, typically provided by specialized hardware such as Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs). Additionally, the costs associated with training state-of-the-art models have skyrocketed, leading to debates about accessibility, energy consumption, and sustainability.
Ethical Use and the Existential Question
Most importantly, the wide availability and ease of use of these technologies raise the question of how to ensure they are used for the greater good and not exploited for harmful purposes. Another angle that researchers and thinkers are grappling with is the existential question of what it means for the human race if we create intelligence that surpasses our own. Will we face possible extinction? This possibility is made worse if some humans misuse AI, but even otherwise it remains a grim one. Meanwhile, the public front of AI researchers and users, including industry and government, moves forward, driven by the opportunities of automation and treating the existential threat as, at most, a remote possibility in the far future for which solutions will likely emerge.
Yet it now seems that AI can write better content than most of us on most topics, which means it is, in a sense, more knowledgeable than most of us. Given that its abilities grow rapidly every few months, and that far more time, effort and resources go into building faster, smarter and more competitive machines than into deciphering what is truly right and wrong, the importance of this line of thought will magnify sooner than we may think.
Opportunities
Researchers are exploring new ways to make these models more interpretable and explainable, a critical step toward improving trust and adoption. Meanwhile, the race must turn towards applying AI to solve today's problems rather than building the machines of the future just to get ahead! The integration of multi-modal learning, where models can simultaneously understand text, images, and sound, will lead to more advanced systems for content understanding and content generation in the near future.
Ongoing Work at IIITH
As a related exploration in our lab, in collaboration with Salesforce, we are currently looking at how to make existing models work more effectively for time-series data, and thereby tap into their predictive power for an enormous range of applications.
Data Foundation
At IIIT Hyderabad, we are building a Data Foundation (hosted at https://india-data.org/), a repository of datasets for novel data-driven AI applications. It currently hosts the Indian Brain Atlas, the Indian Driving Dataset, and 30+ other novel datasets. The platform allows research lab teams to collaborate and build datasets privately, publish them when ready, and host AI and analytics challenges using these datasets. The hosting platform itself is built using open-source technologies and will also be available for local installation at collaborating organizations.
A Call for Multi-Stakeholder Collaboration
The journey of advancing LLMs, LVMs and AI in general is not one that can be traversed alone. Academia has been instrumental in laying the theoretical groundwork and pushing the boundaries of what is possible with neural networks and deep learning. Industry, on the other hand, provides the real-world applications and the resources needed to scale these systems.
It will be useful for the community to get together to decide on regulations that allow safe AI, and enable the government to implement those regulations. We should learn from the decades of open-source models of software development in other fields and insist that AI develops along similar lines, by the public, for the public, and of the public. The future of these technologies will be shaped by the continued collaboration between these sectors. Furthermore, industry-academic partnerships could foster talent pipelines, allowing the next generation of researchers and practitioners to develop a deep understanding and safe usage of these technologies and their real-world applications.
This article was initially published in the January ’25 edition of TechForward Dispatch
Prof. Vikram Pudi is a Professor at the Data Science and Analytics Center, IIITH. While his formal research interests include building algorithms for learning interpretable models, interactive learning, recommendation systems and citation analysis, he likes to work on almost any high-impact problem that is simple to state but difficult to solve.