Prof. Dipti Misra Sharma traces the evolution of language and language technologies before summarising the research developments taking place in this space at IIITH.
Perhaps the single most distinguishing feature of humankind is its ability to speak. While other species communicate too, what sets humans apart is a far more advanced and complex system of linguistic signs that enables the formulation and expression of abstract thoughts and ideas beyond mere speech. Language has played a critical role in the way humans have progressed in knowledge sharing and preservation, driving human advancement. Language is not merely a tool for communication: it is a vehicle for culture, identity, and social cohesion. Language evolved as a spoken system, but speech alone had limitations; knowledge dispersal and long-distance communication were difficult in this mode.
One of the major landmarks that facilitated the dispersal of knowledge and information was the invention of writing systems. Writing allowed humans to preserve their thoughts and ideas, facilitating communication across time and space. From painstaking carvings on stone, writing moved to imprinting text on clay tablets and using reeds dipped in ink before modern-day pens and pencils were invented. But this still meant that the written word was available to only a select few. It wasn’t until the invention of the printing press that widespread dissemination of the written word took place.
Early Tech Advancements
The introduction of the printing press in the 15th century revolutionised the dissemination of written content. It enabled the mass production of texts, making books more accessible and affordable. What truly brought about a major technological shift, however, was the advancement of natural language processing, which moved from merely representing spoken language towards performing language functions that involve understanding and generating human language the way humans do.
The task was extremely challenging: modelling language in a way that emulates how humans use it for communication is not easy. Human communication through language requires several types of knowledge, including not just complex linguistic knowledge but also knowledge of prosody, facial expressions, gestures, shared experiences, the world around us, and so on. Incorporating the kind of knowledge that humans bring to language use is so difficult that it has taken decades, from the Second World War to 2024, for language technologies to evolve to their current level. Initial efforts in NLP (largely for machine translation) were very limited, relying on basic dictionaries and some grammar rules. These systems were fragile and not really usable in real-world scenarios.
Evolution of NLP
At the beginning of the Cold War, in 1954, the IBM 701 computer automatically translated Russian sentences into English for the first time. In the 60s, the creation of ELIZA, one of the first chatbots to simulate conversation, marked a notable achievement in the field of NLP. It was in the 70s that the first ideas around rule-based machine translation emerged.

The 1980s marked a shift towards more complex language processing systems, with the introduction of conceptual ontologies that structured real-world information into computer-understandable data. Around the same time, the limitations of rule-based systems led to the emergence of statistical methods, allowing models to learn from data rather than rely solely on predefined rules. Distributional semantics further advanced NLP by emphasising the relationships between words based on the contexts in which they occur. Long Short-Term Memory (LSTM) networks emerged in the late 1990s, revolutionising the processing of sequential data in NLP; this capability was crucial for tasks such as speech recognition, language modelling and text classification. All these advances were small steps towards modelling natural language for complex tasks.
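The core idea of distributional semantics, that words occurring in similar contexts have related meanings, can be seen in a minimal sketch; the toy corpus and window size below are illustrative assumptions, not any historical system’s design:

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus; real systems use millions of sentences.
corpus = [
    "the doctor treated the patient in the clinic",
    "the nurse treated the patient in the hospital",
    "the farmer ploughed the field before the rain",
]

window = 2  # co-occurrence window size (an illustrative choice)
vectors = defaultdict(Counter)

# Build a sparse co-occurrence vector for each word.
for sentence in corpus:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                vectors[word][words[j]] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Words sharing contexts ("treated the patient") score higher.
print(cosine(vectors["doctor"], vectors["nurse"]))   # relatively high
print(cosine(vectors["doctor"], vectors["farmer"]))  # lower
```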
The introduction of the Transformer architecture in 2017 marked a significant leap in the field of NLP. Transformer-based multilingual approaches have shown strong progress on a range of language tasks, including for Indian languages like Hindi, Bengali, and Gujarati, leading to substantial advancements in language technology. Speech-to-speech machine translation, built by combining multiple language technologies, is another transformative application with the potential to effectively address multilingual needs. Today, the talk is all about large language models (LLMs). The rise of LLMs is based on advances in the use of large text data and computational resources. However, development is biased towards high-resource languages like English, underrepresenting many languages, including several Indian languages. This lack of representation can lead to models that are not culturally sensitive or that perpetuate biases, a concern in diverse countries like India. Another concern is that the technology’s heavy reliance on large data and compute introduces disparities, leaving behind languages that do not have a reasonable digital presence. If a language is left behind, then it is not just the language but also its societies, their culture and the knowledge of those cultures that are left behind. Hence, going forward, the technology will move towards addressing these concerns.
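To make this concrete, here is a minimal sketch of invoking a Transformer-based translation model, assuming the open-source Hugging Face `transformers` library is installed; the English-to-Hindi model id is an illustrative public model, not necessarily one used by any system described here:

```python
from transformers import pipeline

# Load a pretrained Transformer translation model from the Hugging Face hub.
# "Helsinki-NLP/opus-mt-en-hi" is an illustrative English-to-Hindi model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")

result = translator("Language technology can improve access to healthcare.")
print(result[0]["translation_text"])  # the Hindi translation
```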
In India’s multilingual context, these technologies can make a vast difference to people’s lives in the critical domains of healthcare, the judiciary and education, apart from meeting people’s day-to-day communication needs. Collaborative efforts like the National Language Translation Mission aim to create a more inclusive digital landscape by developing resources and technologies for Indian languages. Academia, industry, and government collaborations can pool resources to tackle these challenges. Encouraging open-source datasets can enhance the availability of training data for low-resource languages (LRLs), enabling researchers to build better models without prohibitive data acquisition costs.
Language Tech at IIITH
In this evolving landscape of language, IIIT Hyderabad has been playing its role by developing language-related resources and technologies and by being part of major national projects in this area. These efforts span the era of rule-based systems, statistical methods and, more recently, LLMs tailored for multiple Indian languages. The work covers a wide range of applications, starting from core language understanding and generation tools such as shallow parsers, state-of-the-art machine translation models, question answering, summarisation, dialogue processing, and speech-to-speech translation systems. In addition, this focused research has resulted in various types of multilingual content, including translations of educational video lectures, healthcare content, and legal materials, along with end applications such as educational chatbots that allow students to easily get answers to their questions in their native language.
The Future of NLP
Research in language multimodality, incorporating modalities like text, images, audio, and video alongside stored factual data, will enhance the comprehension, generation, and execution of language within a wider context. Future research in this area will likely focus on developing more sophisticated algorithms that can seamlessly fuse these modalities, enabling machines to derive richer insights and more nuanced interpretations of content and to aid us more in our day-to-day activities. Combining visual elements with textual descriptions can significantly enhance applications in fields such as human-computer interaction. Consider, for example, a multimodal chatbot that uses natural language processing (NLP) and video processing to help citizens access social welfare resources and services, as sketched below. Citizens can interact with such a chatbot through both audio and visual means, asking questions about social services, including agricultural assistance and social schemes, in domains such as governance or health. Furthermore, users can upload images or videos as part of their inquiries or provide information-based proofs, which the chatbot can interpret to offer suitable multimodal responses. This approach not only improves user engagement but also ensures that community members receive timely and pertinent information, leading to a more informed and cohesive community.
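A rough sketch of how such a system might route a multimodal query follows; every name and the stub logic here are hypothetical illustrations of the fusion idea, not IIITH’s implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CitizenQuery:
    """A multimodal query: text plus an optional image or video attachment."""
    text: str
    attachment_path: Optional[str] = None  # e.g. a photo of a crop or a document

def classify_intent(text: str) -> str:
    """Stub intent classifier; a real system would use a trained NLP model."""
    if "crop" in text.lower() or "farm" in text.lower():
        return "agriculture"
    return "general_welfare"

def analyse_attachment(path: str) -> str:
    """Stub vision component; a real system would run image/video models."""
    return f"visual evidence extracted from {path}"

def answer(query: CitizenQuery) -> str:
    """Fuse the text intent with any visual evidence to build a response."""
    intent = classify_intent(query.text)
    evidence = (analyse_attachment(query.attachment_path)
                if query.attachment_path else "no attachment")
    return f"[{intent}] response to: '{query.text}' ({evidence})"

print(answer(CitizenQuery("My crop leaves are yellowing, which scheme can help?",
                          attachment_path="leaf_photo.jpg")))
```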
Conclusions
Language technology has come far in the last few decades. However, there is still a long way to go. The technology has reached a stage where it can support some human tasks, but it is still nowhere close to mastering the creative and social aspects of language use. Language is a critical component of human intelligence and thus an important field of AI research, one in which we must keep delving deeper to understand the more complex aspects of the human mind and its linguistic abilities.
This article was initially published in the August edition of TechForward Dispatch.