With the launch of Adi Vaani, an AI-powered translator, IIITH is part of the government’s efforts to bridge the communication gap between tribal and non-tribal communities while safeguarding endangered languages
As part of ‘Janjatiya Gaurav Varsh’, a year-long celebration of tribal pride, legacy and empowerment coinciding with the 150th birth anniversary of Dharti Aaba Bhagwan Birsa Munda (a tribal folk hero and independence activist), the Ministry of Tribal Affairs has launched an AI-powered translator for tribal languages. Titled Adi Vaani, the platform in its beta version supports the Santali, Bhili, Mundari, and Gondi languages. The next phase of the project aims to include more languages, such as Kui and Garo.
The app, which is available on the Play Store as well as on a dedicated web platform, has been developed by a consortium of premier institutions led by IIT Delhi and comprising IIIT Hyderabad, BITS Pilani, and IIIT Nava Raipur, in collaboration with the Tribal Research Institutes (TRIs) in Jharkhand, Odisha, Madhya Pradesh, Chhattisgarh, and Meghalaya. At IIITH, the Speech and Natural Language Processing groups at the oldest lab of the Language Technologies Research Centre have been involved with the project since July 2024. “This is a social initiative project that we are proud to be associated with,” says Prof. Radhika Mamidi, who leads the institute’s efforts with the help of Prof. Anil Vuppala and research scholars Vandan Mujadia, Nikhilesh Bhatnagar, Anindita Mondal and Soujanya Rao. In the project’s early days, IIITH’s Product Labs team, led by Satish Kathirisetti along with Sriram and Shashank, shared the technical know-how for putting Adi Vaani on the cloud.
What It Does
As per the 2011 Census, India is home to 461 tribal languages spoken by the Scheduled Tribes and 71 distinct tribal mother tongues. Among these, 81 are vulnerable and 42 critically endangered. They face the risk of extinction due to limited documentation and gaps in intergenerational transmission. Adi Vaani aims to address this challenge by leveraging AI for the systematic digitization, preservation, and revitalization of tribal languages. How? The platform enables real-time translation of both text and speech between Hindi/English and the tribal languages. It also helps preserve folklore, oral traditions, and cultural heritage via optical character recognition technology. It can support and promote civic inclusion in tribal communities by spreading awareness about government schemes and other important initiatives.
IIITH’s Role
The IIITH team used a Transformer-based sequence-to-sequence (seq2seq) architecture for four machine translation systems: English to Santali, Santali to English, Hindi to Santali, and Santali to Hindi. “This has become the state-of-the-art approach in neural machine translation (NMT). The parallel corpus was built with the help of Tribal Research Institute, Odisha. After the base model was built, additional data was generated and post-edited by Santali native speakers which helped improve the systems,” remarks Prof. Mamidi. The researchers also developed a Text-to-Speech (TTS) tool for the Santali, Mundari, and Bhili languages. The TTS tool for Gondi is currently under development. Anindita Mondal, who built the TTS tools, worked closely with native speakers, who spent a considerable amount of time at IIITH recording speech data.
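For readers who want to see how two widely used languages paired with one tribal language yield four separate translation systems, the directions can be enumerated in a short Python sketch. This is only an illustration: the function name `translation_directions` and the plain-text language labels are assumptions made for the example and are not taken from the project's code.

```python
# Illustrative sketch only: names and labels here are hypothetical,
# not identifiers used by the Adi Vaani project.
HUB_LANGUAGES = ["English", "Hindi"]
TRIBAL_LANGUAGE = "Santali"


def translation_directions(hubs, tribal):
    """Return every (source, target) pair between the hub languages and the tribal language."""
    directions = []
    for hub in hubs:
        directions.append((hub, tribal))  # hub language into the tribal language
        directions.append((tribal, hub))  # and the reverse direction
    return directions


if __name__ == "__main__":
    for source, target in translation_directions(HUB_LANGUAGES, TRIBAL_LANGUAGE):
        print(f"{source} -> {target}")
```

Each pair in the resulting list corresponds to one direction for which a separate translation system would be trained.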
Sabka Vikas
According to the government press release, the app is more than just a translation tool. It is a national mission to safeguard endangered cultures and knowledge. It seeks not only to give tribal communities access to public service initiatives in their own languages but also to promote inclusive governance by ensuring last-mile reach of government schemes. Prof. Mamidi adds, “We will continue to improve the models with more feedback coming in with the Beta launch. Our aspiration is to make NCERT books, educational and health awareness videos, Government schemes and materials to be made available in these low resource languages using the fast-emerging AI technologies. We also plan to work on more indigenous languages. As part of Indic-Wiki summer internship programme, under the mentorship of Krupal Kasyap, we focussed enriching online content in Telangana origin languages such as Gondi, Koya, Kolami, Naikdi, Chenchu, Kaikadi (Yerukala), Lambadi, Nakkala, and Konda Kammara. Hopefully, we can make AI tools for these languages as well with the support from Telangana government and the Ministry of Tribal Affairs.”
Sarita Chebbi is a compulsive early riser. Devourer of all news. Kettlebell enthusiast. Nit-picker of the written word especially when it’s not her own.