Prof Rajeev Sangal, founding Chair of Mission Bhashini’s Executive Committee, offers a rare behind-the-scenes account of India’s ambitious speech-to-speech translation mission—its visionary strategy, ethical dilemmas, and roadmap to global leadership in domain-specific AI.
“At Bhashini’s inception in 2018–19, many doubted India could catch up with MNC tech giants, unaware that decades of MeitY-funded R&D under TDIL had built strong expertise. The challenge was rebuilding these systems with modern tools and engineering them for scalability. Coupled with India’s experience in platforms like UPI and Aadhaar, this has enabled the Mission to deliver large-scale, world-class technology.”
From Idea to Initiative: The Story Behind Mission Bhashini
Mission Bhashini was ideated by the Prime Minister’s Science, Technology and Innovation Advisory Council (PM-STIAC). In September 2018, Prof K. Vijayraghavan, its chairman, asked me to draw up a technology plan for language translation, especially for S&T content in English. I was glad to see language technology emerge as a national priority, recalling how I had demonstrated machine translation to PM Narendra Modi in February 2016 at BHU, while serving as Director of IIT(BHU) Varanasi. The Prime Minister now wanted it developed in mission mode to empower Indian-language speakers less proficient in English. Prof Vijayraghavan, a champion of diversity and the underprivileged, strongly backed this vision.
Conceptualizing the Mission
In shaping the mission, I considered the state of technology, device access, and people’s needs. Smartphones had reached a large population eager for content in their own languages, yet Indian language material was scarce—less than 0.1% of all internet content. Translation from English into local languages could bridge this gap, provided the technology was ready.
Key ideas in the conception of the Mission
Defining the Mission’s scope was the hardest part. I proposed we aim for speech-to-speech machine translation (SSMT), not just text-to-text (MT). Though it seemed a simple extension, MT and speech processing researchers usually worked in separate domains and departments. Could they collaborate toward a common goal? To explore this, we held a workshop at IIIT Hyderabad in January 2019 with leading experts from both fields. Their willingness to cooperate, backed by India’s strong capabilities built under MeitY’s TDIL program, gave me confidence to take the plunge into SSMT.
Educational content was identified as the first priority—NPTEL/Swayam courses and websites—since lectures are easier to translate than conversations, which rely on fragments and context. The system would combine machine output with human corrections, functioning as a human–machine hybrid until quality improved enough for full automation.
Finally, the Mission set its sights on covering all 22 official Indian languages along with English. Unlike MNCs, which focus only on commercially viable languages, we aimed for inclusivity: developing SSMT to work across all the languages of India.
Developing the right technology
The Mission set out to build AI models for spoken language translation through a complete SSMT pipeline—combining automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). Supporting models like disfluency correction, named entity recognition, and lip synchronization would be added as needed. Each component could also function independently, for example as a text-to-text MT system or a speech transcription tool. Human intervention was built in at every stage, crucial for error correction in recordings (though not in live use). This core technology would also enable a range of applications, from summarization and sentiment analysis to later advances like LLMs. In addition, OCR for Indian languages was included to recognize text from images. The pipeline would cover all 22 official Indian languages, with the aim of expanding further—ensuring that by developing it within the country, India retained full control and could apply it across countless domains.
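To make the pipeline concrete, here is a minimal illustrative sketch in Python of how the stages described above compose. The component interfaces, their names, and the human-review hook are hypothetical placeholders for exposition, not Bhashini’s actual APIs; in a real deployment each stage would be backed by trained models for the chosen language pair.

```python
# Illustrative sketch only: hypothetical component interfaces, not Bhashini's APIs.
# Shows how ASR, disfluency correction, MT, and TTS compose into one
# speech-to-speech pipeline, with an optional human-correction hook at each
# text stage (used for recorded content, skipped in live use).

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SSMTPipeline:
    asr: Callable[[bytes], str]                 # speech (audio bytes) -> source-language text
    clean: Callable[[str], str]                 # disfluency correction on the transcript
    translate: Callable[[str, str, str], str]   # (text, src_lang, tgt_lang) -> translated text
    tts: Callable[[str, str], bytes]            # (text, tgt_lang) -> synthesized speech
    review: Optional[Callable[[str], str]] = None  # optional human post-editing hook

    def run(self, audio: bytes, src_lang: str, tgt_lang: str) -> bytes:
        transcript = self.clean(self.asr(audio))
        if self.review:                          # human-in-the-loop for recorded lectures
            transcript = self.review(transcript)
        translated = self.translate(transcript, src_lang, tgt_lang)
        if self.review:
            translated = self.review(translated)
        return self.tts(translated, tgt_lang)
```

Because each stage is an independent component, the same building blocks can also be used on their own, for instance as a text-to-text MT service or a speech transcription tool.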
Strategic design elements of the Mission
A key question was how indigenously built technology could compete with MNC giants like Google, Microsoft, and Meta, which already had vast Indian language data (hundreds of times more), massive compute resources, and market dominance. The answer lay in nature: just as smaller animals thrive alongside tigers and lions by exploiting niche areas, Indian efforts could excel in domain-specific AI. Real-world applications differ from artificial benchmarks, and each domain becomes a niche where focused startups can outcompete big tech. To support this, the Mission proposed “Technology Acceleration Centres” (earlier, in April 2019, called Centres of Excellence) to nurture startups and strengthen technology in niche areas. These ideas were built into the Mission document.
On research capacity, the Mission relied on consortia of academic institutions to build critical mass. Language technology needs computer scientists, linguists, Sanskrit grammarians, and experts across Indian languages—expertise no single institution has. Projects were therefore awarded to consortia, ensuring collaboration, shared specifications, and timely delivery under one Consortium Leader. The 13 approved projects brought together 70+ research groups across 30+ institutions, covering 22 languages. This model, rare in India, proved vital to Bhashini’s success—though later hampered by digital accounting systems (like PFMS) that lack support for consortia, frustrating ministries and threatening best practices.
On data, significant funding was earmarked to build spoken corpora, transcripts, and parallel translations across all 22 languages. This data, along with models, was made open source and freely downloadable to empower Indian researchers and startups—though it inevitably became accessible to MNCs as well. While I had reservations, restricting access would have hurt local innovators more than global players.
Ultimately, Bhashini’s goal went beyond delivering technology: it aimed to build a full ecosystem for Indian language AI, spanning startups, research, data, and open innovation.
Identifying key elements of the language translation ecosystem
Bhashini aims to build a comprehensive ecosystem comprising R&D groups, data creation teams, Technology Acceleration Centres (formerly CoEs), mechanisms for technology transfer, startup incubation, participation from companies and state governments, and engagement with end users such as publishers, courseware developers, and government departments. This ecosystem is nurtured by MeitY using Bhashini funds, guided by the Standing and Executive Committees, with input from all stakeholders.
Planning the development ecosystem through Bhashini
The Mission was more than a technology delivery project; it meant putting an entire ecosystem in place. A large number of diverse stakeholders had to be brought together. One can think of them as part of three different cycles in society: (a) the technology cycle, (b) the market cycle, and (c) the social cycle. Each cycle had to be made active, and the cycles had to be mutually reinforcing.
The three virtuous cycles and stakeholders
Bhashini operates through three interlinked cycles. The first, the technology cycle, links R&D with startups. Researchers develop lab and field prototypes, creating new technologies, which startups and companies then convert into products for customers. To ensure real-world readiness, technologies are engineered for robustness and adapted to practical needs by Technology Acceleration Centres (TACs), which also support startups and coordinate with R&D as needed. This two-way flow of knowledge between R&D and startups is driven primarily by passion, with funding playing a secondary role.
The second, the market cycle, connects content providers—such as publishers—with AI-based tools and services. Startups help translate content, develop voicebots, and enable providers to reach end users effectively.
The third, the social cycle, focuses on generating Indian language digital content—original and translated—through schools, colleges, language departments, cultural bodies, students, and state governments. This cycle cultivates love for languages and culture, trains e-translators, and produces valuable data.
Each cycle is driven by different forces: knowledge for technology, money for market, and service for social impact. They reinforce one another—trained manpower from the social cycle feeds the market, data from the social cycle improves AI in the technology cycle, and technology, in turn, strengthens the market and social engagement. While significant progress has been made in technology and some central government adoption, the market and social cycles remain underdeveloped and need focused energizing.
Outcomes of the Mission
Mission Bhashini has developed a comprehensive suite of SSMT (speech-to-speech machine translation) technologies for Indian languages, including ASR, MT, and TTS, engineered for large-scale deployment. OCR technology is also under development.
These technologies cover 20+ Indian languages with 350+ AI models. The free Bhashini app provides mobile services, while multiple government ministries use the technology—often as voicebots—to assist citizens with online services such as scheme enquiries and form submissions.
Bhashini has translated over 200 higher-education courses on NPTEL and Swayam, converting English lectures into eight Indian languages, with subtitling support and ongoing expansion to more courses and languages. Open-sourced data and models have empowered individuals, institutions, and startups to freely access and utilize Indian language resources.
The next focus is energizing the market cycle by nurturing startups to deliver services across sectors like health, agriculture, and school education. Technology Acceleration Centres under the Mission will play a key role in supporting startups, while R&D continues exploring advanced techniques for prosody in speech processing and discourse in machine translation.
The future role of prosody in speech and discourse in MT
Prosody—the rhythm, stress, and intonation of speech—conveys meaning and emotion beyond words, using pitch, loudness, and duration to signal questions, emphasis, sarcasm, or feelings. Future SSMT systems will leverage prosodic features in Indian languages and enable paragraph-level, not just sentence-level, translation. Indian academia is well positioned to achieve this.
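As an illustration of the prosodic cues mentioned above, the short Python sketch below extracts frame-level pitch, loudness, and duration from a speech recording using the librosa library. This is not Bhashini code, and the file path is a placeholder; it simply shows the kind of features a future SSMT system might condition on so that questions, emphasis, and emotion survive translation.

```python
# Illustrative only: extract basic prosodic features (pitch, loudness, duration)
# from a speech file with librosa. The path is a placeholder.

import librosa
import numpy as np

def prosodic_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=16000)                 # load speech at 16 kHz
    f0, voiced, _ = librosa.pyin(y,                      # frame-level pitch contour (NaN when unvoiced)
                                 fmin=librosa.note_to_hz("C2"),
                                 fmax=librosa.note_to_hz("C6"),
                                 sr=sr)
    rms = librosa.feature.rms(y=y)[0]                    # frame-level loudness (RMS energy)
    return {
        "duration_s": librosa.get_duration(y=y, sr=sr),
        "mean_pitch_hz": float(np.nanmean(f0)),          # e.g. a rising contour often marks a question
        "pitch_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
        "mean_loudness": float(rms.mean()),
    }
```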
The social cycle also needs energizing, involving citizens in creating and using Indian language content, mastering Bhashini tools, and eventually producing original content, with state governments playing a key role.
A revolution in Indian languages is ready to be unleashed.
Prof. Rajeev Sangal, a pioneering computer scientist, former Director of IIT (BHU) Varanasi, and founding Director of IIIT Hyderabad, offers a masterclass on AI and language technology. A distinguished alumnus of IIT Kanpur and the University of Pennsylvania, Prof. Sangal is a world-renowned expert in computational linguistics, best known for his groundbreaking work on the Computational Paninian Grammar framework for Indian languages.