IIITH’s Language Technologies Research Center (LTRC) has been exploring exciting possibilities in the emergent multi-lingual speech recognition domain. We caught up with Prof. Anil Kumar Vuppala, a central figure in speech technology, to hear about it.
“I am a recognition guy”, says speech researcher Anil Vuppala, whose work centres around recognition and synthesis, the two major areas in speech processing. His expertise has paved the way for diverse applications in the speech domain, including diagnosis systems in healthcare. His current project, a crowdsourcing-based collection of Telugu speech data to build an Automatic Speech Recognition (ASR) system, may show the path for large-scale data collection in Indian languages.
Armed with a Ph.D. from IIT Kharagpur, he joined IIITH in 2012. “What attracted me to IIITH was its robust speech processing laboratories and, moreover, that Prof. Yegnanarayana, the guide of my IIT Kharagpur guide Prof. K. S. Rao, was on the faculty here”, observed Anil. The LTRC lab has been pushing boundaries with impressive work on speech signal processing, language, speaker, emotion and pathological speech diagnosis.
Building multilingual ASR and emotional Telugu speech recognition app
The Kohli Centre for Intelligent Systems (KCIS) funded project on ASR is where all the magic happens. The thought process started four or five years ago, when Dr. Hari Krishna Vydana, one of Prof. Vuppala’s first students, started working on ASR at the IIITH lab. At Interspeech 2018, the international conference on spoken language processing, Microsoft threw down a challenge, providing 40 hours of data in three Indian languages. “We realised that, in spite of different orthographies, Indian languages share a common phonetic space. We decided to combine languages and to build a multi-lingual ASR to solve the data scarcity and code-mixing problems”, explained Prof. Vuppala.
Speech is affected by prosody, that is, the variations in intonation, pauses, stresses, intensity, vocal quality and accents. Interestingly, emotion is an important culture-specific variable. “You speak slower when you are sad and faster when happy. We collected Telugu language data in different emotional states and tried to give it a recognizer”, said the speech expert. “For Interspeech 2021, Microsoft has raised the bar to a multi-lingual challenge for six languages: Tamil, Gujarati, Hindi, Oriya, Bengali and Telugu, and our previous research will serve as the baseline. We hope to add Indian-English and Marathi into the system in the future”.
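One way to picture the “common phonetic space” idea is through Unicode itself: the blocks for the major Indic scripts follow a largely parallel layout, so the same offset inside each block usually denotes the same sound. The Python sketch below is only an illustration of that observation, not the team’s method (a real multi-lingual ASR would more likely rely on a curated common phone set). The names `shared_label` and `to_shared_sequence` are hypothetical, and the alignment is approximate because some scripts leave positions unused.

```python
# Start of the Unicode block for each script named in the article's challenge list.
# Each block is 0x80 code points wide and follows a largely parallel layout.
SCRIPT_BLOCK_STARTS = {
    "Devanagari": 0x0900,  # used for Hindi and Marathi
    "Bengali": 0x0980,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
}
BLOCK_SIZE = 0x80


def shared_label(ch):
    """Return the character's offset inside its script block, or None.

    The offset acts as a rough, script-independent label (an assumption
    made for illustration): the letter KA sits at offset 0x15 in every block.
    """
    code_point = ord(ch)
    for start in SCRIPT_BLOCK_STARTS.values():
        if start <= code_point < start + BLOCK_SIZE:
            return code_point - start
    return None


def to_shared_sequence(text):
    """Map a string to shared labels, skipping characters outside the listed scripts."""
    labels = (shared_label(ch) for ch in text)
    return [label for label in labels if label is not None]
```

With a mapping of this kind, text in Hindi, Telugu and Gujarati lands on one shared label inventory, which is what makes pooling scarce training data across languages plausible.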
As the Principal Investigator for a pilot venture funded by MeitY under TDIL (Technology Development for Indian Languages), Anil has been working on the ambitious crowdsourcing project with a team of 500 data collectors and 30 transcribers spread across Telangana and Andhra Pradesh. Data is collected via an online platform, where a speaker’s speech on a given topic is recorded, split into segments and passed through a rough ASR. The rough transcription goes through two levels of evaluation before it is finally approved. “At the end of the project in October 2021, we will deliver 2,000 hours of Telugu meta-data and a recogniser using Kaldi and data collection framework tools”, reported Anil.
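The collection workflow described here (speech is recorded, split up, roughly transcribed and then reviewed twice before approval) can be sketched as a simple sequence of stages. This is a minimal Python illustration, not the project’s actual platform: the stage names, the `Segment` class and the `advance` and `is_deliverable` functions are all hypothetical, and the sketch models only the forward path, not rejections or rework.

```python
from dataclasses import dataclass

# Hypothetical stage names; the article describes only the order of the steps.
STAGES = (
    "recorded",
    "segmented",
    "rough_transcribed",
    "review_1_passed",
    "approved",
)


@dataclass
class Segment:
    """One piece of collected speech moving through the pipeline."""
    speaker_id: str
    topic: str
    transcript: str = ""
    stage: str = "recorded"


def advance(segment, transcript=None):
    """Move a segment to the next stage, optionally updating its transcript."""
    position = STAGES.index(segment.stage)
    if position == len(STAGES) - 1:
        raise ValueError("segment is already approved")
    if transcript is not None:
        segment.transcript = transcript
    segment.stage = STAGES[position + 1]
    return segment


def is_deliverable(segment):
    """Only segments that cleared both review levels count towards the final data."""
    return segment.stage == "approved"
```

Keeping the stages explicit makes it easy to count how many hours of speech sit at each step, which matters when hundreds of collectors and dozens of transcribers work in parallel.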
Giving a voice to speech problems
“My core research area is speech processing and recognition, with special reference to emotion and pathological speech; in that area I have also worked with spastic dysarthria, cleft lip and palate, voice disorders, stuttering, dementia and emotional disorders. How can technology help them? On the pathological front, one of my PhD students worked on dysarthric speech, caused by age, stroke, or medical or motor problems of speech production. Our intention is to apply speech recognition software to detect a voice disorder and predict health issues”, said Anil.
“One study to emerge from the language laboratory was on single frequency filter bank based long-term average spectra for hyper-nasality detection and assessment in cleft lip and palate speech. Features from short speech segments were extracted and signal speech frequency was evaluated using a technique introduced by Prof. Yegnanarayana B, whose contribution in the signal processing domain has been pivotal to the project.”
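In its basic form, a long-term average spectrum is the per-frequency average of short-time power spectra taken over a whole recording. The Python sketch below shows only that averaging idea: it uses a plain discrete Fourier transform in place of the single frequency filter bank used in the study, and the function names, frame length and hop size are assumptions chosen for illustration.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each non-negative frequency bin, via a direct DFT of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(total) ** 2 / n)
    return spectrum


def long_term_average_spectrum(signal, frame_len=64, hop=32):
    """Average the power spectra of overlapping frames, bin by bin."""
    frames = [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    spectra = [power_spectrum(frame) for frame in frames]
    return [sum(bin_values) / len(spectra) for bin_values in zip(*spectra)]
```

For example, a pure tone that completes eight cycles per 64-sample frame produces an average spectrum whose largest value falls in bin 8. In a clinical setting the interest lies in how the shape of this averaged curve differs between typical and hyper-nasal speech.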
IIITH has strategic collaborations with several institutions in India and abroad, such as Aalto University in Finland and the All India Institute of Speech and Hearing (AIISH), Mysore, which is providing data on stuttering. IIITH is obtaining data on Parkinsonian and spastic disorders using open source English language data from abroad. “We have been working on the algorithm for an automatic diagnosis system for voice disorders. We have a voice disorder database with 100 plus disorders”, informed Anil. Voice related disorders were clinically classified into four discrete groups for a research study. “We tried to evaluate each disorder and built the system in a manner that would aid the speech pathologist. We have been working with Indore neurologist Dr. Purohit to design a tool for recovery”.
Ami tumay Bahubhashak
Bahubhashak, the government’s ambitious National Language Translation Mission (NLTM), is working with IIITH and other leading academic institutions to create a national speech resource. Under the aegis of the Ministry of Electronics and Information Technology (MeitY), Bahubhashak is focusing on Indian language technology systems and products that will be deployed with the help of start-ups, with technology transfer from the coordinating institutions.
Anil Vuppala has worked with start-up ventures like Ozonetel and Mashinga to build and deploy ASR modules. Vishnu Vidyadhar Raju, an alumnus of IIITH (2020), is planning to extend this technology to a commercial purpose by using emotion recognition for voice analytics in call centres.
IIITH worked on an interesting project, funded by MeitY under TDIL, that sought to evaluate the status of speech technology in India. With Microsoft, Google, CDAC and academic institutions as resource partners, the team used available technology to build speech-to-speech translation for a few episodes of Mann ki Baat from Hindi to English and Telugu. The study showed that India still had a long way to go in the speech translation domain.
In 2015, as Principal Investigator, Anil Vuppala worked with Hindustan Aeronautics Ltd on the development of a speech recognition algorithm for an airborne control system that was deployed on IAF aircraft and tested in pilot trials. “We had collected 100 speakers’ voice data for aircraft commands and built the system. Further to this, a noise reduction algorithm was developed for building voice interface into the aircraft, to factor in the high cockpit noise”.
The DST Young Scientist Scheme project on language identification in practical environments was a three-year program in which extensive data was collected in 23 Indian languages from 25 males and 25 females for language identification in the Indian context. “We developed multi-head self-attention architectures in neural networks which we are applying to our data”, explained Anil Vuppala. As co-investigator for the MHRD funded IMPRINT 2 project Sevak, he and his team are providing speech recognition level inputs in the building of an intelligent Indian language chatbot in collaboration with IIT Patna, with Wipro as the company partner. The Languages Lab has been working closely with Prof. Kavita Vemuri and the Cognitive Science department, developing algorithms for analysing brain signals for the speech processing aspect of the project.
From Bezwada with determination
“I am from Nagayalanka, the coastal part of Andhra where River Krishna merges with the Bay of Bengal. My grandfather worked on boats for a living”, said Anil with a rush of pride. The family moved to Vijayawada, where Anil and his older brother Sunil acquired a solid educational foundation, shining as class toppers through school and college. “My father, a government school teacher, and my mother sacrificed a lot for our studies and we wanted to make them proud of us”, he said.
His secret mantra is a single-minded focus on academics. “Everything else was secondary”, remarked Anil. “I usually maintain a small diary and plan my day. To reduce pressure, I list and prioritize my work”. The pandemic gave him the opportunity to spend more time with his two young children and his wife, who is pursuing a PhD at IIITH. Listening to old classic Telugu movie songs, watching and playing cricket, and collecting quotations are some of his pet hobbies. “From my grandfather, I learnt to appreciate literature”, said the young professor, who would scribble an inspirational quote on the blackboard in class every day, whether it was appreciated or not!
Anil has been the NSS program campus coordinator since 2013, conducting various special camps at nearby government schools, regular tree plantation drives, blood donation camps and Swachh Bharat programs, apart from organising visits to orphanages for educational outreach programs.
On his future goals, Anil ponders, “I definitely want to be known in my sphere of expertise and build good systems for the Indian context. Value education is something that I want to see students sensitized to. As a person, I want to see how I can be useful to society.”