November 2022
Shelly Jain received her Master of Science – Dual Degree in Computational Linguistics (CL). Her research was supervised by Dr. Anil Kumar Vuppala. Here's a summary of her thesis, Leveraging Phonological Information for Indian Language Speech Systems:
Abstract

Linguistic diversity is one of the defining traits of the Indian subcontinent. With over twenty officially recognised languages and countless mother tongues, multilingualism is a way of life for the people. Today, as speech technology gains popularity due to the rising number of smart assistants, this abundance of natural languages is an obvious target for developing new speech systems. A majority of Indian languages happen to be low resource for any of the following reasons that arise from the Indian context: 1) a lack of native speakers of the desired language; 2) a lack of recorded spontaneous speech for modelling natural speaker behaviour; 3) poor quality of recordings due to external noise or minimal proper documentation; 4) prolific code-switching in natural speech. Such limitations on the data that can be collected encourage us to look for alternatives that can both compensate for the lack of target data and properly utilise the features of the languages that are available.

Models that handle multilingualism are becoming increasingly complex, yet they fail to be robust to the Indian context. Without a precise understanding of how these models operate, it is impossible to adapt them to the requirements of Indian languages. In addition, few linguistic studies of Indian multilingualism have been conducted with speech systems in mind, so even the requirements for making a system robust to the Indian context remain largely unclear.

The first step towards adapting existing speech systems is understanding the models currently in use. The next is understanding the context: in this case, becoming familiar with the Indian setting and the way Indian languages interact. This provides a theoretical basis for properly utilising existing resources, with lower reliance on data, when building Indian speech systems. It also forms the foundation for developing new frameworks that can either leverage the unique context of Indian languages or benefit from their shared features.

In this thesis, we have produced tools and results that will form the basis for future work on both multilingual and acquired-language studies. We analyse the working of the joint acoustic model in multilingual automatic speech recognition systems and determine which phonological properties of languages most affect model performance. We also demonstrate how the output patterns and accuracy are affected by the choice of language for each of the architectures that were used. Such results can guide the choice of architecture based on the available data and the desired outputs for different applications, avoiding unnecessary complications. We also study acquired-language pronunciations, providing a dictionary for converting American English lexicons to Indian English as well as phonetic transformation rules that characterise Indian English relative to British Received Pronunciation. In our analysis, we also create a linguistic profile for 18 major Indian languages, describing the phonemes and phonological behaviours of Indian native languages that may be considered "universal" or regional. These tools can be adapted for handling variations in pronunciation or for adapting accents, making existing systems more robust. Thus, the work in this thesis is a foundation for further studies of Indian language speech systems.
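To give a flavour of what rule-based lexicon conversion looks like in practice, here is a minimal sketch of applying context-free phoneme substitution rules to an ARPAbet-style dictionary entry. The rule set, the placeholder retroflex symbols, and the adapt_pronunciation helper are invented for illustration; they are not the Indian English mappings derived in the thesis.

```python
# Hypothetical illustration of rule-based lexicon adaptation: rewrite each
# phone of an American English (ARPAbet-style) entry using substitution
# rules. The rules and symbols below are placeholders for this sketch and
# are not the mappings produced in the thesis.

RULES = {
    "T": "TT",   # alveolar stop realised as retroflex (placeholder symbol)
    "D": "DD",   # likewise for the voiced stop
    "Z": "S",    # /z/ devoiced to /s/
}

def adapt_pronunciation(phones):
    """Apply the substitution rules to a sequence of phones."""
    return [RULES.get(p, p) for p in phones]

# CMUdict-style lexicon entry: word -> list of phones
lexicon = {"DATA": ["D", "EY", "T", "AH"]}

adapted = {word: adapt_pronunciation(phones) for word, phones in lexicon.items()}
print(adapted)  # {'DATA': ['DD', 'EY', 'TT', 'AH']}
```

In a real system, such rules would typically be context-sensitive and derived from transcribed speech rather than hand-written, but the same lookup-and-substitute pattern underlies converting an existing lexicon to a new accent.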