Priyam Jain received his MS-Dual Degree in Electronics and Communication Engineering (ECE). His research work was supervised by Dr. Anil Kumar Vuppala. Here is a summary of his MS thesis, Spoken Language Identification under Emotional Speech Variations, as explained by him:
Identifying the language of a speech utterance is referred to as spoken language identification. Language identification (LID) is essential in multilingual speech systems. There are various conditions under which the performance of LID systems is sub-optimal, such as short duration, background noise, and channel variation. There have been efforts to improve performance under these conditions, but the impact of speaker emotion variation on the performance of LID systems has not been studied. In contrast to previous studies, the present work investigates, for the first time in the literature, the impact of emotional speech on language identification. In this work, different emotional speech databases have been pooled to create the experimental setup. A dataset of this kind was not available for LID and is a contribution of this thesis. Additionally, state-of-the-art i-vector, time-delay neural network (TDNN), long short-term memory (LSTM), and deep neural network (DNN) x-vector systems have been considered to build the LID systems. The performance of the LID systems has been evaluated for speech utterances of different emotions in terms of equal error rate (EER) and Cavg. The results of the study indicate that speech utterances with anger and happy emotions degrade the performance of LID systems more than those with neutral and sad emotions.
To address this degradation, we investigated adaptation approaches for improving the performance of LID systems by incorporating emotional utterances in the form of an adaptation dataset. We also studied a prosody modification technique called Flexible Analysis Synthesis Tool (FAST) to vary the emotional characteristics of an utterance in order to improve the performance, but the results were inconsistent and not satisfactory. Therefore, we propose a combination of a Recurrent Convolutional Neural Network (RCNN) based architecture with a multi-stage training methodology, which outperformed state-of-the-art LID systems.
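
For readers unfamiliar with the equal error rate used in the evaluation above, the following is a minimal sketch of how it can be computed from detection scores. It is not the thesis implementation: the function name `equal_error_rate`, the score lists, and the simple threshold sweep are all assumptions made for illustration. The idea is to find the decision threshold at which the false acceptance rate (non-target trials accepted) equals the false rejection rate (target trials rejected).

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Hypothetical helper for illustration only. A trial is accepted when
    its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # False rejection: a target-language trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates where they are closest.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up scores, not results from the thesis.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

To compare emotions as the study does, one would presumably compute this separately on the trials of each emotion category. Cavg, the other metric mentioned, is an average detection cost over language pairs and is not shown here.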