Parvathi Priyanka Kommagouni, supervised by Dr. Anil Kumar Vuppala, received her Master of Science by Research in Computer Science and Engineering (CSE). Here’s a summary of her research work on Exploratory Attempts Towards Typical and Atypical Disfluency Classification in Indian English:
Disfluencies are inherent components of fluent speech, reflecting the natural processes involved in speech planning and production. They can be categorized into typical disfluencies, such as hesitations and self-repairs, and atypical disfluencies associated with speech disorders like stuttering. From a theoretical perspective, disfluencies can be interpreted as indicators of cognitive errors or as functional elements of conversation. The cognitive view posits that disfluencies signal covert errors in language formulation, while the strategic view considers them pragmatic tools that facilitate communication. For instance, hesitations may serve to stall for time during language formulation or to correct errors, whereas filled pauses can indicate the speaker’s need for attention or signal uncertainty in message planning. Similarly, repetitions can enhance narrative cohesion or act as stalling techniques.
This thesis explores the intricate distinctions between typical and atypical speech disfluencies, emphasizing their implications for speech technology and clinical applications. Accurate classification of these disfluencies is critical for enhancing voice assistants (VAs) designed for Persons Who Stutter (PWS), as misidentification can lead to premature cutoffs during speech. Furthermore, early detection of stuttering in children is vital to prevent misdiagnosis as developmental language disfluency. To address these challenges, this research introduces the IIITH-TISA dataset, the first Indian English stammer corpus capturing atypical disfluencies, and extends the IIITH-IED dataset with detailed annotations for typical disfluencies. Employing Perceptually Enhanced Zero-Time Windowed Cepstral Coefficients (PE-ZTWCC) in conjunction with Shifted Delta Cepstra (SDC), we utilize a shallow Time Delay Neural Network (TDNN) classifier to achieve an average F1 score of 85.01%, surpassing traditional feature sets.
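As a rough illustration of the Shifted Delta Cepstra step mentioned above, the sketch below stacks k delta blocks per frame, each computed as c(t + iP + d) - c(t + iP - d). It is a generic formulation of the technique, not the thesis implementation: the function name, the toy input frames, the edge-clamping rule, and the d, P, k values are all assumptions made for the example, and the summary does not state which configuration was used with PE-ZTWCC.

```python
def shifted_delta_cepstra(frames, d=1, p=3, k=7):
    """Return SDC vectors for a list of equal-length cepstral frames.

    Hypothetical helper for illustration only. Each output vector
    concatenates k delta blocks, so its length is len(frames[0]) * k.
    """
    n_frames = len(frames)

    def frame_at(t):
        # Assumption: clamp out-of-range indices so edge frames are repeated.
        return frames[min(max(t, 0), n_frames - 1)]

    sdc = []
    for t in range(n_frames):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * p + d)
            behind = frame_at(t + i * p - d)
            # Delta block i: difference of frames d steps either side of t + i*p.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


# Toy stand-in for cepstral features: 10 frames with 2 coefficients each.
toy_frames = [[float(t), 2.0 * t] for t in range(10)]
toy_sdc = shifted_delta_cepstra(toy_frames, d=1, p=3, k=2)
print(len(toy_sdc), len(toy_sdc[0]))
```

The point of the stacking is that each frame's vector then summarizes spectral change over a longer time span than a single delta, which is the kind of temporal context a TDNN can build on.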
Additionally, this study leverages intermediate representations from four pre-trained self-supervised models (Wav2Vec2.0, HuBERT, WavLM, and TERA) to classify typical and atypical disfluencies within the context of two novel Indian English datasets. Classification experiments utilizing support vector machines (SVM) and convolutional neural networks (CNN) demonstrate that features extracted from HuBERT’s 5th layer yield a peak F1 score of 0.97. These results underscore the significance of intermediate layer representations in discerning subtle variations in speech patterns and contribute to the development of robust and interpretable systems for automatic speech disfluency classification. This research not only advances the understanding of speech disfluencies in a linguistically diverse context but also lays the groundwork for future innovations in speech technology that can better accommodate individuals with speech disorders.
May 2025