[month] [year]

Interspeech-2024

Dr. Chiranjeevi Yarra and his students Jhansi (Ph.D.) and Harshitha (M.S.) presented two papers at Interspeech-2024, held at Kos, Greece, from 1–5 September. Both students were awarded the ISCA Travel Grant of 1000 euros each to support their participation in the conference. This grant is offered to a limited number of students and early-career scientists based on the quality of their submitted papers.

Harshitha began her journey as an intern in the Speech Lab through the Shrishti program, conducted annually by IIIT Hyderabad. She demonstrated her dedication and commitment by continuing her internship for an additional year after the program ended and, following this publication, has converted to a Master's student. During the conference, both students delivered oral presentations on their respective papers.

  • A Comparative Analysis of Sequential Models that Integrate Syllable Dependency for Automatic Syllable Stress Detection – Jhansi Mallela, Sai Harshitha Aluru, and Chiranjeevi Yarra 

Here is the summary of the paper as explained by the authors:

Automatic syllable stress detection typically operates at the syllable level using stress-related acoustic features. The stress placed on a syllable is influenced not only by its own characteristics but also by its context within the word. However, traditional stress detection methods overlook the contextual acoustic factors that influence stress placement. To address this issue, we study sequential modeling approaches that integrate syllable dependency for automatic syllable stress detection using a masking strategy. This approach considers a sequence of syllables at the word level and predicts its stress label sequence. We explore various sequential models, such as RNNs, LSTMs, GRUs, and attention networks. We conduct experiments on the ISLE corpus, which comprises non-native speakers of English. From the experiments, we observe a significant improvement in performance with all sequential models compared to the state-of-the-art non-sequential baseline (DNN).
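Since words contain varying numbers of syllables, word-level sequence modeling typically pads each word's syllable features to a fixed length and uses a mask so the model ignores the padded positions. The paper's exact masking strategy is not spelled out in this summary; the sketch below is only an illustration of the general idea (function name, feature dimensions, and lengths are hypothetical):

```python
import numpy as np

def pad_and_mask(words, feat_dim, max_len):
    """Pad per-word syllable feature sequences to max_len and build a
    boolean mask marking the real (non-padded) syllable positions.

    words: list of arrays, each of shape (num_syllables, feat_dim).
    Returns (batch, mask) with shapes (B, max_len, feat_dim) and (B, max_len).
    """
    batch = np.zeros((len(words), max_len, feat_dim))
    mask = np.zeros((len(words), max_len), dtype=bool)
    for i, word in enumerate(words):
        n = len(word)
        batch[i, :n] = word   # copy the real syllable features
        mask[i, :n] = True    # mark positions the model should attend to
    return batch, mask

# Hypothetical example: two words with 3 and 2 syllables, 4-dim features.
words = [np.random.rand(3, 4), np.random.rand(2, 4)]
batch, mask = pad_and_mask(words, feat_dim=4, max_len=3)
print(batch.shape)  # (2, 3, 4)
print(mask)         # [[ True  True  True], [ True  True False]]
```

A sequential model (RNN/LSTM/GRU or attention) can then consume `batch` and use `mask` to exclude padded positions from both the loss and, for attention, the score computation.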

  • Post-Net: A Linguistically Inspired Sequence-Dependent Transformed Neural Architecture for Automatic Syllable Stress Detection – Sai Harshitha Aluru, Jhansi Mallela, and Chiranjeevi Yarra

Here is the summary of the paper as explained by the authors:

Automatic syllable stress detection methods typically treat syllable-level features as independent. However, linguistic studies show a dependency among the syllables within a word. In this work, we address this issue by proposing a Post-Net approach using Time-Delay Neural Networks to exploit the syllable dependency within a word for the stress detection task. For this, we propose a loss function that incorporates the dependency by ensuring only one stressed syllable per word. The proposed Post-Net leverages existing SOTA sequence-independent stress detection models and learns in both supervised and unsupervised settings. We compare the Post-Net with three existing SOTA sequence-independent models and also with a sequential model (LSTM). Experiments conducted on the ISLE corpus show the highest relative accuracy improvements of 2.1% and 20.28% with the proposed Post-Net compared to the best sequence-independent SOTA model in the supervised and unsupervised settings, respectively.
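The summary says the loss ensures only one stressed syllable per word but does not give its exact form. One common way to encode such a constraint is to softmax the per-syllable stress scores within a word and apply cross-entropy against the index of the stressed syllable, so probability mass is forced onto a single position. The sketch below illustrates that idea only; the function name and scores are hypothetical and not taken from the paper:

```python
import numpy as np

def one_stress_loss(scores, stress_idx):
    """Cross-entropy over a word's syllables: softmax the stress scores
    so exactly one syllable carries the stress probability mass, then
    penalize low probability at the true stressed position.

    scores: array of shape (num_syllables,), one stress score per syllable.
    stress_idx: index of the stressed syllable in the word.
    """
    shifted = scores - scores.max()          # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[stress_idx])

# Hypothetical 3-syllable word: loss is low when the stressed syllable
# (index 0) scores highest, and high when another syllable dominates.
low = one_stress_loss(np.array([2.0, 0.5, 0.1]), stress_idx=0)
high = one_stress_loss(np.array([0.1, 0.5, 2.0]), stress_idx=0)
```

Because the softmax normalizes across the word, pushing one syllable's probability up necessarily pushes the others down, which is how the "one stressed syllable per word" dependency enters the training signal.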

Conference page – https://interspeech2024.org/

October 2024
