Dr. Chiranjeevi Yarra and his students presented the following papers at Interspeech 2021, held in Brno, Czech Republic, from 30 August to 3 September.
- Noise robust pitch stylization using minimum mean absolute error criterion – Chiranjeevi Yarra and Prasanta Kumar Ghosh.
Research work as explained by the authors: We propose a pitch stylization technique in the presence of pitch halving and doubling errors. The technique uses an optimization criterion based on a minimum mean absolute error to make the stylization robust to such pitch estimation errors, particularly under noisy conditions. We obtain segments for the stylization automatically using dynamic programming. Experiments are performed at the frame level and the syllable level. At the frame level, the closeness of the stylized pitch to the ground truth pitch, which is obtained using a laryngograph signal, is analyzed using the root mean square error (RMSE) measure. At the syllable level, the effectiveness of perceptually relevant embeddings in the stylized pitch is analyzed by estimating syllabic tones and comparing those with manual tone markings using the Levenshtein distance measure. The proposed approach performs better than a minimum mean squared error criterion based pitch stylization scheme at the frame level and a knowledge based tone estimation scheme at the syllable level under clean and 20 dB, 10 dB and 0 dB SNR conditions with five noises and four pitch estimation techniques. Among all the combinations of SNR, noise and pitch estimation techniques, the highest absolute RMSE and mean distance improvements are found to be 6.49 Hz and 0.23, respectively.
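To illustrate why an absolute-error criterion can tolerate halving and doubling errors better than a squared-error one, here is a minimal Python sketch. It is not the authors' method: it fits a single constant level to one hypothetical segment, relying on the standard facts that the mean minimizes squared error and the median minimizes absolute error. All frame values and function names are made up for illustration.

```python
import math
import statistics

def constant_fit_mse(pitch_hz):
    """Constant level minimizing mean squared error: the arithmetic mean."""
    return statistics.mean(pitch_hz)

def constant_fit_mae(pitch_hz):
    """Constant level minimizing mean absolute error: the median."""
    return statistics.median(pitch_hz)

def rmse_of_level(level_hz, reference_hz):
    """RMSE between a constant stylized level and the reference pitch frames."""
    return math.sqrt(statistics.mean((level_hz - r) ** 2 for r in reference_hz))

# Hypothetical 10-frame segment whose true pitch is 120 Hz throughout.
true_pitch = [120.0] * 10

# Simulated pitch tracker output with one doubling and one halving error.
observed = list(true_pitch)
observed[3] = 240.0  # doubling error
observed[7] = 60.0   # halving error

mse_level = constant_fit_mse(observed)  # pulled away from 120 Hz by the outliers
mae_level = constant_fit_mae(observed)  # stays at 120 Hz

print("MSE fit:", mse_level, "RMSE vs truth:", rmse_of_level(mse_level, true_pitch))
print("MAE fit:", mae_level, "RMSE vs truth:", rmse_of_level(mae_level, true_pitch))
```

In this toy segment the squared-error fit lands at 126 Hz because the doubled frame outweighs the halved one, while the absolute-error fit stays at the true 120 Hz.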
- Multilingual and code-switching ASR challenges for low resource Indian languages – Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram and Basil Abraham.
Research work as explained by the authors: Recently, there has been increasing interest in multilingual automatic speech recognition (ASR) where a speech recognition system caters to multiple low resource languages by taking advantage of low amounts of labeled corpora in multiple languages. With multilingualism becoming common in today’s world, there has been increasing interest in code-switching ASR as well. In code-switching, multiple languages are freely interchanged within a single sentence or between sentences. The success of low-resource multilingual and code-switching ASR often depends on the variety of languages in terms of their acoustics and linguistic characteristics, as well as the amount of data available and how carefully these are considered in building the ASR system. In this challenge, we would like to focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages, namely Hindi, Marathi, Odia, Tamil, Telugu, Gujarati and Bengali. For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English. We also provide a baseline recipe for both tasks, with word error rates (WER) of 30.73% and 32.45% on the test sets of the multilingual and code-switching subtasks, respectively.
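For readers unfamiliar with the WER figures quoted above, the following Python sketch shows how word error rate is commonly computed: the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. It is an illustrative assumption, not the challenge's official scoring script, and the example sentences and the function name `word_error_rate` are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)

# Hypothetical Hindi-English code-switched utterance and a recognizer output
# with one substituted word ("buk") and one dropped word ("hai").
reference = "mujhe yeh book pasand hai"
hypothesis = "mujhe yeh buk pasand"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2%}")
```

Here two errors against five reference words give a WER of 40%.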
The International Speech Communication Association (ISCA) is a non-profit organization. Its original statutes (available in French and in English translation) were deposited on February 23rd at the Prefecture of Grenoble in France by René CARRÉ and registered on March 27th, 1988.
The association started as ESCA (European Speech Communication Association) and, since its foundation, has been steadily expanding and consolidating its activities. It has offered an increasing range of services and benefits to its members and has also put its financial and administrative functions on a firm professional footing. Indeed, over the ten years of its existence, ESCA has evolved from a small EEC-supported European organization to a fully independent and self-supporting international association.
At the General Assembly that took place during the Eurospeech conference in Budapest (September 1999), ESCA became a truly international association in the global field of speech science and technology, changing its name to ISCA (International Speech Communication Association) and modifying its statutes accordingly.
The purpose of the association is to promote, in an international context, activities and exchanges in all fields related to speech communication science and technology. The association is aimed at all persons and institutions interested in fundamental research and technological development that aim to describe, explain and reproduce the various aspects of human communication by speech, including, but not limited to, phonetics, linguistics, computer speech recognition and synthesis, speech compression, speaker recognition, and aids to medical diagnosis of voice pathologies.
More details at: https://www.interspeech2021.org/