Dr. Srikanth Ronanki, Senior Applied Scientist at Amazon Web Services (AWS) and an alumnus of the Language Technologies Research Center (LTRC), gave a talk on Unified Streaming and Non-Streaming Automatic Speech Recognition on 7 September.
Here is a summary of his talk: Recently, there has been increasing interest in unifying streaming and non-streaming ASR models to reduce development, training, and deployment costs. However, training such systems presents its own set of challenges. Currently, there is a notable performance gap between the non-streaming mode of a unified system and an independently trained full-context model. Furthermore, achieving low latency by restricting a chunk's left context during streaming often compromises accuracy. In this talk, Dr. Ronanki delved into the intricacies of training unified ASR systems using convolution-augmented Transformers. He gave comprehensive insights into the challenges encountered, the solutions devised to address them, and the corresponding outcomes achieved.
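The trade-off between latency and left context mentioned above is typically realized through chunk-wise attention masking in the encoder. The sketch below is a minimal, illustrative example of such a mask (the function name and parameters are assumptions for illustration, not Dr. Ronanki's implementation): each frame attends to its own chunk plus a limited number of preceding chunks, and setting the chunk size to the full sequence recovers full-context (non-streaming) behavior.

```python
import numpy as np

def chunk_attention_mask(seq_len: int, chunk_size: int, left_chunks: int) -> np.ndarray:
    """Build a boolean attention mask for chunk-based streaming ASR.

    mask[i, j] is True when frame i may attend to frame j:
    all frames within frame i's chunk, plus up to `left_chunks`
    preceding chunks (the restricted left context).
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        chunk = i // chunk_size
        start = max(0, (chunk - left_chunks) * chunk_size)  # left-context limit
        end = min(seq_len, (chunk + 1) * chunk_size)        # end of own chunk
        mask[i, start:end] = True
    return mask

# Streaming mode: chunks of 2 frames, 1 chunk of left context.
streaming = chunk_attention_mask(seq_len=8, chunk_size=2, left_chunks=1)

# Non-streaming mode: one chunk spanning the whole utterance sees everything.
full_context = chunk_attention_mask(seq_len=8, chunk_size=8, left_chunks=0)
```

With a small `chunk_size`, latency is bounded by the chunk duration, while `left_chunks` controls how much history the encoder can use, which is exactly the accuracy/latency knob discussed in the talk.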
Dr. Srikanth Ronanki has 6 years of industrial experience in speech processing, including building text-to-speech and speech recognition applications. He obtained his Ph.D. from the University of Edinburgh in 2018 and a dual degree from IIITH in 2012. His prolific career includes authoring more than 40 research papers and 10 patents, which have collectively received over 1,100 citations.