P Chandrasekhar – Acoustic Classification

September 2022

P Chandrasekhar received his doctorate in Electronics and Communication Engineering (ECE). His research work was supervised by Dr. Suryakanth V G. Here's a summary of his research work on "Investigation of features for acoustic scene classification":

Environmental sounds convey a great deal of information about day-to-day activities in nature, and they also provide a powerful means of communication. The acoustic scene of environmental sounds contains a set of audio events that occur over a certain period of time. Human hearing can recognize specific sounds and process them continuously without effort; therefore, even in the absence of visual cues, humans can most of the time identify events and sounds from acoustic cues alone. Acoustic Scene Classification (ASC) focuses on characterizing the acoustic surroundings of an audio track by choosing a textual label for it.

This work is motivated by the continuing demand for advanced functionality in which a device automatically adapts its configuration to the situation or context. The main goal of this research is therefore to develop a robust acoustic scene classification system that analyses and classifies acoustic scenes. Because an acoustic scene is a complex phenomenon, it is challenging to extract scene-specific information from environmental audio, and further research is still needed to recognize sound scenes reliably and to identify individual sound events in realistic soundscapes. Hence, there is a need to investigate feature representations that are well suited to ASC.

In this thesis, we describe the significance of various acoustic features for ASC. We explore the following features for the representation of acoustic scenes: MFCC, IMFCC, LOGMEL, LPCC, APGD, SCMC, SSFC, and SFFCC. The effect of these features on acoustic scene classification has been studied on the DCASE 2017 and DCASE 2018 datasets using DNN classification models (a feature-extraction sketch is given after this summary).

From our studies, it has been observed that complementary spectral features represent or capture the perceptual qualities of an acoustic scene better than any single feature. With the motivation of approximating the human auditory perception system, complementary spectral feature representations are combined, analysed, and investigated, and they perform better than any single type of feature. We have also observed that no single system performs best across all acoustic scenes; therefore, we perform DNN score fusion over various feature combinations to improve the performance of the ASC task (a late-fusion sketch is also given below).

In general, when the number of acoustic scenes is large, it may be difficult for a single classifier to discriminate all the classes correctly. We therefore propose a two-level hierarchical classification approach: first identifying the meta-category of an acoustic scene, and then performing fine-grained classification within that meta-category (sketched below). The methods proposed in this thesis for the categorization of acoustic scenes are helpful in applications such as assistive robotic systems, context-aware mobile phones, and multimedia archive analysis.
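To make the feature side concrete, here is a minimal sketch of frame-level LOGMEL and MFCC extraction using the librosa library. The window size, hop length, and coefficient counts are illustrative defaults, not the exact settings used in the thesis, and the other features listed above (IMFCC, LPCC, APGD, SCMC, SSFC, SFFCC) would each need their own extraction step.

```python
import librosa

def extract_features(wav_path, n_mfcc=20, n_mels=40):
    # Load the clip at its native sample rate.
    y, sr = librosa.load(wav_path, sr=None)
    # Mel power spectrogram over short overlapping frames.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=1024, n_mels=n_mels)
    # LOGMEL: log-compressed mel energies.
    logmel = librosa.power_to_db(mel)
    # MFCC: DCT of the log-mel spectrogram.
    mfcc = librosa.feature.mfcc(S=logmel, n_mfcc=n_mfcc)
    # Transpose to (frames, dims) for frame-wise DNN input.
    return logmel.T, mfcc.T
```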
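The DNN score fusion mentioned above is a late-fusion scheme: each feature-specific DNN produces class posteriors, and the posteriors are combined before the winning scene is picked. Below is a minimal numpy sketch, assuming each system outputs an (n_clips, n_classes) posterior matrix; the uniform weights are a placeholder, not the fusion weights tuned in the thesis.

```python
import numpy as np

def fuse_scores(score_list, weights=None):
    """score_list: list of (n_clips, n_classes) posterior arrays, one per feature-specific DNN."""
    scores = np.stack(score_list)          # (n_systems, n_clips, n_classes)
    if weights is None:
        # Placeholder: equal weight to every system.
        weights = np.full(len(score_list), 1.0 / len(score_list))
    fused = np.tensordot(weights, scores, axes=1)  # weighted sum over systems
    return fused.argmax(axis=-1)           # predicted class index per clip
```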
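The two-level hierarchical approach can be sketched as follows: a meta-category classifier routes each clip, and a per-meta-category model then makes the fine-grained scene decision. The meta-category grouping and the predict_proba interface here are hypothetical illustrations, not the thesis's exact partition or models.

```python
import numpy as np

# Hypothetical grouping of DCASE-style scene labels into meta-categories.
META_TO_SCENES = {
    "indoor":  ["airport", "metro_station", "shopping_mall"],
    "outdoor": ["park", "street_traffic", "public_square"],
    "vehicle": ["bus", "metro", "tram"],
}
META_NAMES = list(META_TO_SCENES)

def classify_hierarchical(features, meta_model, fine_models):
    """features: (n_clips, dim); all models expose predict_proba(X) -> (n, classes)."""
    # Level 1: pick a meta-category for every clip.
    meta_idx = meta_model.predict_proba(features).argmax(axis=1)
    labels = []
    for x, m in zip(features, meta_idx):
        meta = META_NAMES[m]
        # Level 2: fine-grained decision within the chosen meta-category.
        fine_idx = fine_models[meta].predict_proba(x[None, :]).argmax(axis=1)[0]
        labels.append(META_TO_SCENES[meta][fine_idx])
    return labels
```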

Keywords: Acoustic Scene Classification, Mel-Frequency Cepstral Coefficients, Inverted Mel-Frequency Cepstral Coefficients, Spectral Centroid Magnitude Coefficients, Subband Spectral Flux Coefficients, Single Frequency Filtering Cepstral Coefficients, Deep Neural Networks, Late Fusion Mechanism, Hierarchical Classification, DCASE.