Abstract: Environmental Sound Recognition (ESR) is an essential task in audio analysis, involving the identification and classification of sounds from various environmental contexts. This study ...
Abstract: In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance ...
End-to-end implementation of a learnable cross-modal adapter for IoT intrusion detection, strictly adapted from: "A Learnable Cross-Modal Adapter for Industrial Fault Detection Using Pretrained Vision ...
All the datasets must be located in the datasets folder. This folder should contain the following subfolders after downloading the datasets: GTZAN Speech_Music: Contains the GTZAN Speech Music dataset ...