Emotion recognition in speech is a key research topic in human-computer interaction, and understanding the emotions expressed in conversations can shed light on a person's well-being. This study introduces a hybrid architecture that combines acoustic and deep features to improve speech emotion recognition. Acoustic features such as RMS energy and MFCCs are extracted from the voice recordings, while spectrogram images of the same recordings are processed by deep networks such as VGG16 and ResNet to obtain deep features. The two feature sets are merged into a hybrid feature vector, which is refined with the ReliefF feature-selection algorithm, and a Support Vector Machine is used for classification. Experiments on the RAVDESS and EMO-DB datasets yield accuracies of up to 90.21%, and the proposed method consistently outperforms existing techniques in accuracy.
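The following sketch illustrates the kind of hybrid pipeline described above; it is not the authors' exact implementation. It assumes librosa for the acoustic features, a pretrained torchvision VGG16 applied to mel-spectrogram images for the deep features, skrebate's ReliefF for feature selection, and scikit-learn's SVC for classification; pooling choices and hyperparameters are illustrative.

```python
# Hybrid acoustic + deep feature pipeline (illustrative sketch, assumed libraries).
import numpy as np
import librosa
import torch
from torchvision import models, transforms
from skrebate import ReliefF
from sklearn.svm import SVC

# Pretrained VGG16 convolutional backbone used as a fixed deep-feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

def acoustic_features(signal, sr):
    """Mean-pooled MFCCs and RMS energy over the utterance."""
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # (13, frames)
    rms = librosa.feature.rms(y=signal)                      # (1, frames)
    return np.concatenate([mfcc.mean(axis=1), rms.mean(axis=1)])

def deep_features(signal, sr):
    """VGG16 features of the mel-spectrogram rendered as a 3-channel image."""
    spec = librosa.power_to_db(librosa.feature.melspectrogram(y=signal, sr=sr))
    img = torch.tensor(np.stack([spec] * 3), dtype=torch.float32).unsqueeze(0)
    img = transforms.functional.resize(img, [224, 224])
    with torch.no_grad():
        feat = vgg(img)
    return feat.flatten().numpy()

def hybrid_vector(signal, sr):
    """Concatenate hand-crafted acoustic and deep spectrogram features."""
    return np.concatenate([acoustic_features(signal, sr), deep_features(signal, sr)])

# Given X (hybrid vectors per utterance) and y (emotion labels):
#   selector = ReliefF(n_features_to_select=200, n_neighbors=10)
#   X_sel = selector.fit_transform(X, y)
#   clf = SVC(kernel="rbf").fit(X_sel, y)
```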