Speech Emotion Recognition (SER) identifies the emotional content of speech signals regardless of their semantic content. Deep learning techniques have proven superior to conventional techniques for emotion recognition owing to advantages such as speed, scalability, and versatility. However, since emotions are subjective, there is no universal agreement on how to evaluate or categorize them. The main objective of this paper is to design a suitable Convolutional Neural Network (CNN) model, the Stride-based Convolutional Neural Network (SCNN), which uses a smaller number of convolutional layers and eliminates the pooling layers to improve computational stability. This elimination tends to increase the accuracy and decrease the computational time of the SER system. Instead of pooling layers, deep strides are used for the necessary dimension reduction. The SCNN is trained on spectrograms generated from the speech signals of two databases, Berlin (Emo-DB) and IITKGP-SEHSC. Four emotions (angry, happy, neutral, and sad) are considered for evaluation, and validation accuracies of 90.67% and 91.33% are achieved for Emo-DB and IITKGP-SEHSC, respectively. This study provides new benchmarks for both datasets, demonstrating the feasibility and relevance of the presented SER technique.
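The core design choice described above, using strided convolutions instead of pooling layers for dimension reduction, can be sketched as follows. This is a minimal illustrative NumPy example on a toy "spectrogram" array, not the authors' implementation; the kernel, stride, and input sizes are assumptions for demonstration only.

```python
import numpy as np

def conv2d_strided(x, kernel, stride=2):
    """Valid 2D convolution with a stride > 1, so the convolution itself
    downsamples the input -- standing in for a separate pooling layer."""
    kh, kw = kernel.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy "spectrogram": 64 frequency bins x 64 time frames (assumed sizes).
spec = np.random.rand(64, 64)
kernel = np.random.rand(3, 3)

y = conv2d_strided(spec, kernel, stride=2)
print(y.shape)  # (31, 31): the stride-2 convolution roughly halves each
                # dimension, so no pooling layer is needed for reduction
```

A stride-1 convolution followed by 2x2 pooling would produce a similar reduction, but folding the downsampling into the convolution removes one layer per block, which is the source of the speed-up the abstract attributes to the SCNN.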
Publication detail
2020, IEEE Xplore
Speech emotion recognition using convolution neural networks and deep stride convolutional neural networks (04b Conference paper in volume)
Wani T. M., Gunawan T. S., Qadri S. A. A., Mansor H., Kartiwi M., Ismail N.