Speech emotion recognition using machine learning — A systematic review


Article

Madanian, S, Chen, T, Adeleye, O et al. (2023). Speech emotion recognition using machine learning — A systematic review. Vol. 20. doi:10.1016/j.iswa.2023.200266

cited authors

Madanian, S; Chen, T; Adeleye, O; Templeton, JM; Poellabauer, C; Parry, D; Schneider, SL

authors

Poellabauer, Christian

abstract

Speech emotion recognition (SER) as a Machine Learning (ML) problem continues to garner a significant amount of research interest, especially in the affective computing domain. This is due to its increasing potential, algorithmic advancements, and applications in real-world scenarios. Human speech contains para-linguistic information that can be represented using quantitative features such as pitch, intensity, and Mel-Frequency Cepstral Coefficients (MFCC). SER is commonly achieved following three key steps: data processing, feature selection/extraction, and classification based on the underlying emotional features. The nature of these steps, coupled with the distinct features of human speech, underpins the use of ML methods for SER implementation. Recent research works in affective computing have employed various ML methods for SER tasks; however, only a few of them capture the underlying techniques and methods that can be used to facilitate the three core steps of SER implementation. In addition, the challenges associated with these steps and the state-of-the-art approaches used to tackle them are either ignored or only sparsely discussed in these works. In this paper, we present a systematic review of research that addressed SER tasks from ML perspectives over the last decade, with emphasis on the three SER implementation steps. Different challenges, including the low classification accuracy of speaker-independent experiments, and their associated solutions are discussed in detail. The review also provides guidelines for SER evaluation, with a focus on the common baselines and metrics available for experimentation. This paper is expected to serve as a comprehensive guideline for SER researchers to design SER solutions using ML techniques, motivate possible improvements of existing SER models, or trigger novel techniques to enhance SER performance.
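
The three steps named in the abstract (data processing, feature extraction, classification) and the speaker-independent evaluation setting can be illustrated with a small sketch. This is not the method of the reviewed paper or of any study it covers. Everything here is an assumption made for illustration: the utterances are synthetic noisy sine tones, the only features are RMS intensity and an autocorrelation pitch estimate (no MFCCs), the classifier is a plain nearest-centroid rule, and all function names, speaker labels, and numeric values are hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed rate for the synthetic signals


def synth_utterance(pitch_hz, amplitude, rng, n=800):
    """Toy stand-in for a recorded utterance: a sine tone plus Gaussian noise."""
    return [amplitude * math.sin(2 * math.pi * pitch_hz * t / SAMPLE_RATE)
            + rng.gauss(0.0, 0.01) for t in range(n)]


def preprocess(signal):
    """Step 1 (data processing): remove the DC offset."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def extract_features(signal):
    """Step 2 (feature extraction): return [RMS intensity, pitch in Hz]."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    # Pitch: lag with the largest raw autocorrelation in the 80-400 Hz band.
    min_lag, max_lag = SAMPLE_RATE // 400, SAMPLE_RATE // 80

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return [rms, SAMPLE_RATE / best_lag]


def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def classify(features, centroids):
    """Step 3 (classification): label of the nearest class centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def build_toy_dataset(seed=0):
    """Three hypothetical speakers, two emotions, two utterances each."""
    rng = random.Random(seed)
    data = []
    for speaker, base_pitch in (("A", 110.0), ("B", 130.0), ("C", 150.0)):
        for label, shift, amp in (("calm", 0.0, 0.2), ("angry", 100.0, 0.5)):
            for _ in range(2):
                pitch = base_pitch + shift + rng.uniform(-3.0, 3.0)
                data.append((speaker, label, synth_utterance(pitch, amp, rng)))
    return data


def leave_one_speaker_out_accuracy(dataset):
    """Speaker-independent evaluation: the test speaker is never in training."""
    rows = [(spk, label, extract_features(preprocess(sig)))
            for spk, label, sig in dataset]
    correct = 0
    for held_out in sorted({spk for spk, _, _ in rows}):
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        centroids = {label: centroid([f for _, l, f in train if l == label])
                     for label in {l for _, l, _ in train}}
        correct += sum(classify(f, centroids) == label for _, label, f in test)
    return correct / len(rows)


if __name__ == "__main__":
    print(leave_one_speaker_out_accuracy(build_toy_dataset(seed=0)))
```

The toy classes are deliberately well separated, so the sketch says nothing about realistic accuracy. Its purpose is only to show where a held-out speaker enters the evaluation loop. Because the features are not rescaled, the pitch value dominates the distance; a real pipeline would normally standardize features first.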

publication date

November 1, 2023

Digital Object Identifier (DOI)

https://doi.org/10.1016/j.iswa.2023.200266

volume

20