Significance of Phonological Features in Speech Emotion Recognition

Document Type

Article

Publication Date

9-1-2020

School

Computing Sciences and Computer Engineering

Abstract

© 2020, Springer Science+Business Media, LLC, part of Springer Nature. A novel Speech Emotion Recognition (SER) method based on phonological features is proposed in this paper. Intuitively, as expert knowledge derived from linguistics, phonological features are correlated with emotions. However, it has been found that they are seldomly used as features to improve SER. Motivated by this, we set our goal to utilize phonological features to further advance SER’s accuracy since they can provide complementary information for the task. Furthermore, we will also explore the relationship between phonological features and emotions. Firstly, instead of only based on acoustic features, we devise a new SER approach by fusing phonological representations and acoustic features together. A significant improvement in SER performance has been demonstrated on a publicly available SER database named Interactive Emotional Dyadic Motion Capture (IEMOCAP). Secondly, the experimental results show that the top-performing method for the task of categorical emotion recognition is a deep learning-based classifier which generates an unweighted average recall (UAR) accuracy of 60.02%. Finally, we investigate the most discriminative features and find some patterns of emotional rhyme based on the phonological representations.

Publication Title

International Journal of Speech Technology

Volume

23

Issue

3

First Page

633

Last Page

642

Find in your library

Share

COinS