Speech emotion recognition based on Graph-LSTM neural network

EURASIP Journal on Audio, Speech, and Music Processing

Table 1 Comparison between SER baselines and the proposed model

Model	UA (%)	WA (%)	Condition
DCNN 2020 [41]	-	64.3	4490 utterances
ResNet34 2021 [42]	61.61	66.02	4490 utterances
ADNN + SVM 2019 [43]	-	65.01	4490 utterances
Graph baselines
PATCHY-SAN 2016 [11]	56.27	60.34	4490 utterances
PATCHY-Diff 2018 [11]	58.71	63.23	4490 utterances
Compact SER 2021 (cycle) [11]	62.27	65.29	4490 utterances
Ours (Mean pooling)	59.16	68.15	4490 utterances
Ours (Weighted pooling)	65.39	71.83	4490 utterances
LSTM-GIN 2022 [46]	65.53	64.65	5531 utterances
CoGCN 2022 [33]	63.67	62.64	5531 utterances
GA-GRU 2020 [25]	63.8	62.27	5531 utterances
Ours (Mean pooling)	68.65	68.11	5531 utterances

Bold denotes the best results. '-' means that the result was not reported in the original work. UA: unweighted accuracy; WA: weighted accuracy.
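The UA and WA columns follow the two accuracy definitions commonly used in SER work: WA is the overall accuracy across all utterances, while UA first computes recall for each emotion class and then averages those recalls with equal weight, so minority emotions count as much as frequent ones. This is why a model can rank differently under the two metrics in Table 1. The Python sketch below illustrates the difference on a small, made-up label set; it assumes these standard definitions, and the function names and data are hypothetical, not taken from the paper.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """UA: recall computed per emotion class, then averaged with equal weight."""
    totals = Counter(y_true)  # number of utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not from the paper): "neu" dominates the set,
# and the classifier tends to predict "neu" for the minority classes.
y_true = ["neu"] * 6 + ["ang"] * 2 + ["sad"] * 2
y_pred = ["neu"] * 6 + ["neu", "ang"] + ["neu", "neu"]

print(f"WA = {weighted_accuracy(y_true, y_pred):.2f}")  # WA = 0.70
print(f"UA = {unweighted_accuracy(y_true, y_pred):.2f}")  # UA = 0.50
```

Here 7 of 10 utterances are correct (WA = 0.70), but the per-class recalls are 1.0, 0.5, and 0.0, so UA drops to 0.50. A classifier biased toward the majority class is therefore penalized more by UA than by WA.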