
Table 1 Accuracy for five systems on noise type 1 (subway noise) of test set A

From: Sparse coding of the modulation spectrum for noise-robust automatic speech recognition

 

| System | Clean | 20 dB | 15 dB | 10 dB | 5 dB | 0 dB | −5 dB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sys1 (single frame) | 90.51 | 91.00 | 89.53 | 87.69 | 83.76 | 76.76 | 65.31 |
| Sys2 (single frame) (LDA transformed) | 89.19 | 89.62 | 87.57 | 83.54 | 76.51 | 62.57 | 36.91 |
| Sys3 (29 frames) (LDA transformed) | 87.50 | 88.70 | 87.41 | 85.42 | 77.62 | 59.41 | 27.85 |
| Sys4 (9 bands - GA) | 89.71 | 90.57 | 89.28 | 87.41 | 84.13 | 77.71 | 63.83 |
| Sparse coding [24], 5-frame exemplars | 93.12 | 90.18 | 87.22 | 82.62 | 72.64 | 56.31 | 34.57 |
| Sparse coding [24], 30-frame exemplars | 93.21 | 91.86 | 91.53 | 89.62 | 87.47 | 80.01 | 61.61 |

  1. Sys1: 135-D vectors. Sys2: LDA-transformed 135-D vectors of Sys1. Sys3: LDA-transformed 29 × 135-D vectors of 29 consecutive frames. Sys4: Sys1 plus nine recognizers operating on 15-D vectors, with weights obtained from a genetic algorithm. Recognition results for noise type 1 obtained with the sparse coding approach [20],[24], using 5- and 30-frame windows, are included for comparison in the bottom part.