Table 1 Ablation study of the proposed model in terms of averaged PESQ, STOI, and SDR metrics. The proposed model (last row) is indicated in bold italic text. N indicates the depth of the U-Net.

From: Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement

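TAN model blocks (ATAB, AFAB, AHA): ✓ = included, ✗ = not included. Par. (M) = number of parameters in millions. PESQ, STOI (%), and SDR (dB) are reported at input SNRs of −5, 0, and 5 dB, together with their average (Avg.).

| Model | ATAB | AFAB | AHA | Par. (M) | PESQ −5 dB | PESQ 0 dB | PESQ 5 dB | PESQ Avg. | STOI −5 dB | STOI 0 dB | STOI 5 dB | STOI Avg. | SDR −5 dB | SDR 0 dB | SDR 5 dB | SDR Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Raw speech | ✗ | ✗ | ✗ | - | 1.48 | 1.66 | 1.87 | 1.67 | 32.14 | 41.24 | 50.17 | 41.18 | −2.98 | 0.14 | 3.15 | 0.10 |
| SCUNet (N = 5) | ✗ | ✗ | ✗ | 13.20 | 2.18 | 2.41 | 2.68 | 2.42 | 62.01 | 69.46 | 76.04 | 69.17 | 5.78 | 8.03 | 10.56 | 8.12 |
| TANSCUNet | ✓ | ✗ | ✗ | 3.25 | 2.31 | 2.69 | 2.91 | 2.63 | 64.35 | 71.05 | 78.32 | 71.24 | 6.89 | 9.07 | 11.09 | 9.02 |
| TANSCUNet | ✗ | ✓ | ✗ | 3.25 | 2.53 | 2.78 | 3.02 | 2.78 | 66.16 | 73.26 | 80.72 | 73.38 | 7.83 | 10.18 | 11.57 | 9.86 |
| TANSCUNet | ✓ | ✓ | ✗ | 3.51 | 2.66 | 2.91 | 3.16 | 2.90 | 68.33 | 75.54 | 82.26 | 75.38 | 8.55 | 10.91 | 12.13 | 10.53 |
| TANSCUNet | ✓ | ✓ | ✓ | 3.51 | 2.85 | 3.12 | 3.37 | 3.08 | 72.52 | 79.65 | 84.36 | 78.84 | 9.81 | 11.85 | 13.62 | 11.76 |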
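To read the ablation as incremental gains, the Avg. columns of Table 1 can be differenced against a reference row. The Python sketch below does this, assuming that a "gain" is simply the difference between reported averages (it does not recompute the averages from the per-SNR values). The variant labels such as "TANSCUNet (ATAB+AFAB)" and the helper names `gains` and `param_reduction` are hypothetical, introduced only for illustration and not taken from the paper.

```python
# Reported Avg. values transcribed from Table 1 (PESQ, STOI in %, SDR in dB).
# Variant labels are hypothetical shorthands for the (ATAB, AFAB, AHA) flags.
AVG_SCORES = {
    "Raw speech": {"PESQ": 1.67, "STOI": 41.18, "SDR": 0.10},
    "SCUNet (N = 5)": {"PESQ": 2.42, "STOI": 69.17, "SDR": 8.12},
    "TANSCUNet (ATAB)": {"PESQ": 2.63, "STOI": 71.24, "SDR": 9.02},
    "TANSCUNet (AFAB)": {"PESQ": 2.78, "STOI": 73.38, "SDR": 9.86},
    "TANSCUNet (ATAB+AFAB)": {"PESQ": 2.90, "STOI": 75.38, "SDR": 10.53},
    "TANSCUNet (ATAB+AFAB+AHA)": {"PESQ": 3.08, "STOI": 78.84, "SDR": 11.76},
}

# Parameter counts in millions (Par. (M) column); raw speech has no model.
PARAMS_M = {
    "SCUNet (N = 5)": 13.20,
    "TANSCUNet (ATAB)": 3.25,
    "TANSCUNet (AFAB)": 3.25,
    "TANSCUNet (ATAB+AFAB)": 3.51,
    "TANSCUNet (ATAB+AFAB+AHA)": 3.51,
}


def gains(reference: str) -> dict:
    """Hypothetical helper: difference of each row's averages vs. a reference row."""
    ref = AVG_SCORES[reference]
    return {
        name: {metric: round(value - ref[metric], 2) for metric, value in scores.items()}
        for name, scores in AVG_SCORES.items()
        if name != reference
    }


def param_reduction(baseline: str, variant: str) -> float:
    """Hypothetical helper: fraction of parameters removed relative to a baseline."""
    return 1.0 - PARAMS_M[variant] / PARAMS_M[baseline]


if __name__ == "__main__":
    for name, delta in gains("SCUNet (N = 5)").items():
        print(f"{name:28s} " + "  ".join(f"{m} {v:+.2f}" for m, v in delta.items()))
    share = param_reduction("SCUNet (N = 5)", "TANSCUNet (ATAB+AFAB+AHA)")
    print(f"Parameter reduction of full TANSCUNet vs. SCUNet: {share:.1%}")
```

Against the SCUNet (N = 5) baseline, the full TANSCUNet row differs by +0.66 PESQ, +9.67 STOI points, and +3.64 dB SDR while using roughly 73% fewer parameters (3.51 M vs. 13.20 M).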