Table 7 Performance of WB-PESQ, NB-PESQ, STOI, and SI-SDR in the ablation study

From: Channel and temporal-frequency attention UNet for monaural speech enhancement

| Model | #Para. | With reverb: WB-PESQ | With reverb: NB-PESQ | With reverb: STOI(%) | With reverb: SI-SDR | Without reverb: WB-PESQ | Without reverb: NB-PESQ | Without reverb: STOI(%) | Without reverb: SI-SDR |
|---|---|---|---|---|---|---|---|---|---|
| Noisy | - | 1.822 | 2.753 | 86.62 | 9.03 | 1.582 | 2.454 | 91.52 | 9.07 |
| +ISA | 6.5M | \(2.525\pm 0.089\) | \(3.290\pm 0.042\) | \(89.36\pm 0.19\) | \(13.38\pm 0.50\) | \(2.214\pm 0.076\) | \(3.023\pm 0.051\) | \(92.32\pm 0.15\) | \(13.84\pm 0.57\) |
| +ASA | 5.4M | \(3.155\pm 0.012\) | \(3.658\pm 0.007\) | \(93.44\pm 0.03\) | \(16.14\pm 0.13\) | \(2.945\pm 0.017\) | \(3.520\pm 0.008\) | \(96.45\pm 0.05\) | \(17.35\pm 0.14\) |
| CTFUNet | 6.1M | \(\textbf{3.196}\pm 0.014\) | \(\textbf{3.673}\pm 0.003\) | \(\textbf{93.63}\pm 0.01\) | \(\textbf{16.36}\pm 0.03\) | \(\textbf{2.979}\pm 0.003\) | \(\textbf{3.540}\pm 0.001\) | \(\textbf{96.64}\pm 0.03\) | \(\textbf{17.60}\pm 0.03\) |
| −RCAM | 4.9M | \(3.157\pm 0.015\) | \(3.648\pm 0.006\) | \(93.51\pm 0.07\) | \(16.27\pm 0.08\) | \(2.951\pm 0.001\) | \(3.517\pm 0.002\) | \(96.53\pm 0.03\) | \(17.52\pm 0.05\) |
| −MCHCA | 5.1M | \(3.143\pm 0.007\) | \(3.643\pm 0.003\) | \(93.38\pm 0.06\) | \(16.08\pm 0.02\) | \(2.951\pm 0.012\) | \(3.510\pm 0.001\) | \(96.54\pm 0.02\) | \(17.43\pm 0.16\) |
| −CTFSC | 5.9M | \(2.996\pm 0.150\) | \(3.576\pm 0.054\) | \(92.93\pm 0.34\) | \(15.57\pm 0.60\) | \(2.820\pm 0.117\) | \(3.428\pm 0.072\) | \(96.08\pm 0.25\) | \(16.97\pm 0.48\) |