From: Channel and temporal-frequency attention UNet for monaural speech enhancement
SNR | −5 dB | 0 dB | 5 dB | 10 dB | 15 dB | Avg. | −5 dB | 0 dB | 5 dB | 10 dB | 15 dB | Avg. |
---|---|---|---|---|---|---|---|---|---|---|---|---|
 | WB-PESQ | | | | | | DNSMOS | | | | | |
Noisy | 1.257 | 1.379 | 1.593 | 1.901 | 2.323 | 1.691 | 1.809 | 2.135 | 2.491 | 2.758 | 2.936 | 2.423 |
Uformer | 1.588 | 1.761 | 1.192 | 2.008 | 2.060 | 1.723 | 2.711 | 2.876 | 2.994 | 3.047 | 3.054 | 2.936 |
MTFAA | 1.575 | 1.831 | 2.153 | 2.485 | 2.822 | 2.173 | 2.695 | 2.920 | 3.101 | 3.228 | 3.310 | 3.051 |
CTFUNet | 2.005 | 2.341 | 2.711 | 3.052 | 3.358 | 2.693 | 3.000 | 3.145 | 3.252 | 3.323 | 3.368 | 3.218 |
 | STOI (%) | | | | | | SI-SDR (dB) | | | | | |
Noisy | 77.117 | 84.300 | 90.089 | 94.036 | 96.674 | 88.443 | 0.925 | 5.903 | 10.953 | 15.860 | 20.872 | 10.903 |
Uformer | 82.139 | 86.073 | 88.034 | 86.620 | 88.257 | 86.625 | 7.615 | 9.481 | 10.576 | 10.900 | 10.624 | 9.839 |
MTFAA | 83.204 | 89.440 | 93.281 | 95.608 | 96.970 | 91.701 | 6.245 | 9.805 | 12.674 | 14.843 | 16.393 | 11.992 |
CTFUNet | 88.186 | 92.517 | 95.329 | 96.987 | 98.006 | 94.205 | 10.288 | 13.331 | 16.309 | 19.001 | 21.390 | 16.064 |