From: Supervised Attention Multi-Scale Temporal Convolutional Network for monaural speech enhancement
| Test dataset | Method | PESQ (trained on English) | STOI (%) (trained on English) | SDR (trained on English) | PESQ (trained on Mix) | STOI (%) (trained on Mix) | SDR (trained on Mix) | \(\Delta\)PESQ (English\(\rightarrow\)Mix) | \(\Delta\)STOI (%) (English\(\rightarrow\)Mix) | \(\Delta\)SDR (English\(\rightarrow\)Mix) |
|---|---|---|---|---|---|---|---|---|---|---|
| English | Unprocessed | 1.96 | 92.6 | 13.63 | 1.96 | 92.6 | 13.63 | - | - | - |
| | CRN | 2.41 | 94.8 | 18.49 | 2.45 | 95.0 | 18.56 | 0.04 | 0.2 | 0.07 |
| | MSTCN | 2.68 | 94.8 | 16.40 | 2.66 | 95.1 | 16.32 | −0.02 | 0.3 | −0.08 |
| | LSTM-IRM | 2.80 | 95.9 | 18.84 | 2.80 | 96.0 | 18.92 | 0.00 | 0.1 | 0.08 |
| | GCRN | 2.76 | 95.5 | 19.98 | 2.76 | 95.5 | 19.98 | 0.00 | 0.0 | 0.00 |
| | GaGNet | 2.89 | 96.1 | 20.43 | 2.92 | 96.1 | 20.35 | 0.03 | 0.0 | −0.08 |
| | Conv-TasNet | 3.07 | 96.6 | 21.67 | 2.93 | 96.2 | 20.94 | −0.14 | −0.4 | −0.73 |
| | DCCRN | 3.17 | 96.6 | 21.20 | 3.14 | 96.5 | 20.57 | −0.03 | −0.1 | −0.63 |
| | DPCRN | 3.16 | 96.6 | 20.75 | 3.11 | 96.4 | 20.59 | −0.05 | −0.2 | −0.16 |
| | SA-MSTCN\(^{1}\) | 3.26 | 96.6 | 20.11 | 3.30 | 96.8 | 20.57 | 0.04 | 0.2 | 0.46 |
| | SA-MSTCN\(^{2}\) | 3.29 | 96.7 | 20.46 | 3.35 | 97.0 | 21.13 | 0.06 | 0.3 | 0.67 |
| Chinese | Unprocessed | 2.31 | 88.2 | 16.81 | 2.31 | 88.2 | 16.81 | - | - | - |
| | CRN | 2.44 | 86.8 | 16.77 | 2.76 | 89.4 | 20.30 | 0.32 | 2.6 | 3.53 |
| | MSTCN | 2.26 | 86.9 | 14.94 | 2.99 | 91.5 | 17.45 | 0.73 | 4.6 | 2.51 |
| | LSTM-IRM | 2.80 | 90.4 | 18.37 | 3.15 | 92.6 | 20.97 | 0.35 | 2.2 | 2.60 |
| | GCRN | 3.09 | 90.2 | 21.46 | 3.09 | 90.2 | 21.45 | 0.00 | 0.0 | −0.01 |
| | GaGNet | 2.77 | 89.1 | 18.02 | 3.16 | 91.2 | 21.38 | 0.41 | 2.1 | 3.36 |
| | Conv-TasNet | 2.95 | 90.5 | 20.12 | 3.16 | 91.1 | 21.39 | 0.21 | 0.6 | 1.27 |
| | DCCRN | 2.66 | 89.4 | 17.98 | 3.43 | 92.7 | 21.70 | 0.77 | 3.3 | 3.72 |
| | DPCRN | 3.01 | 90.7 | 18.99 | 3.40 | 92.5 | 22.49 | 0.39 | 1.8 | 3.50 |
| | SA-MSTCN\(^{1}\) | 3.21 | 92.2 | 20.74 | 3.58 | 93.6 | 21.92 | 0.37 | 1.4 | 1.18 |
| | SA-MSTCN\(^{2}\) | 3.26 | 92.3 | 21.00 | 3.60 | 93.8 | 22.21 | 0.34 | 1.5 | 1.21 |
| Mix | Unprocessed | 2.08 | 91.7 | 14.81 | 2.08 | 91.7 | 14.81 | - | - | - |
| | CRN | 2.55 | 93.8 | 19.29 | 2.55 | 93.8 | 19.29 | 0.00 | 0.0 | 0.00 |
| | MSTCN | 2.63 | 93.5 | 16.40 | 2.77 | 94.3 | 16.82 | 0.14 | 0.8 | 0.42 |
| | LSTM-IRM | 2.82 | 94.6 | 19.26 | 2.90 | 95.2 | 19.90 | 0.08 | 0.6 | 0.64 |
| | GCRN | 2.86 | 94.4 | 20.81 | 2.85 | 94.4 | 20.83 | −0.01 | 0.0 | 0.02 |
| | GaGNet | 2.83 | 94.0 | 19.84 | 2.98 | 94.9 | 21.04 | 0.15 | 0.9 | 1.20 |
| | Conv-TasNet | 2.96 | 94.6 | 20.55 | 2.99 | 95.0 | 21.50 | 0.03 | 0.4 | 0.95 |
| | DCCRN | 3.08 | 95.0 | 20.65 | 3.22 | 95.7 | 21.48 | 0.14 | 0.7 | 0.83 |
| | DPCRN | 3.14 | 95.2 | 20.98 | 3.19 | 95.6 | 21.53 | 0.05 | 0.4 | 0.55 |
| | SA-MSTCN\(^{1}\) | 3.24 | 95.7 | 20.98 | 3.38 | 96.1 | 21.45 | 0.14 | 0.4 | 0.47 |
| | SA-MSTCN\(^{2}\) | 3.26 | 95.8 | 21.30 | 3.41 | 96.2 | 21.95 | 0.15 | 0.4 | 0.65 |
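
The \(\Delta\) columns appear to be the score obtained with the Mix training set minus the score obtained with the English training set, metric by metric. The following is a minimal sketch that checks this reading against the SA-MSTCN\(^{2}\) rows of the table. The names `ROWS` and `deltas` are illustrative and do not come from the paper. The values are kept as strings and handled with `Decimal` so that the subtraction matches the printed two-decimal figures without floating-point noise.

```python
from decimal import Decimal

# (PESQ, STOI %, SDR) for SA-MSTCN^2, copied from the table above.
# First tuple: trained on English; second tuple: trained on Mix.
# ROWS and deltas are hypothetical helper names used only for this sketch.
ROWS = {
    "English": (("3.29", "96.7", "20.46"), ("3.35", "97.0", "21.13")),
    "Chinese": (("3.26", "92.3", "21.00"), ("3.60", "93.8", "22.21")),
    "Mix": (("3.26", "95.8", "21.30"), ("3.41", "96.2", "21.95")),
}


def deltas(english_trained, mix_trained):
    """Return (dPESQ, dSTOI, dSDR) as Mix-trained minus English-trained."""
    return tuple(
        Decimal(mix) - Decimal(eng)
        for eng, mix in zip(english_trained, mix_trained)
    )


if __name__ == "__main__":
    for test_set, (eng, mix) in ROWS.items():
        d_pesq, d_stoi, d_sdr = deltas(eng, mix)
        print(f"{test_set}: dPESQ={d_pesq} dSTOI={d_stoi} dSDR={d_sdr}")
```

Under this assumption, a positive \(\Delta\) means the model benefited from the mixed-language training set on that test set, and a negative \(\Delta\) means it scored lower than with English-only training.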