Channel and temporal-frequency attention UNet for monaural speech enhancement

EURASIP Journal on Audio, Speech, and Music Processing

Table 8 Denoising performance comparison of CTFUNet with other models

Model	#Params	Year	With reverb				Without reverb
			WB-PESQ	NB-PESQ	STOI (%)	SI-SDR (dB)	WB-PESQ	NB-PESQ	STOI (%)	SI-SDR (dB)
Noisy	-	-	1.822	2.753	86.62	9.033	1.582	2.454	91.52	9.07
DCCRN [51]	3.7M	2020	-	3.077	-	-	-	3.266	-	-
DCCRN+ [52]	4.7M	2021	-	3.30	-	-	-	3.33	-	-
Conv-TasNet [8]	5.1M	2019	2.75	-	-	-	2.73	-	-	-
PoCoNet [53]	50M	2020	2.832	-	-	-	2.748	-	-	-
CTS-Net [54]	4.4M	2021	3.02	3.47	92.7	15.58	2.94	3.42	96.66	17.99
FullSubNet [14]	5.6M	2021	3.057	3.584	92.11	16.04	2.882	3.428	96.32	17.30
GaGNet [55]	5.9M	2022	3.18	3.57	93.22	16.57	3.17	3.56	97.13	18.91
FullSubNet+ [15]	8.7M	2022	3.177	3.648	93.64	16.44	3.002	3.503	96.67	18.00
FS-CANet [56]	4.2M	2022	3.218	3.665	93.93	16.82	3.017	3.513	96.74	18.08
CTFUNet	6.1M	2023	3.367	3.741	94.39	17.16	3.176	3.639	97.17	18.66
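
The absolute gain of CTFUNet over the unprocessed noisy input can be read off the table by subtracting the "Noisy" row from the "CTFUNet" row. The short Python sketch below does that arithmetic for both test conditions. The numbers are copied from the two rows above; the names `METRICS`, `NOISY`, `CTFUNET`, `gains`, and `improvements` are hypothetical and chosen only for this illustration, and the rounding to three decimals is an assumption made to suppress floating-point noise.

```python
# Values copied from Table 8 (rows "Noisy" and "CTFUNet").
# Column order: WB-PESQ, NB-PESQ, STOI (%), SI-SDR.
METRICS = ["WB-PESQ", "NB-PESQ", "STOI", "SI-SDR"]

NOISY = {
    "with_reverb": [1.822, 2.753, 86.62, 9.033],
    "without_reverb": [1.582, 2.454, 91.52, 9.07],
}
CTFUNET = {
    "with_reverb": [3.367, 3.741, 94.39, 17.16],
    "without_reverb": [3.176, 3.639, 97.17, 18.66],
}


def gains(enhanced, noisy):
    """Map each metric name to (enhanced - noisy), rounded to 3 decimals."""
    return {m: round(e - n, 3) for m, e, n in zip(METRICS, enhanced, noisy)}


# Hypothetical helper structure: one dict of gains per test condition.
improvements = {cond: gains(CTFUNET[cond], NOISY[cond]) for cond in NOISY}

for cond, result in improvements.items():
    print(cond, result)
```

The same subtraction can be applied to any other fully populated row, but not to rows with missing entries ("-"), since those values were not reported.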