EURASIP Journal on Audio, Speech, and Music Processing

Table 7 STOI values (%) of the baselines and the proposed model under unseen noise conditions. The proposed model (TANSCUNet) is shown in bold italics

From: Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement

Metric	STOI (%)
Noise	Train				Airport				Exhibition hall
SNR (dB)	−5	0	5	Avg.	−5	0	5	Avg.	−5	0	5	Avg.
Noisy mixture	53.54	59.14	66.55	59.74	57.57	63.16	68.84	63.19	59.17	65.61	69.28	64.69
Bi-LSTM [31]	68.31	72.47	76.19	72.32	69.25	73.11	78.05	73.47	69.42	74.14	78.53	74.03
Bi-CRN [34]	69.59	73.55	77.45	73.53	70.76	75.53	79.34	75.21	70.14	75.98	80.31	75.48
SEGAN [40]	70.61	75.44	79.75	75.27	71.26	76.81	81.76	76.61	71.43	77.76	81.66	76.95
GRN [30]	71.57	76.51	81.58	76.55	73.39	78.63	82.37	78.13	73.13	79.36	82.22	78.24
DCN [38]	73.25	78.62	83.64	78.50	74.86	79.15	83.19	79.07	75.86	80.04	84.18	80.03
DCCRN [35]	74.62	79.35	84.03	79.33	75.29	80.73	84.54	80.19	77.04	81.23	86.55	81.61
TSTNN [41]	75.31	80.45	85.71	80.49	76.36	81.92	85.35	81.21	78.36	83.15	87.94	83.15
MASENet [46]	77.29	81.04	86.16	81.50	78.29	83.79	87.43	83.17	79.47	84.26	88.23	83.99
SADNUNet [47]	78.14	83.87	87.08	83.03	79.08	84.64	88.02	83.91	80.24	85.23	89.03	84.83
MCGN [42]	79.86	84.73	88.14	84.24	79.64	85.53	89.01	84.72	80.82	86.01	90.42	85.75
DBT-Net [51]	80.41	85.07	89.11	84.86	80.57	86.23	89.38	85.39	81.02	86.73	91.31	86.35
*TANSCUNet*	81.91	86.59	90.95	86.48	82.61	87.85	91.06	87.17	83.14	87.97	92.21	87.77
