EURASIP Journal on Audio, Speech, and Music Processing

Table 1 The ablation study of the effect of different modules on the model performance

From: Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting

Condition	Model	Seen EER (%)	Seen minDCF	Seen Acc (%)	Unseen EER (%)	Unseen minDCF	Unseen Acc (%)
Without noise	Convmixer	-	-	96.25	-	-	94.44
	ECAPA-TDNN	0.17	0.009	-	1.88	0.149	-
	Baseline	0.18	0.010	96.16	1.83	0.125	94.77
	+SA	0.18	0.007	96.45	1.69	0.113	95.15
	+SE	0.14	0.006	96.56	1.68	0.112	95.06
	+DCA	0.11	0.006	98.37	1.55	0.110	96.57
With noise	Convmixer	-	-	91.96	-	-	90.36
	ECAPA-TDNN	1.62	0.077	-	4.18	0.223	-
	Baseline	1.64	0.078	92.08	4.16	0.214	90.61
	+SA	1.59	0.079	92.21	4.10	0.209	90.91
	+SE	1.58	0.069	92.89	4.07	0.208	91.11
	+DCA	1.27	0.059	95.61	3.98	0.200	93.08
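The EER columns can be converted into relative reductions against the Baseline row, which makes each module's effect easier to compare. The sketch below is a reader-side calculation and is not part of the original article. The dictionary only transcribes the EER values from Table 1, and the names `EER`, `relative_reduction`, and `reductions` are hypothetical.

```python
# EER (%) values transcribed from Table 1: condition -> model -> (seen, unseen).
EER = {
    "Without noise": {
        "Baseline": (0.18, 1.83),
        "+SA": (0.18, 1.69),
        "+SE": (0.14, 1.68),
        "+DCA": (0.11, 1.55),
    },
    "With noise": {
        "Baseline": (1.64, 4.16),
        "+SA": (1.59, 4.10),
        "+SE": (1.58, 4.07),
        "+DCA": (1.27, 3.98),
    },
}


def relative_reduction(baseline: float, value: float) -> float:
    """Relative error reduction in percent; positive means lower EER than the baseline."""
    return 100.0 * (baseline - value) / baseline


def reductions(table: dict) -> dict:
    """Map (condition, model) -> (seen, unseen) relative EER reduction vs. Baseline, rounded to 0.1."""
    out = {}
    for condition, models in table.items():
        base_seen, base_unseen = models["Baseline"]
        for model, (seen, unseen) in models.items():
            if model == "Baseline":
                continue  # the baseline is the reference, so it is not compared with itself
            out[(condition, model)] = (
                round(relative_reduction(base_seen, seen), 1),
                round(relative_reduction(base_unseen, unseen), 1),
            )
    return out


if __name__ == "__main__":
    for (condition, model), (seen, unseen) in reductions(EER).items():
        print(f"{condition:14s} {model:5s} seen {seen:5.1f}%  unseen {unseen:5.1f}%")
```

For example, +DCA lowers the seen EER by about 38.9% relative (0.18 to 0.11) without noise and by about 22.6% relative (1.64 to 1.27) with noise. The same function can be applied to the minDCF columns by transcribing them in the same way.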
