
Table 1 Ablation study of the effect of different modules on model performance

From: Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting

 

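| Condition | Model | Seen EER (%) | Seen minDCF | Seen Acc (%) | Unseen EER (%) | Unseen minDCF | Unseen Acc (%) |
|---|---|---|---|---|---|---|---|
| Without noise | Convmixer | - | - | 96.25 | - | - | 94.44 |
| Without noise | ECAPA-TDNN | 0.17 | 0.009 | - | 1.88 | 0.149 | - |
| Without noise | Baseline | 0.18 | 0.010 | 96.16 | 1.83 | 0.125 | 94.77 |
| Without noise | +SA | 0.18 | 0.007 | 96.45 | 1.69 | 0.113 | 95.15 |
| Without noise | +SE | 0.14 | 0.006 | 96.56 | 1.68 | 0.112 | 95.06 |
| Without noise | +DCA | 0.11 | 0.006 | 98.37 | 1.55 | 0.110 | 96.57 |
| With noise | Convmixer | - | - | 91.96 | - | - | 90.36 |
| With noise | ECAPA-TDNN | 1.62 | 0.077 | - | 4.18 | 0.223 | - |
| With noise | Baseline | 1.64 | 0.078 | 92.08 | 4.16 | 0.214 | 90.61 |
| With noise | +SA | 1.59 | 0.079 | 92.21 | 4.10 | 0.209 | 90.91 |
| With noise | +SE | 1.58 | 0.069 | 92.89 | 4.07 | 0.208 | 91.11 |
| With noise | +DCA | 1.27 | 0.059 | 95.61 | 3.98 | 0.200 | 93.08 |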