Fig. 2
From: Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting
Shared encoder: a multi-layer stacked shared encoder is utilized across both tasks to reduce the impact of noise on the recognition rate.