Fig. 3
From: Multi-task deep cross-attention networks for far-field speaker verification and keyword spotting

Keyword spotting branch: structure diagram of the Convmixer block. Convolutions across both the time-frequency and time domains yield frequency- and time-enriched embeddings. Soft attention from the previous output, together with the 2D features, is connected to the block output.
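As a rough illustration of the data flow the caption describes, the pure-Python sketch below makes several assumptions. It treats the input as a single-channel F x T (frequency x time) feature map and applies a 2D convolution over time-frequency followed by a 1D convolution along time. It gates the result with a sigmoid soft-attention weight computed from the previous block's output, then adds the 2D input features back as a skip connection. The function names, kernel shapes, and the exact gating and skip arrangement are hypothetical and not taken from the paper.

```python
import math


def conv2d_same(x, k):
    """2D 'same' cross-correlation with zero padding over an F x T map."""
    F, T = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * T for _ in range(F)]
    for f in range(F):
        for t in range(T):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    ff, tt = f + i - ph, t + j - pw
                    if 0 <= ff < F and 0 <= tt < T:
                        s += k[i][j] * x[ff][tt]
            out[f][t] = s
    return out


def conv1d_time(x, k):
    """1D 'same' convolution along time only: a 1 x K kernel never mixes frequency rows."""
    return conv2d_same(x, [k])


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def convmixer_block_sketch(x, prev, k_tf, k_t):
    """Hypothetical block: TF conv -> time conv -> soft-attention gate from prev -> + 2D input."""
    h_tf = conv2d_same(x, k_tf)   # frequency-enriched map (time-frequency convolution)
    h_t = conv1d_time(h_tf, k_t)  # time-enriched map (time-domain convolution)
    F, T = len(x), len(x[0])
    out = [[0.0] * T for _ in range(F)]
    for f in range(F):
        for t in range(T):
            gate = sigmoid(prev[f][t])                 # soft attention weight in (0, 1)
            out[f][t] = gate * h_t[f][t] + x[f][t]     # gated features plus 2D input skip
    return out
```

With identity kernels and an all-zero previous output, every gate equals 0.5, so the block returns 1.5 times its input. That makes the skip path easy to check by hand. A real block would use learned multi-channel (for example depthwise) kernels rather than these fixed single-channel ones.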