Fig. 3 | EURASIP Journal on Audio, Speech, and Music Processing

Fig. 3

From: Transformer-based autoencoder with ID constraint for unsupervised anomalous sound detection

The architecture of the proposed IDC-TransAE for normal sound reconstruction. \(\varvec{X}\) and \(\varvec{\Phi }\) are the inputs to the model; they are obtained from the sound signal by removing the center frame. The final predicted center frame is obtained by average pooling the decoder output \(\overline{\varvec{X}}\) across frames and passing the result through a linear layer. \(\hat{\varvec{l}}\) is the predicted machine ID probability of the sound signal, obtained by max pooling the encoder output \(\varvec{z}\) across frames and passing the result through two linear layers followed by a softmax. IDC-TransAE is optimized with a combination of the reconstruction error and the classification error.
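The two output heads and the combined objective described in the caption can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation: the tensor sizes, the random stand-ins for the encoder and decoder outputs, the choice of mean squared error and cross-entropy, and the weighting parameter `lam` are all assumptions, since the caption does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): T context frames, D model width,
# F bins in the center frame, H hidden units, C machine IDs.
T, D, F, H, C = 4, 8, 16, 6, 3

def softmax(v):
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def reconstruction_head(x_bar, W, b):
    """Average-pool the decoder output over frames, then one linear layer."""
    pooled = x_bar.mean(axis=0)   # (T, D) -> (D,)
    return pooled @ W + b         # (D,) -> (F,)

def id_head(z, W1, b1, W2, b2):
    """Max-pool the encoder output over frames, then two linear layers and softmax.

    The caption names no activation between the two layers, so none is used here.
    """
    pooled = z.max(axis=0)        # (T, D) -> (D,)
    hidden = pooled @ W1 + b1     # (D,) -> (H,)
    return softmax(hidden @ W2 + b2)  # (H,) -> (C,)

def combined_loss(pred_frame, true_frame, id_probs, true_id, lam=1.0):
    """Assumed form: MSE reconstruction error plus lam times cross-entropy."""
    rec = np.mean((pred_frame - true_frame) ** 2)
    cls = -np.log(id_probs[true_id] + 1e-12)
    return rec + lam * cls

# Random stand-ins for the Transformer outputs z (encoder) and x_bar (decoder).
z = rng.normal(size=(T, D))
x_bar = rng.normal(size=(T, D))
true_frame = rng.normal(size=F)
true_id = 1

pred_frame = reconstruction_head(x_bar, rng.normal(size=(D, F)), np.zeros(F))
id_probs = id_head(z, rng.normal(size=(D, H)), np.zeros(H),
                   rng.normal(size=(H, C)), np.zeros(C))
loss = combined_loss(pred_frame, true_frame, id_probs, true_id)
```

Both heads collapse the frame axis before any linear layer, so each sound segment yields one predicted center frame and one ID probability vector regardless of how many context frames are supplied.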
