Fig. 1. The overall framework of our proposed ASimT for CSI
From: Training audio transformers for cover song identification