Fig. 4
From: W2VC: WavLM representation based one-shot voice conversion with gradient reversal distillation and CTC supervision
Block diagram of the speaker encoder. The input features are fed to the reference encoder, followed by a style token layer, to obtain the reference embedding.
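The data flow in the caption (input features, then reference encoder, then style token layer, then reference embedding) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `reference_encoder` and `style_token_layer`, all dimensions, the random weights, the mean-pooling stand-in for the reference encoder, and the use of single-head dot-product attention over the tokens.

```python
import numpy as np

def reference_encoder(features, w_proj):
    """Hypothetical stand-in: project each frame, then mean-pool over time."""
    hidden = np.tanh(features @ w_proj)        # (n_frames, d_ref)
    return hidden.mean(axis=0)                 # (d_ref,)

def style_token_layer(ref_state, tokens, w_q):
    """Attend over a bank of style tokens using the reference state as query.

    Assumed here: single-head scaled dot-product attention.
    """
    query = ref_state @ w_q                    # (d_tok,)
    keys = np.tanh(tokens)                     # (n_tokens, d_tok)
    scores = keys @ query / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    embedding = weights @ keys                 # weighted sum of tokens, (d_tok,)
    return embedding, weights

# Illustrative sizes only; none of these values come from the source.
rng = np.random.default_rng(0)
n_frames, d_in, d_ref, n_tokens, d_tok = 120, 80, 32, 10, 16

features = rng.standard_normal((n_frames, d_in))      # input features
w_proj = rng.standard_normal((d_in, d_ref)) * 0.1     # reference encoder weights
w_q = rng.standard_normal((d_ref, d_tok)) * 0.1       # query projection
tokens = rng.standard_normal((n_tokens, d_tok))       # style token bank

ref_state = reference_encoder(features, w_proj)
embedding, weights = style_token_layer(ref_state, tokens, w_q)
print(embedding.shape, weights.shape)
```

In this sketch the reference embedding is a convex combination of the token vectors, so its size is fixed by the token dimension regardless of how many input frames there are.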