Fig. 11
From: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources
The general framework of wav2vec technique and its utilization as a feature extractor for generating speech representations as input to the TTS model