
Table 5 List of papers addressing the main expressive speech synthesis challenges. "IL" stands for information leakage, "LR" for inference that lacks reference audio, "PC" for prosody controllability, and "US" for unseen style/speaker

From: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources


Challenges addressed

[59, 96, 126]
[97, 112, 120]
[18, 22, 25, 52, 93, 98, 107, 119]
[23, 54, 101, 102, 117, 130, 137]
[33, 63, 68, 70, 116]
[21, 47, 111]
[31, 48, 49, 53, 55, 57, 58, 61, 71, 72, 79, 99, 105, 113, 122, 123, 124, 125, 128, 133, 138]
[17, 35, 37, 44, 46, 50, 73, 91, 94, 100, 129, 131]