Table 5 List of papers addressing the main challenges of expressive speech synthesis. “IL” stands for information leakage, “LR” for inference that lacks reference audio, “PC” for prosody controllability, and “US” for unseen style/speaker.

From: Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

Challenges addressed: IL, LR, PC, US

| References | IL | LR | PC | US |
|---|---|---|---|---|
| [62] | ✓ | ✓ | ✓ | ✓ |
| [59, 96, 126] | ✓ | ✓ | | ✓ |
| [97, 112, 120] | ✓ | | ✓ | ✓ |
| [18, 22, 25, 52, 93, 98, 107, 119] | ✓ | | | ✓ |
| [23, 54, 101, 102, 117, 130, 137] | ✓ | | | |
| [33, 63, 68, 70, 116] | | ✓ | ✓ | |
| [21, 47, 111] | ✓ | ✓ | | |
| [19] | ✓ | | ✓ | |
| [31, 48, 49, 53, 55, 57, 58, 61, 71, 72, 79, 99, 105, 113, 122, 123, 124, 125, 128, 133, 138] | | | ✓ | |
| [17, 35, 37, 44, 46, 50, 73, 91, 94, 100, 129, 131] | | ✓ | | |