Model/method | Utilized for: inference without reference audio | Utilized for: ETTS based on text only | Utilized for: additional ETTS features |
---|---|---|---|
BERT language model | |||
ELECTRA language model | [125] | ||
ELMo language model | [83] | ||
RoBERTa language model | |||
XLNet language model | |||
GPT-3 language model | [64] | ||
Parsing trees | [129] | ||
Prosody boundaries in text | |||
Constituency trees | [131] | ||
Sentiment analysis model | [30] | ||
Stanford Sentiment Parser | [135] | ||
Syntax-related features (such as POS: part of speech) | [127] | ||
Word emotion lexicon | [40] | ||
Term Frequency-Inverse Document Frequency (TF-IDF) | [99] | ||
Character/phoneme embedding | [20, 33, 37, 44, 47, 48, 63, 71, 72, 91, 94, 95, 96, 103, 111] | ||