Developing a unit selection voice given audio without corresponding text

EURASIP Journal on Audio, Speech, and Music Processing

Table 5 Posterior probability and duration z-score thresholds used to achieve different amounts of data pruning, for all four voices

Percent of units used	Posterior probability and duration z-score thresholds
	Test data = Olive		Test data = Intro. to Public Speaking
	ASR trained on Olive data	ASR trained on LibriSpeech	ASR trained on lecture data	ASR trained on LibriSpeech
100	–	–	–	–
	1.00, ≈97 %	1.00, ≈92 %	1.00, ≈93 %	1.00, ≈65 %
50	1.00, ± 0.51	1.00, ± 0.70	1.00, ± 0.57	1.00, ± 0.98
30	1.00, ± 0.35	1.00, ± 0.45	1.00, ± 0.39	1.00, ± 0.60
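One way to read the thresholds in Table 5 is as a two-stage filter. A unit is kept only if its ASR posterior probability meets the posterior threshold and, where a duration bound is listed, its duration z-score falls within that ± bound. The Python sketch below illustrates this reading, along with one way to pick the z-score bound for a target percentage of units. It is a hypothetical illustration, not the authors' implementation. The `Unit` record, the per-phone z-score normalisation computed over all units (including low-posterior ones), and the helper names `duration_zscores`, `prune` and `z_thresh_for_target` are all assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical unit record; field names are assumptions, not from the paper.
    phone: str        # phone label from the ASR alignment
    posterior: float  # ASR posterior probability of that label
    duration: float   # unit duration in seconds


def duration_zscores(units):
    """Z-score each duration against all units sharing its phone label (assumed normalisation)."""
    by_phone = {}
    for u in units:
        by_phone.setdefault(u.phone, []).append(u.duration)
    stats = {p: (statistics.fmean(d), statistics.pstdev(d)) for p, d in by_phone.items()}
    zs = []
    for u in units:
        mean, std = stats[u.phone]
        # A phone with zero duration spread gives no evidence of an outlier; treat it as typical.
        zs.append(0.0 if std == 0 else (u.duration - mean) / std)
    return zs


def prune(units, post_thresh=1.0, z_thresh=None):
    """Keep units with posterior >= post_thresh and, if z_thresh is given, |z| <= z_thresh."""
    zs = duration_zscores(units)
    return [
        u for u, z in zip(units, zs)
        if u.posterior >= post_thresh and (z_thresh is None or abs(z) <= z_thresh)
    ]


def z_thresh_for_target(units, target_fraction, post_thresh=1.0):
    """Smallest |z| bound keeping at least round(target_fraction * N) of the N units."""
    zs = duration_zscores(units)
    # Only units that survive the posterior filter are candidates for the duration bound.
    candidates = sorted(abs(z) for u, z in zip(units, zs) if u.posterior >= post_thresh)
    n_keep = round(target_fraction * len(units))
    if not 1 <= n_keep <= len(candidates):
        raise ValueError("target not reachable with this posterior threshold")
    return candidates[n_keep - 1]


# Toy usage: five "a" units and two "b" units, one of them with a low posterior.
units = [Unit("a", 1.0, d) for d in (0.1, 0.2, 0.3, 0.4, 0.5)]
units += [Unit("b", 0.9, 0.2), Unit("b", 1.0, 0.3)]
bound = z_thresh_for_target(units, 3 / 7)
print(len(prune(units)), round(bound, 3), len(prune(units, z_thresh=bound)))
```

Because several units can share nearly the same |z| value, the returned bound keeps at least, not exactly, the target number of units. A target above the fraction retained by the posterior filter alone (the ≈97 %, ≈92 %, ≈93 % and ≈65 % row) cannot be reached by tightening the duration bound, so the sketch raises an error in that case.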