EURASIP Journal on Audio, Speech, and Music Processing

Table 7 Details of speech datasets used in our experiments

From: Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling

Corpus	#Speakers	#Utterances	Duration (hours)	Environment	Usage
Indic TTS, IITM [43]	2	8601	14	Studio	Training
Open SLR 63 - Train [44]	37	3346	5	Studio	Training
IMaSC [45]	8	34,473	49	Studio	Training
MSC [46]	75	1541	1	Natural	Training
IIITH [47]	1	1000	1	Studio	Development
Open SLR 63 - Test [44]	7	679	1	Studio	Testing
