Table 7: Details of the speech datasets used in our experiments

From: Improving speech recognition systems for the morphologically complex Malayalam language using subword tokens for language modeling

| Corpus | #Speakers | #Utterances | Duration (hours) | Environment | Usage |
|---|---|---|---|---|---|
| Indic TTS, IITM [43] | 2 | 8,601 | 14 | Studio | Training |
| Open SLR 63 - Train [44] | 37 | 3,346 | 5 | Studio | Training |
| IMaSC [45] | 8 | 34,473 | 49 | Studio | Training |
| MSC [46] | 75 | 1,541 | 1 | Natural | Training |
| IIITH [47] | 1 | 1,000 | 1 | Studio | Development |
| Open SLR 63 - Test [44] | 7 | 679 | 1 | Studio | Testing |
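The per-split totals implied by the table can be tallied with a short script. This is an illustrative sketch, not code from the paper. The `Corpus` record and the `totals_by_usage` helper are hypothetical names introduced here. Because the table reports durations rounded to whole hours, the hour totals are only approximate.

```python
from collections import defaultdict
from typing import NamedTuple


class Corpus(NamedTuple):
    """One row of Table 7 (hypothetical record type for illustration)."""
    name: str
    speakers: int
    utterances: int
    hours: int  # as reported in Table 7, rounded to whole hours
    environment: str
    usage: str


# Rows transcribed from Table 7.
CORPORA = [
    Corpus("Indic TTS, IITM", 2, 8601, 14, "Studio", "Training"),
    Corpus("Open SLR 63 - Train", 37, 3346, 5, "Studio", "Training"),
    Corpus("IMaSC", 8, 34473, 49, "Studio", "Training"),
    Corpus("MSC", 75, 1541, 1, "Natural", "Training"),
    Corpus("IIITH", 1, 1000, 1, "Studio", "Development"),
    Corpus("Open SLR 63 - Test", 7, 679, 1, "Studio", "Testing"),
]


def totals_by_usage(corpora):
    """Sum utterances and (rounded) hours for each usage split."""
    totals = defaultdict(lambda: {"utterances": 0, "hours": 0})
    for c in corpora:
        totals[c.usage]["utterances"] += c.utterances
        totals[c.usage]["hours"] += c.hours
    return dict(totals)


if __name__ == "__main__":
    for usage, t in totals_by_usage(CORPORA).items():
        print(f"{usage}: {t['utterances']} utterances, ~{t['hours']} h")
```

Run as a script, this reports roughly 69 hours (47,961 utterances) of training audio against about 1 hour each for development and testing. Speaker counts are deliberately not summed, because the table does not say whether speakers overlap across corpora.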