AI4Bharat
IndicASR

Multilingual ASR

Things to do

Accuracy
  Augmentation in fine-tuning data <- read papers
  Proper noun hack <- talk to Harveen
  Adding hot words / domain specialization <- read papers, engineering efforts
  Evaluating multilingual model options
  Evaluating model size, batch size, ...
Latency
  Time in AM (acoustic model) vs LM (language model) <- measure
  Reduce LM time with smaller LM <- compare accuracy, latency
Evaluating all preprocessing steps and then applying them to fine-tuning and pre-training data
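
For the "Time in AM vs LM" item, one possible way to get the split is to time the two stages separately over the same utterances. The sketch below is only an illustration under stated assumptions: `run_acoustic_model` and `run_lm_decoder` are hypothetical stand-ins for whatever the pipeline actually calls, not functions from the IndicASR code.

```python
import time
from statistics import mean


def time_stages(utterances, run_acoustic_model, run_lm_decoder):
    """Return mean seconds per utterance spent in the AM and in LM decoding.

    Both callables are hypothetical placeholders:
      run_acoustic_model(audio) -> frame-level outputs
      run_lm_decoder(outputs)   -> decoded text
    """
    am_times, lm_times = [], []
    for audio in utterances:
        t0 = time.perf_counter()
        outputs = run_acoustic_model(audio)  # acoustic model forward pass
        t1 = time.perf_counter()
        run_lm_decoder(outputs)              # LM-based decoding
        t2 = time.perf_counter()
        am_times.append(t1 - t0)
        lm_times.append(t2 - t1)
    return {"am_mean_s": mean(am_times), "lm_mean_s": mean(lm_times)}
```

If the acoustic model runs on a GPU, the device would need to be synchronized before each clock read, otherwise asynchronous kernels can make the AM look faster and the LM slower than they are. The same harness can be rerun with a smaller LM to compare latency against the accuracy numbers.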

Multilingual Finetuning results
Rows are models (Model Name); columns are grouped by Language and then Test set, and every test set has a WER and a CER column (values in percent). Languages and their test sets, in column order:

  odia: dcunk_new, dckn_new, mucs
  bengali: dcunk_new, dckn_new, openslr
  telugu: dcunk_new, dckn_new, mucs, msr
  gujarati: dcunk_new, dckn_new, mucs, msr
  hindi: dcunk_new, dckn_new, mucs
  marathi: dcunk_new, dckn_new, mucs
  tamil: dcunk_new, dckn_new, mucs, msr
  tamil_32_2_-1: dcunk_new, dckn_new, mucs, msr
  odia_32_2_-1: dcunk_new, dckn_new, mucs
  telugu_32_2_-1: dcunk_new, dckn_new, mucs, msr
  bengali_32_2_-1: dcunk_new, dckn_new, openslr
  marathi_32_2_-1: dcunk_new, dckn_new, mucs
  gujarati_32_2_-1: dcunk_new, dckn_new, mucs, msr
  hindi_32_2_-1: dcunk_new, dckn_new, mucs

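base_existing_test4 (48 results; WER / CER)
  odia: dcunk_new 72 / 20.04; dckn_new 71.1 / 19.28; mucs 33.73 / 7.48
  bengali: dcunk_new 47.38 / 15.63; dckn_new 49.76 / 16.23; openslr 30.99 / 11.27
  telugu: dcunk_new 44.33 / 10.02; dckn_new 43.6 / 9.85; mucs 40.46 / 9.92; msr 33.19 / 7.21
  gujarati: dcunk_new 26.16 / 6.96; dckn_new 33.21 / 9.38; mucs 34.29 / 12.02; msr 27.49 / 8.75
  hindi: dcunk_new 30.63 / 9.63; dckn_new 27.14 / 8.87; mucs 18.67 / 6.39
  marathi: dcunk_new 74.03 / 21.93; dckn_new 69.46 / 18.84; mucs 29.57 / 7.42
  tamil: dcunk_new 36.56 / 6.22; dckn_new 39.86 / 7.32; mucs 35.64 / 7.88; msr 29.42 / 5.85
  tamil_32_2_-1: dcunk_new 22.5 / 4.05; dckn_new 23.11 / 4.96; mucs 24.61 / 5.35; msr 22.55 / 4.69
  odia_32_2_-1: dcunk_new 42.68 / 16.89; dckn_new 41.21 / 15.73; mucs 27.84 / 6.8
  telugu_32_2_-1: dcunk_new 25.41 / 8.19; dckn_new 25.11 / 8.36; mucs 24.27 / 6.32; msr 22.61 / 5.54
  bengali_32_2_-1: dcunk_new 29.19 / 11.34; dckn_new 30.81 / 11.96; openslr 12.44 / 3.93
  marathi_32_2_-1: dcunk_new 43.28 / 21.68; dckn_new 37.92 / 17.26; mucs 17.69 / 6.6
  gujarati_32_2_-1: dcunk_new 13.82 / 4.95; dckn_new 17.95 / 7.67; mucs 21.38 / 9.16; msr 18.87 / 7.49
  hindi_32_2_-1: dcunk_new 12.06 / 5.61; dckn_new 11.54 / 5.31; mucs 14.33 / 5.56
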
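large_existing (48 results; WER / CER)
  odia: dcunk_new 57.34 / 13.52; dckn_new 57.08 / 13.17; mucs 28.77 / 4.64
  bengali: dcunk_new 35.55 / 12.18; dckn_new 37.83 / 12.99; openslr 25.85 / 9.79
  telugu: dcunk_new 37.61 / 7.87; dckn_new 36.15 / 7.3; mucs 35.8 / 8.35; msr 29.2 / 6.14
  gujarati: dcunk_new 21.86 / 5.75; dckn_new 26.2 / 7.08; mucs 29.59 / 10.08; msr 23.63 / 7.38
  hindi: dcunk_new 23.86 / 7.05; dckn_new 20.05 / 6.17; mucs 16.38 / 5.25
  marathi: dcunk_new 58.69 / 15.08; dckn_new 56.32 / 13.29; mucs 20.74 / 4.26
  tamil: dcunk_new 33.13 / 5.37; dckn_new 34.38 / 5.91; mucs 32.5 / 6.93; msr 26.99 / 5.25
  tamil_32_2_-1: dcunk_new 21.43 / 3.62; dckn_new 21.42 / 3.93; mucs 23.59 / 4.94; msr 21.91 / 4.45
  odia_32_2_-1: dcunk_new 32.46 / 9.56; dckn_new 32.74 / 9.25; mucs 24.25 / 4.19
  telugu_32_2_-1: dcunk_new 22.33 / 5.99; dckn_new 21 / 5.44; mucs 22.63 / 5.59; msr 21.28 / 4.97
  bengali_32_2_-1: dcunk_new 23.5 / 7.83; dckn_new 25.17 / 8.43; openslr 10.95 / 3.34
  marathi_32_2_-1: dcunk_new 31.06 / 12.74; dckn_new 28.6 / 10.41; mucs 13 / 3.88
  gujarati_32_2_-1: dcunk_new 12.05 / 3.91; dckn_new 14.22 / 5; mucs 19.02 / 7.6; msr 17.58 / 6.72
  hindi_32_2_-1: dcunk_new 9.25 / 3.83; dckn_new 8.58 / 3.37; mucs 12.97 / 4.56
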
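base_bs2x_existing (24 results; WER / CER)
  odia: dcunk_new 78.28 / 23.39; dckn_new 78.17 / 22.72; mucs 37.81 / 9.95
  bengali: dcunk_new 46.29 / 15.52; dckn_new 48.07 / 15.67; openslr 27.65 / 10.29
  telugu: dcunk_new 40.31 / 8.63; dckn_new 39.74 / 8.63; mucs 37.36 / 9.08; msr 30.21 / 6.55
  gujarati: dcunk_new 24.38 / 6.7; dckn_new 31.67 / 9.18; mucs 32.35 / 11.24; msr 25.65 / 8.14
  hindi: dcunk_new 29.86 / 9.61; dckn_new 26.95 / 8.98; mucs 19.34 / 6.66
  marathi: dcunk_new 74.62 / 22.83; dckn_new 71.18 / 19.49; mucs 36.59 / 9.33
  tamil: dcunk_new 34.86 / 5.93; dckn_new 36.79 / 6.72; mucs 33.36 / 7.31; msr 27.25 / 5.32
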
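base_steps2x_existing (24 results; WER / CER)
  odia: dcunk_new 81.94 / 25.65; dckn_new 81.77 / 24.8; mucs 39.23 / 10.96
  bengali: dcunk_new 47.08 / 16.12; dckn_new 50.47 / 16.69; openslr 27.79 / 10.28
  telugu: dcunk_new 40.84 / 8.98; dckn_new 41.5 / 9.75; mucs 37.77 / 9.1; msr 30.35 / 6.52
  gujarati: dcunk_new 26.87 / 7.43; dckn_new 33.87 / 9.97; mucs 32.88 / 11.43; msr 25.76 / 8.17
  hindi: dcunk_new 28.76 / 9.49; dckn_new 27.79 / 9.31; mucs 19.79 / 6.82
  marathi: dcunk_new 74.44 / 23.73; dckn_new 71.05 / 20.45; mucs 38.53 / 10.92
  tamil: dcunk_new 34.6 / 5.88; dckn_new 37.39 / 6.9; mucs 33.39 / 7.24; msr 27.32 / 5.31
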
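base_multisoftmax_with_lid (24 results; WER / CER)
  odia: dcunk_new 71.77 / 19.88; dckn_new 70.39 / 19.04; mucs 34.38 / 7.54
  bengali: dcunk_new 47 / 15.54; dckn_new 47.35 / 15.78; openslr 30.43 / 10.93
  telugu: dcunk_new 45.49 / 10.46; dckn_new 43.93 / 9.92; mucs 39.68 / 9.77; msr 32.78 / 7.11
  gujarati: dcunk_new 26.17 / 7.04; dckn_new 34.31 / 9.62; mucs 34.18 / 12.01; msr 27.23 / 8.65
  hindi: dcunk_new 30.5 / 9.67; dckn_new 27.92 / 9.17; mucs 18.61 / 6.33
  marathi: dcunk_new 75.33 / 22.4; dckn_new 68.57 / 18.45; mucs 31.44 / 7.4
  tamil: dcunk_new 37 / 6.24; dckn_new 39.9 / 7.54; mucs 35.66 / 7.8; msr 29.06 / 5.7
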
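base_multisoftmax_with_lid_wt10 (24 results; WER / CER)
  odia: dcunk_new 74.84 / 21.35; dckn_new 74.33 / 20.81; mucs 35.37 / 8.29
  bengali: dcunk_new 45.92 / 15.28; dckn_new 48.97 / 16.25; openslr 29.31 / 10.67
  telugu: dcunk_new 42.44 / 9.43; dckn_new 42.07 / 9.53; mucs 39.02 / 9.49; msr 31.69 / 6.89
  gujarati: dcunk_new 26.53 / 7.14; dckn_new 35.37 / 10.19; mucs 33.11 / 11.6; msr 26.5 / 8.39
  hindi: dcunk_new 29.64 / 9.57; dckn_new 28.08 / 9.25; mucs 18.28 / 6.22
  marathi: dcunk_new 74.99 / 22.99; dckn_new 70.39 / 19.71; mucs 37.48 / 9.42
  tamil: dcunk_new 35.74 / 6.07; dckn_new 38.26 / 7.05; mucs 34.18 / 7.55; msr 28.07 / 5.57
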
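base_embedding_768_1layer (24 results; WER / CER)
  odia: dcunk_new 72.26 / 20.5; dckn_new 71.85 / 19.63; mucs 34.71 / 7.66
  bengali: dcunk_new 46.87 / 15.56; dckn_new 49.89 / 16.37; openslr 30.61 / 11.14
  telugu: dcunk_new 44.53 / 10.12; dckn_new 43.92 / 9.94; mucs 40.16 / 9.81; msr 32.61 / 7.11
  gujarati: dcunk_new 27.87 / 7.48; dckn_new 36.25 / 10.34; mucs 34.18 / 12.04; msr 27.37 / 8.75
  hindi: dcunk_new 32.02 / 10.22; dckn_new 28.56 / 9.37; mucs 18.57 / 6.36
  marathi: dcunk_new 76.19 / 23.5; dckn_new 71.13 / 20.01; mucs 33.21 / 7.86
  tamil: dcunk_new 37.06 / 6.35; dckn_new 39.48 / 7.42; mucs 35.31 / 7.8; msr 28.87 / 5.71
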
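base_embedding_768_3layer (48 results; WER / CER)
  odia: dcunk_new 73.55 / 21.26; dckn_new 73.1 / 20.34; mucs 36.46 / 8.34
  bengali: dcunk_new 45.1 / 15.02; dckn_new 46.27 / 15.54; openslr 29.6 / 10.7
  telugu: dcunk_new 43.97 / 9.84; dckn_new 42.63 / 9.48; mucs 38.97 / 9.52; msr 31.9 / 7.01
  gujarati: dcunk_new 25.65 / 6.91; dckn_new 31.99 / 9.13; mucs 33.46 / 11.7; msr 26.73 / 8.54
  hindi: dcunk_new 29.7 / 9.36; dckn_new 27.48 / 8.87; mucs 18.5 / 6.22
  marathi: dcunk_new 72.37 / 21.8; dckn_new 68.56 / 18.27; mucs 31.21 / 7.24
  tamil: dcunk_new 36.16 / 6.09; dckn_new 38.74 / 7.06; mucs 34.73 / 7.53; msr 28.48 / 5.6
  tamil_32_2_-1: dcunk_new 22.5 / 4.07; dckn_new 22.85 / 4.95; mucs 24.14 / 5.22; msr 22.53 / 4.66
  odia_32_2_-1: dcunk_new 45.21 / 18.59; dckn_new 43.92 / 17.12; mucs 30.78 / 7.83
  telugu_32_2_-1: dcunk_new 25.58 / 8.41; dckn_new 24.75 / 8.42; mucs 23.94 / 6.27; msr 22.34 / 5.5
  bengali_32_2_-1: dcunk_new 27.92 / 10.57; dckn_new 28.82 / 10.85; openslr 11.89 / 3.71
  marathi_32_2_-1: dcunk_new 44.39 / 22.76; dckn_new 38.27 / 17.04; mucs 17.36 / 6.37
  gujarati_32_2_-1: dcunk_new 14.34 / 5.39; dckn_new 18.37 / 8.15; mucs 21.15 / 9.11; msr 18.88 / 7.45
  hindi_32_2_-1: dcunk_new 11.92 / 5.53; dckn_new 11.93 / 5.36; mucs 14.07 / 5.38
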
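singlesoftmax (30 results; WER / CER)
  odia: dcunk_new 77.95 / 23.45; dckn_new 77.76 / 22.65; mucs 36.16 / 8.68
  bengali: dcunk_new 56.81 / 34.94; dckn_new 65.99 / 46.24; openslr 27.25 / 10.18
  telugu: dcunk_new 42.22 / 10.52 (listed twice); dckn_new 41.79 / 11.04 (listed twice); mucs 39.06 / 9.59; msr 31.67 / 6.95 (listed twice)
  gujarati: dcunk_new 32.15 / 14.81; dckn_new 46.23 / 27.82; mucs 36.27 / 13.09; msr 28.97 / 9.21
  hindi: dcunk_new 43.66 / 31.5 (listed twice); dckn_new 35.98 / 21.52; mucs 19 / 6.64 (listed twice)
  marathi: dcunk_new 94.76 / 70.86; dckn_new 86.71 / 53.62; mucs 42.53 / 17.17
  tamil: dcunk_new 35.86 / 6.27; dckn_new 38.73 / 7.38; mucs 35.08 / 7.74; msr 28.72 / 5.63
  hindi_32_2_-1: mucs 14.88 / 6.09
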
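bm (24 results; WER / CER)
  tamil_32_2_-1: dcunk_new 22.45 / 3.74; dckn_new 21.96 / 3.9; mucs 22.07 / 4.67; msr 19.83 / 4.04
  odia_32_2_-1: dcunk_new 33.56 / 11.69; dckn_new 31.22 / 10.39; mucs 23.02 / 3.74
  telugu_32_2_-1: dcunk_new 22.88 / 5.16; dckn_new 22.19 / 4.68; mucs 20.75 / 4.74; msr 18.58 / 4.13
  bengali_32_2_-1: dcunk_new 23.59 / 7.54; dckn_new 24.4 / 7.9; openslr 10.23 / 2.62
  marathi_32_2_-1: dcunk_new 31.84 / 12.1; dckn_new 29.92 / 10.36; mucs 13.42 / 4.29
  gujarati_32_2_-1: dcunk_new 14.67 / 4.35; dckn_new 16.06 / 4.93; mucs 20.53 / 7.78; msr 16.89 / 5.97
  hindi_32_2_-1: dcunk_new 11.62 / 5.14; dckn_new 11.59 / 5; mucs 13.17 / 4.89
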
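base_existing_vkdict (24 results; WER / CER)
  odia: dcunk_new 67.56 / 18.06; dckn_new 67.36 / 17.49; mucs 31.52 / 6.62
  bengali: dcunk_new 43.21 / 14.41; dckn_new 44.2 / 14.76; openslr 31.41 / 11.41
  telugu: dcunk_new 41.46 / 9.06; dckn_new 40.4 / 8.52; mucs 40.03 / 9.84; msr 32.8 / 7.19
  gujarati: dcunk_new 23.67 / 6.28; dckn_new 28.05 / 7.79; mucs 34.24 / 12.05; msr 27.45 / 8.72
  hindi: dcunk_new 27.46 / 8.72; dckn_new 24.74 / 8.03; mucs 20.97 / 7.86
  marathi: dcunk_new 120.23 / 47.39; dckn_new 111.27 / 43.81; mucs 44.9 / 14.93
  tamil: dcunk_new 36.37 / 6.15; dckn_new 38.75 / 6.98; mucs 35.22 / 7.84; msr 29.34 / 5.81
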
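base_existing_fulldict (48 results; WER / CER)
  odia: dcunk_new 73.14 / 20.69; dckn_new 72.79 / 19.92; mucs 33.17 / 7.65
  bengali: dcunk_new 46.45 / 13.39; dckn_new 48.63 / 14.13; openslr 29.28 / 6.81
  telugu: dcunk_new 40.88 / 8.92; dckn_new 40 / 8.62; mucs 38.82 / 9.55; msr 31.74 / 6.95
  gujarati: dcunk_new 24.01 / 6.46; dckn_new 30.05 / 8.44; mucs 33.29 / 11.67; msr 26.67 / 8.4
  hindi: dcunk_new 28.36 / 9.05; dckn_new 26.03 / 8.58; mucs 18.8 / 6.48
  marathi: dcunk_new 71.07 / 20.65; dckn_new 67.27 / 17.5; mucs 28.16 / 7.5
  tamil: dcunk_new 35.2 / 5.95; dckn_new 37.74 / 6.8; mucs 34.62 / 7.58; msr 28.78 / 5.62
  tamil_32_2_-1: dcunk_new 22.19 / 3.94; dckn_new 22.65 / 4.51; mucs 24.17 / 5.26; msr 22.45 / 4.63
  odia_32_2_-1: dcunk_new 43.72 / 18.1; dckn_new 42.83 / 16.96; mucs 28.05 / 6.84
  telugu_32_2_-1: dcunk_new 23.83 / 6.95; dckn_new 22.95 / 6.72; mucs 24.01 / 6.25; msr 22.19 / 5.38
  bengali_32_2_-1: dcunk_new 29.45 / 11.87; dckn_new 31.45 / 12.52; openslr 12.44 / 3.94
  marathi_32_2_-1: dcunk_new 41 / 20.05; dckn_new 35.75 / 15.03; mucs 17.79 / 6.79
  gujarati_32_2_-1: dcunk_new 13.08 / 4.52; dckn_new 16.41 / 6.63; mucs 21.16 / 9.14; msr 18.76 / 7.48
  hindi_32_2_-1: dcunk_new 11.57 / 5.26; dckn_new 11.35 / 5.12; mucs 14.29 / 5.63
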
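base_existing_unnormalized (24 results; WER / CER)
  odia: dcunk_new 73.82 / 20.91; dckn_new 73.26 / 19.77; mucs 33.84 / 7.55
  bengali: dcunk_new 49.69 / 15.48; dckn_new 51 / 15.67; openslr 29.61 / 8.37
  telugu: dcunk_new 42.46 / 9.24; dckn_new 42.85 / 9.75; mucs 37.55 / 9.08; msr 30.36 / 6.61
  gujarati: dcunk_new 27.94 / 7.72; dckn_new 36.14 / 10.41; mucs 33.06 / 11.44; msr 26.17 / 8.24
  hindi: dcunk_new 28.84 / 9.27; dckn_new 26.97 / 8.9; mucs 18.16 / 6.21
  marathi: dcunk_new 77.14 / 23.94; dckn_new 71.26 / 19.59; mucs 34.3 / 8.41
  tamil: dcunk_new 35.18 / 5.99; dckn_new 37.38 / 6.82; mucs 32.99 / 7.22; msr 27.32 / 5.37
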
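base_existing_normalized (24 results; WER / CER)
  odia: dcunk_new 67.78 / 18.19; dckn_new 67.59 / 17.69; mucs 32.36 / 6.97
  bengali: dcunk_new 37.6 / 9.87; dckn_new 36.77 / 9.53; openslr 30.63 / 8.57
  telugu: dcunk_new 40.33 / 8.58; dckn_new 38.91 / 8.05; mucs 38.83 / 9.51; msr 31.35 / 6.87
  gujarati: dcunk_new 22.92 / 6.09; dckn_new 26.94 / 7.57; mucs 33.35 / 11.68; msr 26.68 / 8.44
  hindi: dcunk_new 26.39 / 8.14; dckn_new 24.14 / 7.73; mucs 17.92 / 5.87
  marathi: dcunk_new 101.26 / 73.78; dckn_new 98.82 / 66.63; mucs 50.05 / 17.28
  tamil: dcunk_new 34.59 / 5.75; dckn_new 36.95 / 6.51; mucs 33.33 / 7.34; msr 27.66 / 5.39
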
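base_dc_normalized (24 results; WER / CER)
  odia: dcunk_new 26.23 / 6.78; dckn_new 22.42 / 5.81; mucs 36.45 / 7.46
  bengali: dcunk_new 18.28 / 4.94; dckn_new 17.69 / 4.85; openslr 69.25 / 12.9
  telugu: dcunk_new 29.59 / 5.53; dckn_new 26.27 / 4.29; mucs 41.44 / 10.13; msr 34.08 / 7.53
  gujarati: dcunk_new 18.55 / 4.91; dckn_new 19.12 / 5.19; mucs 43.47 / 15.44; msr 36.76 / 11.86
  hindi: dcunk_new 12.83 / 3.91; dckn_new 11.06 / 3.41; mucs 29.66 / 10.55
  marathi: dcunk_new 31.6 / 8.89; dckn_new 29.58 / 7.58; mucs 50.05 / 22.54
  tamil: dcunk_new 27.93 / 4.34; dckn_new 27.1 / 4.18; mucs 35.64 / 8.01; msr 30.01 / 6.04
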
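base_dc_unnormalized (24 results; WER / CER)
  odia: dcunk_new 26.83 / 6.98; dckn_new 22.88 / 6.1; mucs 37.65 / 7.9
  bengali: dcunk_new 20.1 / 5.72; dckn_new 19.19 / 5.54; openslr 74.63 / 18.21
  telugu: dcunk_new 29.18 / 5.32; dckn_new 25.4 / 4.09; mucs 41.55 / 10.11; msr 33.92 / 7.43
  gujarati: dcunk_new 18.01 / 4.8; dckn_new 18.32 / 4.93; mucs 42.25 / 15.05; msr 35.18 / 11.44
  hindi: dcunk_new 12.64 / 3.93; dckn_new 10.57 / 3.29; mucs 30.67 / 11.46
  marathi: dcunk_new 22.88 / 6.06; dckn_new 20.87 / 5.16; mucs 35.41 / 6.65
  tamil: dcunk_new 27.57 / 4.29; dckn_new 26.35 / 4.13; mucs 35.34 / 7.9; msr 29.89 / 5.98
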
marathi_monolinugal (2 results; WER / CER)
  marathi: dcunk_new 91.05 / 52.3; mucs 55.53 / 24.73

marathi_monolinugal_dc (2 results; WER / CER)
  marathi: dcunk_new 23.9 / 6.79; mucs 34.79 / 6.86

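multilingual_dc_unnormalized_singlesoftmax (24 results; WER / CER)
  odia: dcunk_new 27.51 / 7.87; dckn_new 22.84 / 6.06; mucs 41.32 / 14.62
  bengali: dcunk_new 20.07 / 5.75; dckn_new 19.29 / 5.49; openslr 71.89 / 22.93
  telugu: dcunk_new 28.82 / 5.4; dckn_new 25.42 / 4.11; mucs 41.42 / 10.27; msr 33.58 / 7.48
  gujarati: dcunk_new 17.66 / 4.73; dckn_new 18 / 4.9; mucs 42.05 / 15.58; msr 35.28 / 12.09
  hindi: dcunk_new 11.97 / 3.88; dckn_new 10.18 / 3.17; mucs 30.6 / 13.15
  marathi: dcunk_new 22.73 / 6.11; dckn_new 21.02 / 5.18; mucs 35.25 / 7.34
  tamil: dcunk_new 26.57 / 4.12; dckn_new 25.65 / 3.94; mucs 35.34 / 7.89; msr 29.28 / 5.87
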
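base_dc_multisoftmax_with_lid (24 results; WER / CER)
  odia: dcunk_new 26.76 / 6.98; dckn_new 23.07 / 6.16; mucs 37.61 / 7.84
  bengali: dcunk_new 20.39 / 5.84; dckn_new 19.68 / 5.66; openslr 88.01 / 16.93
  telugu: dcunk_new 29.47 / 5.47; dckn_new 25.97 / 4.28; mucs 42.1 / 10.21; msr 34.45 / 7.47
  gujarati: dcunk_new 18.01 / 4.85; dckn_new 18.6 / 5.01; mucs 42.52 / 15.12; msr 35.53 / 11.58
  hindi: dcunk_new 12.69 / 3.97; dckn_new 10.92 / 3.4; mucs 30.22 / 11.29
  marathi: dcunk_new 23.12 / 6.16; dckn_new 21.34 / 5.34; mucs 37.58 / 6.2
  tamil: dcunk_new 27.48 / 4.32; dckn_new 26.65 / 4.22; mucs 35.74 / 8.05; msr 30.17 / 6.11
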
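base_dc_embedding_768_3layer (24 results; WER / CER)
  odia: dcunk_new 26.48 / 6.91; dckn_new 22.94 / 6.08; mucs 35.89 / 7.21
  bengali: dcunk_new 19.96 / 5.72; dckn_new 19.18 / 5.52; openslr 85.28 / 17.29
  telugu: dcunk_new 29.01 / 5.37; dckn_new 25.3 / 4.12; mucs 41.25 / 9.86; msr 33.89 / 7.31
  gujarati: dcunk_new 17.77 / 4.78; dckn_new 18.25 / 4.95; mucs 42.29 / 15.09; msr 35.15 / 11.51
  hindi: dcunk_new 12.29 / 3.84; dckn_new 10.62 / 3.29; mucs 30.06 / 11.24
  marathi: dcunk_new 22.65 / 6.01; dckn_new 20.93 / 5.2; mucs 35.28 / 6.2
  tamil: dcunk_new 27.12 / 4.31; dckn_new 26.29 / 4.12; mucs 35.6 / 7.95; msr 29.89 / 6.01
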
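base_dc_embedding_768_3layer_bs2x (24 results; WER / CER)
  odia: dcunk_new 25.93 / 6.7; dckn_new 21.93 / 5.85; mucs 36.05 / 7.1
  bengali: dcunk_new 18.75 / 5.45; dckn_new 17.79 / 5.24; openslr 88.93 / 19.18
  telugu: dcunk_new 26.87 / 4.83; dckn_new 23.65 / 3.73; mucs 39.95 / 9.45; msr 32.43 / 6.96
  gujarati: dcunk_new 16.63 / 4.45; dckn_new 16.9 / 4.55; mucs 40.61 / 14.48; msr 33.71 / 10.92
  hindi: dcunk_new 11.32 / 3.57; dckn_new 9.7 / 2.99; mucs 28.92 / 10.85
  marathi: dcunk_new 21.11 / 5.68; dckn_new 19.37 / 4.82; mucs 34.89 / 6.29
  tamil: dcunk_new 25.84 / 3.95; dckn_new 24.35 / 3.7; mucs 34.12 / 7.57; msr 28.63 / 5.72
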
marathi_monolinugal_unnormalized (2 results; WER / CER)
  marathi: dcunk_new 90.63 / 50.97; mucs 54.81 / 22.56

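base_dc_normalized_fixed (24 results; WER / CER)
  odia: dcunk_new 28.03 / 7.49; dckn_new 23.74 / 6.44; mucs 37.12 / 7.69
  bengali: dcunk_new 27.78 / 9; dckn_new 27.05 / 8.93; openslr 71.77 / 13.22
  telugu: dcunk_new 30.34 / 5.73; dckn_new 26.31 / 4.41; mucs 42.23 / 10.33; msr 35.04 / 7.61
  gujarati: dcunk_new 18.31 / 4.89; dckn_new 18.96 / 5.17; mucs 43.28 / 15.38; msr 35.93 / 11.74
  hindi: dcunk_new 12.99 / 4.08; dckn_new 11.18 / 3.5; mucs 30.65 / 11.5
  marathi: dcunk_new 24.22 / 6.47; dckn_new 22.13 / 5.57; mucs 36.1 / 6.54
  tamil: dcunk_new 27.89 / 4.45; dckn_new 27.46 / 4.3; mucs 36.53 / 8.24; msr 30.68 / 6.18
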
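base_dc_normalized_nodrop (24 results; WER / CER)
  odia: dcunk_new 26.29 / 9.43; dckn_new 22.24 / 8.77; mucs 37.44 / 7.96
  bengali: dcunk_new 27.13 / 10.96; dckn_new 26.71 / 10.89; openslr 63.18 / 15.33
  telugu: dcunk_new 29.36 / 5.54; dckn_new 25.62 / 4.25; mucs 41.69 / 10.11; msr 34.06 / 7.45
  gujarati: dcunk_new 18.01 / 5.64; dckn_new 18.54 / 5.8; mucs 42.22 / 15.11; msr 35.35 / 11.63
  hindi: dcunk_new 12.71 / 4.01; dckn_new 10.95 / 3.4; mucs 30.77 / 11.46
  marathi: dcunk_new 23.35 / 7.28; dckn_new 21.52 / 6.65; mucs 38.1 / 9.62
  tamil: dcunk_new 27.6 / 4.33; dckn_new 26.97 / 4.22; mucs 35.82 / 8.11; msr 30.04 / 6.09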
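
For reference when reading the table, WER and CER are usually defined as the edit distance between hypothesis and reference, divided by the reference length, over words and characters respectively. The sketch below assumes that standard definition and returns percentages; the evaluation script actually used for these results is not shown here, and details such as text normalization or whitespace handling in CER may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute/insert/delete cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; assumes a non-empty reference."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference, which is consistent with values such as 120.23 appearing in the table.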