IndicBERT v1
IndicBERT is pre-trained on the IndicNLP corpus, which covers 12 Indian languages (including English).
IndicTrans v0.3
IndicTrans is a Transformer-4x (~434M parameters) multilingual NMT model trained on the Samanantar dataset, the largest publicly available parallel corpus collection for Indic languages at the time of writing (14 April 2021). It is a single-script model, i.e., we convert all the Indic data to the Devanagari script, which allows better lexical sharing between languages for transfer learning, prevents fragmentation of the subword vocabulary across Indic languages, and permits a smaller subword vocabulary. We currently release two models, Indic-to-English and English-to-Indic, each supporting 11 Indic languages.
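The single-script conversion above can be sketched as a Unicode offset mapping, since the major Indic script blocks are aligned codepoint-for-codepoint in Unicode. This is a simplified illustration with a hand-picked subset of scripts; the actual pipeline uses a proper script converter (e.g., the Indic NLP Library) that also handles script-specific exceptions:

```python
# Sketch of single-script conversion: map text in another Indic script into
# the Devanagari block by shifting codepoints, exploiting the alignment of
# the Unicode Indic blocks. Simplified illustration only.

# Start codepoint of each script's Unicode block (illustrative subset)
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali":    0x0980,
    "gujarati":   0x0A80,
    "tamil":      0x0B80,
    "telugu":     0x0C00,
}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 codepoints

def to_devanagari(text: str, source_script: str) -> str:
    """Shift every character in the source script's block into Devanagari."""
    start = BLOCK_START[source_script]
    offset = BLOCK_START["devanagari"] - start
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + BLOCK_SIZE:  # inside the source block
            out.append(chr(cp + offset))
        else:                                 # leave spaces, digits, etc.
            out.append(ch)
    return "".join(out)

print(to_devanagari("আমি", "bengali"))  # Bengali rendered in Devanagari
```

Because the blocks are aligned, the same subword units can be learned once in Devanagari and shared across languages, which is what keeps the vocabulary small and unfragmented.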
IndicASR v1
IndicWav2Vec is a multilingual speech model pre-trained on 40 Indian languages. This model represents the largest diversity of Indian languages in the pool of multilingual speech models. We fine-tune this model for downstream ASR in 9 languages and obtain state-of-the-art results on 3 public benchmarks, namely MUCS, MSR, and OpenSLR.
IndicBART v1
IndicBART is a multilingual sequence-to-sequence pre-trained model covering 11 Indic languages and English.
IndicNLGSuite
IndicNLGSuite is a collection of models trained for five different tasks: Biography Generation, Headline Generation, Paraphrase Generation, Sentence Summarization, and Question Generation.