AI4Bharat
IndicCorp Statistics for LM Training
Hindi
Vocab size 50k -
Your text file has 4202013424 words in total
It has 6792348 unique words
Your top-50000 words are 98.1308 percent of all words
Your most common word "के" occurred 174383875 times
The least common word in your top-k is "जाउंगी" with 1268 times
The first word with 1269 occurrences is "सुहैब" at place 4998
Bengali
Vocab size 50k -
Your text file has 1421370708 words in total
It has 7255906 unique words
Your top-50000 words are 94.4976 percent of all words
Your most common word "হাজার" occurred 13420488 times
The least common word in your top-k is "জমলে" with 1108 times
The first word with 1109 occurrences is "ব্রিকসের" at place 49996
Telugu
Vocab size 50k -
Your text file has 504285066 words in total
It has 6744453 unique words
Your top-50000 words are 88.2641 percent of all words
Your most common word "ఈ" occurred 6493097 times
The least common word in your top-k is "కనిపించినప్పుడు" with 687 times
The first word with 688 occurrences is "సేకరించాడు" at place 49988
Gujarati
Vocab size 50k -
Your text file has 654314870 words in total
It has 5115246 unique words
Your top-50000 words are 92.8772 percent of all words
Your most common word "છે" occurred 28282403 times
The least common word in your top-k is "લગાવીએ" with 618 times
The first word with 619 occurrences is "પાલનપોષણ" at place 49946
Tamil
Vocab size 50k -
Your text file has 660601684 words in total
It has 10664066 unique words
Your top-50000 words are 84.9398 percent of all words
Your most common word "இந்த" occurred 4422034 times
The least common word in your top-k is "ஸ்கரப்" with 1004 times
The first word with 1005 occurrences is "ஃபிடல்" at place 49978
Odia
Vocab size 50k -
Your text file has 71018901 words in total
It has 1168499 unique words
Your top-50000 words are 94.4467 percent of all words
Your most common word "ଓ" occurred 767261 times
The least common word in your top-k is "ଆଲିସା" with 53 times
The first word with 54 occurrences is "ଗ୍ରିନ" at place 49747
Marathi
Vocab size 50k -
Your text file has 571536994 words in total
It has 6105964 unique words
Your top-50000 words are 90.5154 percent of all words
Your most common word "आहे" occurred 14187400 times
The least common word in your top-k is "लिऑन" with 639 times
The first word with 640 occurrences is "वेलमध्ये" at place 49982
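
The figures above (total words, unique words, top-50000 coverage, most and least common words) could be reproduced with a sketch along the following lines. This is not the tool that produced the output above; it is a minimal illustration that assumes whitespace tokenization and no text normalization. The function name `vocab_stats`, its return keys, and the reading of "place" as a 1-based frequency rank are all assumptions made for illustration.

```python
from collections import Counter


def vocab_stats(lines, k=50_000):
    """Word-frequency statistics for an iterable of text lines.

    Assumptions (not taken from the original tool): words are split on
    whitespace, and ties in frequency are broken alphabetically.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())

    total = sum(counts.values())
    if total == 0:
        raise ValueError("input contains no words")

    # Rank words by descending count; alphabetical order breaks ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[:k]
    covered = sum(count for _, count in top)
    cutoff_word, cutoff_count = top[-1]

    # Walk up from the bottom of the top-k list to the first word that is
    # strictly more frequent than the cutoff word, and report its rank.
    boundary = None
    for rank in range(len(top), 0, -1):
        word, count = top[rank - 1]
        if count > cutoff_count:
            boundary = (word, count, rank)
            break

    return {
        "total_words": total,
        "unique_words": len(counts),
        "coverage_percent": 100.0 * covered / total,
        "most_common": top[0],
        "least_common_in_top_k": (cutoff_word, cutoff_count),
        "boundary": boundary,  # None if every top-k word has the same count
    }
```

For a corpus stored one sentence per line, the function can be fed an open file handle directly, for example `vocab_stats(open(path, encoding="utf-8"))`, where `path` is a placeholder for the corpus file. Subtracting `coverage_percent` from 100 gives the share of running words that fall outside the 50k vocabulary.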