AI4Bharat



Terminology


Term

- Dataset Instance
- Task
- Task Status
  Unlabeled
  Accepted
  Skipped
- Organization
- Workspace
- Project
- Project Domain
  Translation
  OCR
  Monolingual
- Project Type
  Monolingual Collection
  Monolingual Translation
  Translation Editing
  Sentence Splitting
  OCR Annotation
- Project Status
  Published
  Draft
  Archived
- Project Mode
  Collection Project
  Annotation Project
- Data Sampling Type
  Full
  Batch
  Random
- Annotators-per-task
- Dashboard
  Organization Dashboard (Landing Page)
  Project Dashboard
  Task Table
  Task Dashboard

Description

Dataset Instance: A set of tasks to be annotated by a language team, e.g. a set of “English to Sanskrit” translation sentences. A Dataset Instance is uploaded to Shoonya by an Organization Owner or a Workspace Manager.

⁠

Task: An individual item in a dataset to be annotated, e.g. an image or a sentence. All user roles in a project have the authority to annotate a task.

⁠

Task Status: The current status of an annotation task. A task can be in exactly one of the following states at a time:
  Unlabeled: The initial state of every task. The state changes once an annotation is added or the skip button is used.
  Accepted: Indicates that the task has been annotated.
  Skipped: Indicates that the task has been skipped by the Language Expert.
  * Language Experts can use the skip button to skip a task.
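The task states described above can be sketched as a small state machine. This is only an illustration and not Shoonya's actual code: the `TaskStatus` enum, the `next_status` helper and the action names `"annotate"` and `"skip"` are hypothetical. Only the two transitions out of Unlabeled that the glossary mentions are modelled; any other transition is rejected because the glossary does not describe it.

```python
from enum import Enum


class TaskStatus(Enum):
    UNLABELED = "unlabeled"
    ACCEPTED = "accepted"
    SKIPPED = "skipped"


# Only the transitions the glossary describes; the action names are hypothetical.
TRANSITIONS = {
    (TaskStatus.UNLABELED, "annotate"): TaskStatus.ACCEPTED,
    (TaskStatus.UNLABELED, "skip"): TaskStatus.SKIPPED,
}


def next_status(current: TaskStatus, action: str) -> TaskStatus:
    """Return the status that follows `action`, or raise if it is not described."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(
            f"transition not described: {current.name} + {action!r}"
        ) from None
```

For example, `next_status(TaskStatus.UNLABELED, "skip")` yields `TaskStatus.SKIPPED`.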

⁠

Organization: An umbrella for a collective set of people, workspaces, projects and distinct tasks, e.g. AI4Bharat, IIT Madras, IIT Bombay.

⁠

Workspace: Used to group similar projects. An organization can contain multiple workspaces, and a workspace can have multiple Workspace Managers.

⁠

Project: All annotation activities in Shoonya are managed inside a project, e.g. “Hindi-Tamil Sentence Translation”. A project consists of a number of distinct tasks to be annotated. All Workspace Managers can create and manage projects inside their respective workspace.
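The containment relationship between organization, workspace, project and task can be sketched with plain data classes. This is a minimal sketch under assumptions: the class and field names, the workspace name and the manager name are hypothetical and are not taken from Shoonya's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    item: str  # an individual item to annotate, e.g. a sentence


@dataclass
class Project:
    title: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Workspace:
    name: str
    managers: List[str] = field(default_factory=list)  # a workspace can have several managers
    projects: List[Project] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    workspaces: List[Workspace] = field(default_factory=list)


def count_tasks(org: Organization) -> int:
    """Count every task in every project of every workspace."""
    return sum(len(p.tasks) for w in org.workspaces for p in w.projects)


# Hypothetical example data.
project = Project("Hindi-Tamil Sentence Translation", [Task("s1"), Task("s2")])
workspace = Workspace("Example Workspace", ["manager_a"], [project])
org = Organization("AI4Bharat", [workspace])
```

With this example data, `count_tasks(org)` counts the two tasks of the single project.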

⁠

Project Domain: A project domain consists of several flows in which Language Experts annotate or collect data to be used for the Machine Learning tasks listed below:
  Translation: Conversion of sentences from one language to another.
  OCR: Optical Character Recognition. It involves extracting text from scanned copies of handwritten or printed documents and converting it to typed text.
  Monolingual: Projects where translation of English and any one Indic language is possible.

⁠

Project Type: For every project domain, Shoonya v0.1 supports a few project types, listed here:
  Monolingual Collection: Collection of text blocks, in the form of paragraphs, in a single language.
  Monolingual Translation: Translation of English sentences into a desired Indic language, based on the expertise of the Language Expert.
  Translation Editing: Language Experts improve the translations produced during Monolingual Translation by cross-checking them and correcting mistakes wherever required.
  Sentence Splitting: Language Experts verify the correctness of sentences extracted from text blocks collected during Monolingual Collection.
  OCR Annotation: Annotation of text extracted from written documents.

⁠

Project Status: Every project can be in exactly one of the following states at a time:
  Published: Once data has been added, tasks have been created and annotators have been assigned, the project is set to Published by a Workspace Manager.
  Draft: The parameters of a project (annotators, tasks or data) can only be edited in Draft mode.
  Archived: Once a project is Archived, no annotations can be performed on any of its tasks.
  * No new users can be added to a project while it is in the Archived state.
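The rules attached to each project state can be written as simple predicates. This is a sketch under assumptions: the `ProjectStatus` enum and the function names are hypothetical. It encodes only what the glossary states. Whether annotation is possible in Draft, for instance, is not specified, so it is not modelled.

```python
from enum import Enum


class ProjectStatus(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


def can_edit_parameters(status: ProjectStatus) -> bool:
    """Annotators, tasks and data can only be edited in Draft."""
    return status is ProjectStatus.DRAFT


def annotations_blocked(status: ProjectStatus) -> bool:
    """No annotations can be performed once a project is Archived."""
    return status is ProjectStatus.ARCHIVED


def can_add_users(status: ProjectStatus) -> bool:
    """No new users can be added while a project is Archived."""
    return status is not ProjectStatus.ARCHIVED
```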

⁠

Project Mode: Shoonya is built to annotate as well as collect data from various sources over the Internet. To fulfill this purpose, Shoonya supports two project modes, namely collection project and annotation project.
  Collection Project: Involves collection of raw data by users. Types of collection projects supported by Shoonya v0.1 are -
  Annotation Project: Involves annotation or editing of pre-populated data by Language Experts.

Shoonya v0.1⁠

⁠

Data Sampling Type: While adding data to a project, Shoonya lets managers select a subset of a dataset instance for that project. Shoonya v0.1 supports three sampling types:
  Full: Select all the tasks from a dataset instance.
  Batch: Select a specific batch of tasks using parameters, e.g. “Batch Number: 1”, “Batch Size: 100”. This selects Data having ID 0-100.
  Random: Select a percentage of random tasks from the added data.
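The three sampling types can be sketched as follows. This is not Shoonya's implementation: the function names and parameters are hypothetical. The sketch assumes that batch numbers start at 1 and that each batch is a half-open slice, so batch 1 of size 100 covers IDs 0 to 99. The glossary's example gives the range as 0-100, and whether that upper bound is inclusive is not specified. Rounding the random sample size to the nearest whole task is also an assumption.

```python
import random
from typing import List, Optional, Sequence


def sample_full(task_ids: Sequence[int]) -> List[int]:
    """Full: select every task in the dataset instance."""
    return list(task_ids)


def sample_batch(task_ids: Sequence[int], batch_number: int, batch_size: int) -> List[int]:
    """Batch: select one contiguous batch (assumed 1-indexed, half-open)."""
    if batch_number < 1 or batch_size < 1:
        raise ValueError("batch_number and batch_size must be at least 1")
    start = (batch_number - 1) * batch_size
    return list(task_ids[start:start + batch_size])


def sample_random(task_ids: Sequence[int], percentage: float,
                  seed: Optional[int] = None) -> List[int]:
    """Random: select the given percentage of tasks without repetition."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)  # a seed makes the selection reproducible
    count = round(len(task_ids) * percentage / 100)
    return rng.sample(list(task_ids), count)
```

Under these assumptions, `sample_batch(range(250), 2, 100)` selects IDs 100 to 199, and `sample_random(range(200), 10)` selects 20 distinct IDs.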

⁠

Annotators-per-task: A required field when creating a new project in Shoonya. It specifies the number of annotators required to set a task as Accepted.
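One way to read the Annotators-per-task setting is as a threshold on the number of annotations a task has received. The sketch below assumes that each annotator contributes one annotation and that reaching the threshold is enough. The function and parameter names are hypothetical, and the actual acceptance rule in Shoonya may differ.

```python
def is_accepted(annotation_count: int, annotators_per_task: int) -> bool:
    """Return True once enough annotators have annotated the task (assumed rule)."""
    if annotators_per_task < 1:
        raise ValueError("annotators_per_task must be at least 1")
    return annotation_count >= annotators_per_task
```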

⁠

Dashboard: A page where information is shown to the user. There is a distinct page for each component:
  Organization Dashboard (Landing Page): The landing page, which shows all project cards and the list of workspaces.
  Project Dashboard: Shows the project title followed by other important information. This page also contains a Task Table.
  Task Table: Lists all the tasks present inside a project.
  Task Dashboard: The place where annotations are done.

⁠
