AI4Bharat
Documentation - User Manual

Terminology


Term

Dataset Instance

Task
Task Status

Unlabeled

Accepted
Skipped
Organization
Workspace
Project

Project Domain

Translation
OCR

Monolingual
Project Type
Monolingual Collection

Monolingual Translation

Translation Editing

Sentence Splitting

OCR Annotation
Project Status
Published

Draft

Archived
Project Mode

Collection Project

Annotation Project
Data Sampling Type

Full
Batch

Random
Annotators-per-task
Dashboard

Organization Dashboard (Landing Page)
Project Dashboard

Task Table
Task Dashboard


Description

A set of tasks to be annotated by a language team, e.g.: a set of “English to Sanskrit” translation sentences. A Dataset Instance is uploaded on Shoonya by an or a .
A task is an individual item in a dataset to be annotated, e.g.: an image or a sentence. All annotators in a Project have authority to annotate a task.
Defines the current status of any annotation. A task can be in one of the following states at a time:
Initially, all tasks are Unlabeled. The task state changes once an annotation is added or the skip button is used.
Indicates that a given task is annotated.
Indicates that a given task has been skipped by the Language Expert.
*Language Experts can use the skip button to skip a task.
Acts as an umbrella for a collective set of people, workspaces, projects and distinct tasks, e.g.: AI4Bharat, IIT Madras, IIT Bombay etc.
Used to group similar projects. There can be multiple workspaces in an organization. A workspace can have multiple projects.
All annotation activities in Shoonya are managed inside a project, e.g.: “Hindi-Tamil Sentence Translation”. A project consists of a number of distinct tasks to be annotated. All can create and manage a project inside respective .
Project Domain consists of several flows where Language Experts can work to annotate or collect data to be used for different Machine Learning tasks listed below:
Translation involves conversion of sentences from one language to another.
OCR stands for Optical Character Recognition. It involves extracting text from scanned copies of handwritten or printed documents and converting it to typed text.
Projects where of English and any one Indic Language is possible.
For every Project Domain, Shoonya v0.1 supports a few project types, listed here:
This involves collection of text blocks in the form of paragraphs in a single language.
This project type involves translation of English sentences to a desired Indic language, based on the expertise of the Language Expert.
In this type of project, Language Experts improve the translations done during Monolingual Translation by cross-checking them and correcting mistakes wherever required.
In sentence splitting, Language Experts verify the correctness of sentences extracted from text blocks collected during Monolingual Collection.
This project type involves annotation of text extracted from written documents.
Every project can be in one of the following states at a time:
The state in which data has been added, tasks have been created and annotators have been assigned to a project. It is set to Published by a .
The parameters of a project (annotators, tasks or data) can only be edited in draft mode.
Once Archived, no annotations can be performed on any task of that project.
*No new users can be added to a project when it is in state.
Shoonya is built to annotate as well as collect data from various sources over the Internet. To fulfill this purpose, Shoonya supports two project modes, namely Collection Project and Annotation Project.
Involves collection of raw data by users. Types of collection projects supported by are -
Involves annotation or editing of pre-populated data by Language Experts.
While adding data to a project, Shoonya allows managers the freedom to select a subset of a dataset instance for a particular project. Shoonya v0.1 supports three sampling types:
Select all the tasks from a dataset instance.
Select a specific batch of tasks using parameters, e.g.: “Batch Number: 1”, “Batch Size: 100”. This selects data having IDs 0-100.
Select a percentage of random tasks from the added data.
This field is required while creating a new project in Shoonya. Signifies the required number of annotators to set a task as .
A page that presents information to the user. There is a distinct dashboard for each component:
This is the landing page, which contains all project cards and the list of workspaces.

Project Dashboard contains the Project Title followed by other important information. This page also contains a Task Table.
A task table lists all the tasks present inside a project.
Task Dashboard is the place where Annotations are done.
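
The three Data Sampling Types described above (Full, Batch, Random) can be illustrated with a minimal Python sketch. This is not Shoonya's implementation: the function `sample_tasks` and all of its parameters are hypothetical names chosen for illustration. The sketch assumes batch numbers start at 1 and that batches do not overlap, so batch 1 of size 100 covers positions 0 to 99; whether Shoonya treats the upper bound as inclusive is not confirmed here.

```python
import random


def sample_tasks(items, sampling_type, batch_number=1, batch_size=100,
                 percentage=100, seed=None):
    """Return the subset of `items` picked by a sampling type.

    Hypothetical helper for illustration only, not part of Shoonya.
    """
    items = list(items)
    if sampling_type == "full":
        # Full: every task in the dataset instance is selected.
        return items
    if sampling_type == "batch":
        # Batch: assumption - batch numbers are 1-based and batches
        # are consecutive, non-overlapping slices of the data.
        start = (batch_number - 1) * batch_size
        return items[start:start + batch_size]
    if sampling_type == "random":
        # Random: a percentage of the data, chosen without repetition.
        count = len(items) * percentage // 100
        return random.Random(seed).sample(items, count)
    raise ValueError(f"unknown sampling type: {sampling_type!r}")
```

For example, `sample_tasks(range(250), "batch", batch_number=3, batch_size=100)` would return only the last 50 items in this sketch, because the third batch runs past the end of the data.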

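
The Task Status entries above (Unlabeled, Accepted, Skipped) describe a small state model, which the following Python sketch tries to capture. The class and method names are hypothetical and not taken from Shoonya. The sketch assumes a single annotation is enough to mark a task Accepted (it ignores the Annotators-per-task setting) and that a skipped task may still be annotated later; neither point is stated in this manual.

```python
from enum import Enum


class TaskStatus(Enum):
    UNLABELED = "unlabeled"
    ACCEPTED = "accepted"
    SKIPPED = "skipped"


class Task:
    """Hypothetical model of a single task, for illustration only."""

    def __init__(self, data):
        self.data = data
        self.annotation = None
        # Initially, all tasks are Unlabeled.
        self.status = TaskStatus.UNLABELED

    def annotate(self, annotation):
        # Adding an annotation changes the task state.
        # Assumption: one annotation is sufficient for Accepted.
        self.annotation = annotation
        self.status = TaskStatus.ACCEPTED

    def skip(self):
        # Using the skip button marks the task as Skipped.
        self.status = TaskStatus.SKIPPED
```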

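
The Project Status rules above can be expressed as two simple checks. The sketch below is a hedged illustration with hypothetical names (`ProjectStatus`, `can_edit_parameters`, `can_annotate`); it encodes only the two rules stated in this manual: project parameters are editable only in Draft, and no annotations are possible once a project is Archived. Whether a Draft project accepts annotations is not stated, so the sketch does not forbid it.

```python
from enum import Enum


class ProjectStatus(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


def can_edit_parameters(status):
    # Annotators, tasks or data can only be edited in Draft.
    return status is ProjectStatus.DRAFT


def can_annotate(status):
    # Once Archived, no annotations can be performed.
    # Assumption: every other state allows annotation.
    return status is not ProjectStatus.ARCHIVED
```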
