AI4Bharat

Shoonya Development Document

Shoonya Workflow

Shoonya User Management

In Shoonya, login is managed through Djoser. Users join through the invite management system: an invited user receives an email with a link for signing up. During sign-up, password validation is handled by the Django password validator. Once a user is authorized after logging in, their session is managed using JWT (JSON Web Token). At the backend, apart from handling authorization and the session management token, the user management module supports basic CRUD (Create, Read, Update and Delete) operations on its Django model.
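
As a rough illustration of how Djoser, the Django password validators and JWT are commonly wired together, the settings fragment below is a sketch and not Shoonya's actual configuration. It assumes JWT support comes from the djangorestframework-simplejwt package; the minimum password length and the token lifetimes are made-up values.

```python
from datetime import timedelta

# Django's built-in password validators, run when a user signs up.
AUTH_PASSWORD_VALIDATORS = [
    {"NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator"},
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        "OPTIONS": {"min_length": 8},  # assumed value, not taken from Shoonya
    },
    {"NAME": "django.contrib.auth.password_validation.CommonPasswordValidator"},
    {"NAME": "django.contrib.auth.password_validation.NumericPasswordValidator"},
]

# Authenticate API requests from the JWT sent in the Authorization header.
# Assumption: the simplejwt backend is the one in use.
REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": (
        "rest_framework_simplejwt.authentication.JWTAuthentication",
    ),
}

SIMPLE_JWT = {
    "AUTH_HEADER_TYPES": ("JWT",),
    "ACCESS_TOKEN_LIFETIME": timedelta(hours=1),   # assumed value
    "REFRESH_TOKEN_LIFETIME": timedelta(days=1),   # assumed value
}
```

With a configuration along these lines, a client would log in once to obtain a token pair and then send the access token with each request until it expires.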

Shoonya Organization Management

At the top of the user and data management hierarchy in Shoonya is the organization, which has a group of workspaces, users, projects and tasks belonging to it. Each organization has an organization owner who has access to all the data within it. At the backend, the organization management module supports basic CRUD operations on its Django model.

Shoonya Workspace Management

Within an organization, there can be multiple workspaces, each grouping a set of similar projects. Each workspace has its own workspace manager. At the backend, the workspace management module supports basic CRUD operations on its Django model, along with features for assigning a manager to a workspace and archiving a workspace.
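
The organization and workspace hierarchy described above can be pictured with a small sketch. It uses plain dataclasses rather than Django models so that it runs on its own; every class, field and method name here is hypothetical and not taken from the Shoonya codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workspace:
    name: str
    manager: Optional[str] = None      # username of the workspace manager
    is_archived: bool = False
    projects: List[str] = field(default_factory=list)

    def assign_manager(self, username: str) -> None:
        """Make the given user the manager of this workspace."""
        self.manager = username

    def archive(self) -> None:
        """Mark the workspace as archived without deleting it."""
        self.is_archived = True


@dataclass
class Organization:
    name: str
    owner: str                         # the organization owner sees all data
    workspaces: List[Workspace] = field(default_factory=list)

    def add_workspace(self, name: str) -> Workspace:
        workspace = Workspace(name=name)
        self.workspaces.append(workspace)
        return workspace

    def active_workspaces(self) -> List[Workspace]:
        return [w for w in self.workspaces if not w.is_archived]


org = Organization(name="Example Org", owner="owner@example.com")
translation_ws = org.add_workspace("Translation")
translation_ws.assign_manager("manager@example.com")
ocr_ws = org.add_workspace("OCR")
ocr_ws.archive()
```

After these calls, `org.active_workspaces()` contains only the "Translation" workspace, since the "OCR" workspace has been archived.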

Shoonya Data Management

The annotation data, that is, both the data to be annotated and the annotation results, is stored in the form of datasets in Shoonya. A dataset instance is created first; it is a named set of data items belonging to a specific dataset type. A dataset is a logical table with a defined schema of fields and corresponding metadata.

For each column in every dataset, two things are stored: a parent_id, which holds the id of the annotation data row in its source dataset, and metadata, which describes how the data in that row was created. The metadata records details such as whether the data was annotated by users (with a link pointing to the annotation id) or created by a function (with the arguments passed to that function).

As an example, consider an annotation task that involves verifying the machine translation of a given English sentence into an Indian language. A dataset instance named ‘English to Hindi Translation Pair’ is created first. A dataset of type ‘Translation Pair’ is then created with its dataset instance id pointing to the ‘English to Hindi Translation Pair’ dataset instance.
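
The example scenario can be sketched in a few lines of Python. This is an illustration under assumptions, not Shoonya's schema: apart from parent_id, the metadata and the dataset instance id, all field names, the metadata keys, the ids and the function name `machine_translate` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DatasetInstance:
    instance_id: int
    name: str
    dataset_type: str


@dataclass
class TranslationPair:
    id: int
    instance_id: int                   # points to the owning DatasetInstance
    input_text: str
    output_text: str
    parent_id: Optional[int] = None    # id of the source row in the source dataset
    metadata: Dict[str, Any] = field(default_factory=dict)


instance = DatasetInstance(
    instance_id=1,
    name="English to Hindi Translation Pair",
    dataset_type="Translation Pair",
)

# A row produced by a function: the metadata names the function and its arguments.
machine_row = TranslationPair(
    id=10,
    instance_id=instance.instance_id,
    input_text="How are you?",
    output_text="आप कैसे हैं?",
    parent_id=3,
    metadata={
        "created_by": "function",
        "function": "machine_translate",          # hypothetical function name
        "args": {"target_language": "Hindi"},
    },
)

# A row produced by human annotation: the metadata links to the annotation id.
annotated_row = TranslationPair(
    id=11,
    instance_id=instance.instance_id,
    input_text="How are you?",
    output_text="आप कैसे हैं?",
    parent_id=3,
    metadata={"created_by": "annotation", "annotation_id": 42},
)
```

Both rows share the same parent_id because they derive from the same source sentence; only the metadata distinguishes how each one was produced.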

Shoonya Project Management

Within a workspace, there can be multiple projects that share either a type (say, translation) or a language. A project acts as a human annotation task definition with a predesigned user interface (UI) and a pre-mapped schema for its input and output sources.

The project management module uses a project registry to explicitly specify the project specifications for different annotation types like translation, OCR or monolingual data collection:

Project Type - Monolingual Translation, Translation Editing, OCR Annotation, etc.

Project Mode - Data Annotation or Data Collection

Label Studio template to be used for UI

Input Dataset

Class - Source of the input data like Sentence Text or Translation Pair

Fields - The fields to be used as input like the language and the text to be annotated

Output Dataset

Class - The dataset to which the annotation output has to be exported, which can either be the same as the input source or a different dataset. For example, an OCR annotation can produce data that has to be stored in a dataset of text blocks.

Fields - The annotation result to be exported to the dataset
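
A registry entry covering the specifications listed above might look like the following sketch. The dictionary layout, the key names, the template file name and the field names are all assumptions made for illustration; the real registry format may differ.

```python
from typing import Any, Dict, List

# Hypothetical registry with a single project type.
PROJECT_REGISTRY: Dict[str, Dict[str, Any]] = {
    "TranslationEditing": {
        "project_mode": "Annotation",
        "label_studio_template": "translation_editing.xml",  # hypothetical file
        "input_dataset": {
            "class": "TranslationPair",
            "fields": ["input_language", "input_text", "machine_translation"],
        },
        "output_dataset": {
            "class": "TranslationPair",
            "fields": ["output_text"],
        },
    },
}

VALID_MODES = {"Annotation", "Collection"}
REQUIRED_KEYS = (
    "project_mode",
    "label_studio_template",
    "input_dataset",
    "output_dataset",
)


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one registry entry (empty if valid)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if "project_mode" in entry and entry["project_mode"] not in VALID_MODES:
        problems.append(f"unknown project_mode: {entry['project_mode']}")
    for side in ("input_dataset", "output_dataset"):
        if side in entry:
            spec = entry[side]
            if not spec.get("class"):
                problems.append(f"{side} needs a class")
            if not spec.get("fields"):
                problems.append(f"{side} needs at least one field")
    return problems
```

Here the input and output classes are the same, which matches the case where annotation results are written back to the source dataset; an OCR project would instead name a different output class.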

Project management starts with creating a project by sampling a set of rows from a dataset that holds the data to be annotated. When a project is created, each row of its input data is populated as an annotation task in the task model.
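
The sampling and task population step can be sketched as follows. Random sampling is only one possible strategy and is an assumption here, as are the function names and the shape of the task dictionaries; the initial status value follows the task status described in the Task Management section.

```python
import random
from typing import Any, Dict, List, Optional


def sample_rows(
    rows: List[Dict[str, Any]], sample_size: int, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Pick up to sample_size rows at random, without replacement."""
    if sample_size >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, sample_size)


def create_tasks(
    project_id: int, rows: List[Dict[str, Any]], input_fields: List[str]
) -> List[Dict[str, Any]]:
    """Turn each sampled row into one annotation task for the project."""
    return [
        {
            "project_id": project_id,
            "input_data_id": row["id"],
            "data": {name: row[name] for name in input_fields},
            "status": "Unlabeled",
        }
        for row in rows
    ]


dataset_rows = [
    {"id": 1, "language": "English", "text": "Good morning."},
    {"id": 2, "language": "English", "text": "Thank you."},
    {"id": 3, "language": "English", "text": "See you soon."},
]
tasks = create_tasks(
    project_id=7,
    rows=sample_rows(dataset_rows, sample_size=2, seed=0),
    input_fields=["language", "text"],
)
```

Each task keeps the id of the row it came from, so annotation results can later be traced back to the input data.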

A newly created project is in ‘Draft’ status. Once language experts/annotators are assigned to it, the project is published and moves to ‘Published’ status. The language experts can work on the annotation tasks only after a project is published. A project moves to ‘Archived’ status if it is explicitly archived by the workspace manager, the organization owner or the admin.
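
The status flow above amounts to a small state machine, sketched below. The class, method and role names are hypothetical. Allowing a project to be archived from any status is an assumption, since the text does not say which statuses can be archived.

```python
from enum import Enum
from typing import List


class ProjectStatus(Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"
    ARCHIVED = "Archived"


# Roles allowed to archive a project (role names are hypothetical).
ARCHIVE_ROLES = {"workspace_manager", "organization_owner", "admin"}


class Project:
    def __init__(self, title: str) -> None:
        self.title = title
        self.status = ProjectStatus.DRAFT
        self.annotators: List[str] = []

    def assign_annotator(self, username: str) -> None:
        self.annotators.append(username)

    def publish(self) -> None:
        """Move a draft project to Published once annotators are assigned."""
        if self.status is not ProjectStatus.DRAFT:
            raise ValueError("only a draft project can be published")
        if not self.annotators:
            raise ValueError("assign at least one annotator before publishing")
        self.status = ProjectStatus.PUBLISHED

    def archive(self, role: str) -> None:
        """Archive the project; assumed to be allowed from any status."""
        if role not in ARCHIVE_ROLES:
            raise PermissionError(f"role {role!r} cannot archive a project")
        self.status = ProjectStatus.ARCHIVED

    def can_annotate(self) -> bool:
        """Annotation work is possible only while the project is published."""
        return self.status is ProjectStatus.PUBLISHED
```

A typical sequence would be to create the project, call `assign_annotator`, then `publish`, after which `can_annotate()` returns True until the project is archived.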

Once the annotation tasks of a project are completed, the annotation outputs of the task can be exported to the output dataset.

At the backend, the project management module supports the above-mentioned flow along with allowing the basic CRUD operations to be performed on its Django model.

Shoonya Task Management

A project has a set of annotation tasks belonging to it. The task and annotation models work together to store the data to be annotated, the annotation result, who annotated it, who reviewed it, the metadata and the task status.

The task status is initially ‘Unlabeled’. When the annotator submits an annotation, it changes to ‘Accepted’. If the task is skipped instead of being annotated, its status changes to ‘Skipped’.
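
The task status changes can be sketched with a minimal class. The names are hypothetical, and the sketch deliberately adds no rules beyond the ones stated above; for example, it does not decide whether a skipped task may be annotated later.

```python
from enum import Enum
from typing import Any, Dict, Optional


class TaskStatus(Enum):
    UNLABELED = "Unlabeled"
    ACCEPTED = "Accepted"
    SKIPPED = "Skipped"


class Task:
    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data                          # the data to be annotated
        self.status = TaskStatus.UNLABELED
        self.annotation_result: Optional[Dict[str, Any]] = None
        self.annotated_by: Optional[str] = None

    def submit(self, annotator: str, result: Dict[str, Any]) -> None:
        """Store the annotation and mark the task as accepted."""
        self.annotation_result = result
        self.annotated_by = annotator
        self.status = TaskStatus.ACCEPTED

    def skip(self) -> None:
        """Mark the task as skipped without storing an annotation."""
        self.status = TaskStatus.SKIPPED


done = Task({"text": "Good morning."})
done.submit("annotator1", {"translation": "सुप्रभात"})

passed_over = Task({"text": "Thank you."})
passed_over.skip()
```

After these calls, `done.status` is `TaskStatus.ACCEPTED` and `passed_over.status` is `TaskStatus.SKIPPED`.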

At the backend, the task management module supports the above-mentioned flow through basic CRUD operations to be performed on its Django model.
