A set of tasks to be annotated by a language team, e.g. a set of “English to Sanskrit” translation sentences. A Dataset Instance is uploaded on Shoonya by an
A Project Domain consists of several flows in which Language Experts annotate or collect data for the Machine Learning tasks listed below:
Translation involves converting sentences from one language to another.
OCR stands for Optical Character Recognition. It involves extracting text from scanned copies of handwritten or printed documents and converting it to typed text.
Shoonya is built both to annotate data and to collect it from various sources on the Internet. To this end, Shoonya supports two project modes, namely
When adding data to a project, Shoonya lets managers select a subset of a dataset instance for that project. Shoonya v0.1 supports three sampling types:
Select all the tasks from a dataset instance.
Select a specific batch of tasks using parameters, e.g. “Batch Number: 1”, “Batch Size: 100”. This selects the data with IDs 0-100.
Randomly select a given percentage of tasks from the added data.
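The three sampling types can be sketched in Python as follows. This is an illustration only, not Shoonya's implementation: the function name sample_tasks, the mode names "full", "batch" and "random", and the seed parameter are all hypothetical. The batch arithmetic assumes 1-based batch numbers and half-open slicing, so batch 1 of size 100 covers list positions 0 through 99. Whether Shoonya itself also includes the upper bound is not specified here.

```python
import random
from typing import List, Optional, Sequence


def sample_tasks(
    tasks: Sequence[dict],
    mode: str,
    batch_number: int = 1,
    batch_size: int = 100,
    percentage: float = 100.0,
    seed: Optional[int] = None,
) -> List[dict]:
    """Return a subset of tasks (hypothetical helper, not a Shoonya API)."""
    if mode == "full":
        # Full sampling: every task in the dataset instance.
        return list(tasks)

    if mode == "batch":
        if batch_number < 1 or batch_size < 1:
            raise ValueError("batch_number and batch_size must be >= 1")
        # Assumption: batches are numbered from 1 and do not overlap.
        start = (batch_number - 1) * batch_size
        return list(tasks[start:start + batch_size])

    if mode == "random":
        if not 0 <= percentage <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # Number of tasks to draw, rounded to the nearest integer.
        count = round(len(tasks) * percentage / 100)
        # A seeded generator makes the selection reproducible.
        rng = random.Random(seed)
        return rng.sample(list(tasks), count)

    raise ValueError(f"unknown sampling mode: {mode!r}")


if __name__ == "__main__":
    dataset = [{"id": i} for i in range(250)]
    first_batch = sample_tasks(dataset, "batch", batch_number=1, batch_size=100)
    ten_percent = sample_tasks(dataset, "random", percentage=10, seed=42)
    print(len(first_batch), len(ten_percent))
```

Under these assumptions, a batch that runs past the end of the dataset is simply shorter than the requested batch size, and random sampling never returns the same task twice.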
This field is required when creating a new project in Shoonya. It specifies the number of annotators required to set a task as