Glocal Evaluation of Models

Pariksha

By Ishaan Watts, Vivek Seshadri, Manohar Swaminathan, Sunayana Sitaram (Microsoft Research India)
Read the full paper here:
Pariksha aims to evaluate the performance of large language models (LLMs) for Indic languages in a scalable, democratic, and transparent manner.

📐 Evaluation Method

The first Pariksha Pilot compares the responses of different LLMs to prompts curated to be relevant to Indian languages, culture and ethos. Instead of using traditional multilingual benchmarking techniques, as in our prior work MEGA [1] and MEGAVERSE [2], Pariksha leverages Karya, an ethical data collection platform, to conduct large-scale, high-quality human evaluation. The ranks obtained by human evaluation are converted into Elo scores to create the Pariksha leaderboard. We believe that current benchmarks are not sufficient to measure progress in Indic LLMs because of contamination, benchmark translation and the lack of representative tasks in many traditional benchmarks. We plan to release all evaluation artifacts so that the community can improve their models using the prompts, evaluation scores and preference data.
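The conversion from pairwise judgments to Elo scores can be sketched as follows. This is a minimal illustration of the standard sequential Elo update, not necessarily the procedure Pariksha uses; the starting rating of 1000, the K-factor of 32, the outcome labels and the function names are all assumptions chosen for the example.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(battles, k=32.0, initial=1000.0):
    """Run a sequential Elo update over pairwise battles.

    battles: iterable of (model_a, model_b, outcome) tuples, where
    outcome is "a", "b" or "tie" (hypothetical labels for this sketch).
    """
    ratings = defaultdict(lambda: initial)
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, outcome in battles:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)
        s_a = score_for_a[outcome]
        # Each model moves by K times (actual score - expected score).
        ratings[model_a] = r_a + k * (s_a - e_a)
        ratings[model_b] = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return dict(ratings)
```

Because the update is sequential, the final ratings depend on the order in which battles are processed.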

In addition to human evaluation, we also employ LLMs-as-evaluators by building upon new research on multilingual evaluation, METAL [3, 4]. This has the potential to augment human evaluation and increase the overall efficiency of the evaluation pipeline. We also present leaderboards created using LLMs as evaluators for the Pariksha Pilot.
More details on the evaluation process can be found in the .

🎖️ Leaderboard with Various Views

The Pariksha Pilot was conducted in March 2024 and Round 1 is currently ongoing. The Round 1 leaderboard should be treated as a preview. We plan to add more models in subsequent rounds of Pariksha.
Pariksha Pilot Leaderboard

🎖️ Pariksha Round 1 Leaderboard

MLE Elo by Language
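This page does not define "MLE Elo". One common reading is a maximum-likelihood fit of the Bradley-Terry model, rescaled to Elo units, which does not depend on the order of battles. The sketch below works under that assumption and uses a simple minorization-maximization iteration; counting a tie as half a win for each side, the base rating of 1000, the iteration count and the function name are illustrative choices, not details taken from Pariksha.

```python
import math
from collections import defaultdict


def mle_elo(battles, scale=400.0, base=1000.0, iters=200):
    """Fit Bradley-Terry strengths and report them on an Elo-like scale.

    battles: iterable of (model_a, model_b, outcome), outcome in
    {"a", "b", "tie"}; model_a and model_b must differ. Assumes every
    model has at least one win or tie, otherwise its strength is zero
    and the logarithm below fails.
    """
    score_for_a = {"a": 1.0, "b": 0.0, "tie": 0.5}
    wins = defaultdict(float)    # fractional wins per model
    games = defaultdict(float)   # battles per unordered pair
    models = set()
    for a, b, outcome in battles:
        models.update((a, b))
        s_a = score_for_a[outcome]
        wins[a] += s_a
        wins[b] += 1.0 - s_a
        games[frozenset((a, b))] += 1.0

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for m in models:
            denom = sum(
                n / (strength[m] + strength[o])
                for pair, n in games.items() if m in pair
                for o in pair - {m}
            )
            new[m] = wins[m] / denom
        # Fix the scale by normalizing the geometric mean to 1.
        log_mean = sum(math.log(v) for v in new.values()) / len(new)
        strength = {m: v / math.exp(log_mean) for m, v in new.items()}

    return {m: base + scale * math.log10(s) for m, s in strength.items()}
```

Running such a fit separately on the battles for each language would give one table per language, as in the listing below.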

Pariksha - MLE Elo for Bengali
There are no rows in this table
Pariksha - MLE Elo for Gujarati
There are no rows in this table

Pariksha - MLE Elo for Hindi
There are no rows in this table

Pariksha - MLE Elo for Kannada
There are no rows in this table
Pariksha - MLE Elo for Malayalam
There are no rows in this table
Pariksha - MLE Elo for Marathi
There are no rows in this table
Pariksha - MLE Elo for Punjabi
There are no rows in this table

Pariksha - MLE Elo for Odia
There are no rows in this table
Pariksha - MLE Elo for Tamil
There are no rows in this table
Pariksha - MLE Elo for Telugu
There are no rows in this table

📊 Summary of Data Points

The tables below summarize the number of models, languages and data points included in the Pariksha Pilot leaderboard.
What is a Data Point?
A data point is a single battle: an evaluator is shown a prompt with responses from two LLMs and asked to pick the better response or declare a tie.
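A data point of this kind could be represented as a small record, and the summary counts derived from a list of such records. The sketch below is only an illustration; the field names, the verdict labels and the `summarize` helper are hypothetical and do not reflect the actual Pariksha data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Battle:
    """One data point: two model responses to one prompt, plus a verdict."""
    language: str
    prompt_id: str
    model_a: str
    model_b: str
    verdict: str  # "a", "b" or "tie" (hypothetical labels)


def summarize(battles):
    """Count distinct models, distinct languages and total data points."""
    models = {m for b in battles for m in (b.model_a, b.model_b)}
    languages = {b.language for b in battles}
    return {
        "models": len(models),
        "languages": len(languages),
        "data_points": len(battles),
    }
```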
Summary for Human-Eval (Karya)
There are no rows in this table
Summary for LLM-Eval
There are no rows in this table

🧵 References

[1] Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Mohamed Ahmed, Kalika Bali, and Sunayana Sitaram. 2023. . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4232–4267, Singapore. Association for Computational Linguistics.
[2] Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, and Sunayana Sitaram. 2024.
[3] Rishav Hada, Varun Gumma, Adrian Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, and Sunayana Sitaram. 2024. . In Findings of the Association for Computational Linguistics: EACL 2024, pages 1051–1070, St. Julian’s, Malta. Association for Computational Linguistics.
[4] Rishav Hada, Varun Gumma, Mohamed Ahmed, Kalika Bali, and Sunayana Sitaram. 2024.

Share Feedback

Reach out to the Pariksha team to share feedback.