People+ai Theses

Models

What’s my Model

More than 8 billion people speak more than 7,100 languages worldwide, yet AI cannot converse fluently or take action in 98% of those languages, leaving out billions of people. Existing models lag far behind popular English models on many language, voice, and vision tasks, and they do not cater to an Indian audience or the Indian market: models need to become more adaptable and accessible.
For the reader:
- This is a thesis on our understanding of the optimal routes to building better models, given the current and extrapolated future state of Indian language datasets and compute availability
- If any part of the document sparks a thought or idea, leave some notes for the community to tinker with.
- We are looking for model builders, users, and evaluators to contribute to the projects. Click on the buttons to volunteer or partner in the cause.

Exploring novel non-transformer architectures for better performance at lower inference costs

Lower inference costs will enable Indian application developers to use models and deploy products at affordable prices for an Indian audience.

Transformer models are compute-intensive.

The attention computation in transformers grows quadratically with sequence length, because pairwise attention scores are calculated between every pair of elements. For very long sequences this becomes a significant memory burden.
The size and depth of transformer architectures lead to billions of parameters per model, which demand substantial memory and processing power, making models expensive both to train and to run at inference.
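
To make the quadratic growth concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention; the sequence length and dimensions are illustrative, not taken from any particular model.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention, no batching, for illustration."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (n, n): quadratic in sequence length
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

n, d = 2048, 64
q = k = v = np.random.randn(n, d).astype(np.float32)
out = attention(q, k, v)

# The (n, n) score matrix alone holds n**2 float32 values: 2048**2 * 4 B = 16 MiB,
# and at n = 8192 it is already 256 MiB, per head, per layer, just for scores.
print(f"score matrix entries: {n * n:,}")
```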

State Space Models (SSMs), Test-Time Training, and RWKV can serve as alternative, more compute-efficient models.

SSMs and RWKV significantly reduce computational costs compared to transformers, making them well suited to tasks involving long sequences, such as music and text generation, while maintaining performance (see the toy recurrence below).
SSMs excel at handling continuous-time data, such as audio signals.
They also generalise well across diverse domains, handling varying data distributions and sampling rates, which makes them more robust in real-world applications, and their multilingual capabilities support effective performance across languages.
More on non-transformer models in Appendix A.
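
As a rough illustration of why these architectures are cheaper, here is a toy linear state space recurrence in NumPy. Real SSMs such as Mamba add input-dependent parameters and hardware-aware parallel scans, so this is a sketch of the recurrence only, with made-up dimensions.

```python
import numpy as np

# Toy discrete linear state space model: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Each step touches only a fixed-size hidden state, so compute and memory grow
# linearly with sequence length instead of quadratically.
def ssm_scan(A, B, C, xs):
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                      # one fixed-cost update per token
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

state_dim, in_dim, out_dim, seq_len = 16, 4, 4, 10_000
rng = np.random.default_rng(0)
A = 0.9 * np.eye(state_dim)           # stable toy dynamics
B = 0.1 * rng.normal(size=(state_dim, in_dim))
C = 0.1 * rng.normal(size=(out_dim, state_dim))
xs = rng.normal(size=(seq_len, in_dim))

ys = ssm_scan(A, B, C, xs)
print(ys.shape)  # (10000, 4): no (seq_len, seq_len) matrix is ever materialised
```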
Projects:
Foster a research community to exchange ideas on non-transformer architectures
Pool funding for projects in the domain

Tuning the Transformer architecture for Indian Language Understanding

Indian languages pose several unique challenges for Transformer models, including linguistic complexity, script diversity, cultural context, data scarcity, and word sense disambiguation.

Optimising the components of a Transformer for Indian languages can improve models

Changing the fundamental building blocks of a transformer, such as its tokenisation mechanism and embeddings, to better suit Indian languages will make models better at understanding and generating those languages.
Projects:
Improve tokenizers to better represent complex scripts like Devanagari (see the sketch after this list)
Create better embeddings specific to a language and extrapolate to other languages
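
One concrete reason Devanagari needs tokenizer work: in UTF-8, every Devanagari character occupies 3 bytes versus 1 for ASCII, so byte-level tokenizers start from a much longer input before any merges are learned. A minimal illustration (the example strings are our own):

```python
# Byte-level tokenizers see each Devanagari character as 3 UTF-8 bytes, while
# most English characters are 1 byte. Tokenizers trained mostly on English
# rarely learn enough Devanagari merges to close that gap, so Hindi text tends
# to burn far more tokens per word than English.
english = "language model"
hindi = "भाषा मॉडल"   # "language model" in Hindi

for text in (english, hindi):
    print(f"{text!r}: {len(text)} chars -> {len(text.encode('utf-8'))} UTF-8 bytes")
```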

Creating better model evaluation and feedback loops will speed up model builders’ progress

Continuous improvement of benchmarks and leaderboard datasets is a must if model builders are to build better models and users are to choose the right ones. Benchmark metrics specific to the nuances of Indian languages are being researched, but they are not easily accessible to developers. And the broad domain of language tasks is not evaluated in any consolidated, usable format.
Model alignment can be improved by prioritising feedback, which will lead to more ‘Indian’ answers. Today, the loops through which use cases feed back into models, whether via RLHF or prompting, are not being improved.
Projects:
Leaderboard aggregation, continuous improvement, and public-private collaboration to improve benchmarks. Ongoing as the Glocal Evaluation of Models (GEM) project; a toy aggregation sketch follows.
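
As a sketch of what leaderboard aggregation involves, here is a toy example that min-max normalises per-benchmark scores before averaging. The model names, benchmark names, and scores are invented for illustration; they are not GEM data.

```python
# Toy leaderboard aggregation. Scores are min-max normalised per benchmark so
# that no single metric's scale dominates, then averaged into one ranking
# figure per model.
scores = {
    "model_a": {"hi_qa": 62.0, "ta_translation": 31.0, "asr_accuracy": 78.0},
    "model_b": {"hi_qa": 70.0, "ta_translation": 28.0, "asr_accuracy": 81.0},
    "model_c": {"hi_qa": 55.0, "ta_translation": 35.0, "asr_accuracy": 74.0},
}

benchmarks = next(iter(scores.values())).keys()
ranges = {b: (min(s[b] for s in scores.values()),
              max(s[b] for s in scores.values())) for b in benchmarks}

def aggregate(model_scores):
    normed = [(model_scores[b] - lo) / (hi - lo) for b, (lo, hi) in ranges.items()]
    return sum(normed) / len(normed)

for model in sorted(scores, key=lambda m: -aggregate(scores[m])):
    print(f"{model}: {aggregate(scores[model]):.3f}")
```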

Enhancing capabilities of foundational models and personalising results can broaden the scope of AI usage in Indian applications

Users engage better with audio and visuals, making voice and image capabilities key.

Many applications would benefit from better foundational models: models that translate voice from one language to another would break down many language barriers, and vision models with cultural context would generate Indian skin tones and culturally grounded images.
Projects:
Creating better voice-to-voice capabilities by improving the models and the datasets (a cascaded pipeline sketch follows this list)
Improving multimodal capabilities (especially audio and vision) in an Indian context for foundation models
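
A hedged sketch of the cascaded approach to voice-to-voice translation (ASR, then text translation, then TTS). The class and stage names here are hypothetical stand-ins, not a real library API; any Indic ASR, translation, or TTS model could slot in, and end-to-end speech translation models would replace the whole cascade.

```python
from dataclasses import dataclass

@dataclass
class VoiceToVoice:
    """Hypothetical cascade: speech -> text -> translated text -> speech."""
    asr: callable          # speech (source language) -> text
    translate: callable    # text -> text (e.g. Hindi -> English)
    tts: callable          # text -> speech (target language)

    def __call__(self, audio):
        text = self.asr(audio)             # errors here compound downstream,
        translated = self.translate(text)  # which is why end-to-end speech
        return self.tts(translated)        # translation models are appealing

# Wiring with dummy stages just to show the data flow:
pipeline = VoiceToVoice(
    asr=lambda audio: "नमस्ते, आप कैसे हैं?",
    translate=lambda text: "Hello, how are you?",
    tts=lambda text: b"<waveform bytes>",
)
print(pipeline(b"<input waveform>"))
```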

Personalising the experience creates incentives for users to stick with products and engage with the application.

Personalisation drives user retention and interest. Once privacy-preserving data collection and model training are in place, personalisation becomes possible for any use case.
Building domain-specific models is important for producing reliable, specific answers.
Projects:
Federated learning models for AI in healthcare and finance (see the sketch after this list)
Improving fine-tuning for region-specific changes
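
To illustrate the federated learning idea, here is a minimal federated averaging (FedAvg) sketch on a toy linear-regression task. The client data and hyperparameters are invented, and production systems would layer secure aggregation and differential privacy on top.

```python
import numpy as np

# Each client (e.g. a hospital or bank) keeps its data local and trains
# briefly; only model weights travel to the server, which averages them.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_client(n=50):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

clients = [make_client() for _ in range(3)]

def local_step(w, X, y, lr=0.01):
    grad = 2 * X.T @ (X @ w - y) / len(y)    # MSE gradient on local data only
    return w - lr * grad

w = np.zeros(2)
for _ in range(200):
    local_ws = [local_step(w, X, y) for X, y in clients]   # private local training
    w = np.mean(local_ws, axis=0)                          # server averages weights
print(w)  # converges toward true_w without raw data ever leaving a client
```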

We need people thinking about this, collectively: fostering research and industry-academia collaboration

India has a substantial pool of AI talent, accounting for 16% of the global workforce, but there is a shortage of top-tier researchers capable of developing advanced AI models. Many of the best AI researchers relocate abroad, which hampers the country's ability to create cutting-edge AI applications.
Creating a community that builds, deploys, and improves alternatives to transformer models
Public-private partnerships will enrich the open-source AI stack and the AI ecosystem, from data to model to application
Projects:
Open-source Indian tech stacks for self-sufficiency
Empowering skills: English, programming, and learning for learners and educators
Mapping AI research in Indian Academia + Industry
Attracting research talent back to India
Developing commercial alternatives to conventional research funding

Projects

| # | Project name | Description | Status | Charter |
|---|---|---|---|---|
| 1 | Glocal Evaluation of Models | | Ongoing | Open |

Further reading

[A] Non-transformer architectures

Relaxation of attention and Linear RNNs (non-state space models)
State space models