Before foundation models, AI was task-specific and required extensive manual engineering for each application. Foundation models changed this by being versatile and multimodal, handling tasks like text, image, or audio generation. They can be standalone systems or serve as a base for other applications.
Foundation models enable:
Versatility: They can perform a wide range of tasks, such as text translation, summarization, report generation, email drafting, and content creation.
Efficiency: They can apply learned patterns from one task to another, reducing the need for separate models for each task.
Multimodality: They can handle various input formats, including text, images, or videos, making them suitable for a wide range of applications.
Broad applicability: They can be used in various industries, from customer service chatbots to content creation tools, and in research fields like natural language processing and computer vision.

Foundation models are powerful and versatile, but they also require careful consideration regarding their potential impacts and risks, such as bias, disinformation, and power concentration.
Types of Foundational Models
For the scope of this report, we limit the discussion to foundational models in the context of India-specific use cases, as the overall space is very broad and highly competitive.
Understanding Foundational Models
Conclusion of Investment thesis on Foundational Models
Drivers & Challenges
A vast amount of data is needed to train these models effectively, and such data is not available for many Indian languages and contexts. Furthermore, this data needs to be in digital format; even the data that does exist needs to be digitised, which is a huge task. By some estimates, only 11% of the total population speaks English, which leaves a huge market untapped by existing western models like ChatGPT & Llama. Western models do offer translation, but they are simply translating knowledge from a western context into the desired language. Such translation often fails to capture the essence of a conversation, rendering it useless for commercial applications like handling customer queries in regional languages. Translation also requires extra compute power, making commercial use even more expensive. An LLM trained natively on regional languages and on Indian context could enable a mainstream AI revolution in India, reaching the untapped market of the remaining 89% of the population.

For auditory LLMs, there is again the problem of obtaining a vast amount of diverse data in regional languages. Auditory LLMs trained mostly on western data have an inherent bias with respect to Indian accents, pronunciations & vocabulary, which limits their usefulness for commercial use cases.

For computer vision LLMs, training data should not make a large difference. Any biases that surface in western LLMs can be fine-tuned away without a significant effect on costs. However, a difference can arise at the prompt input: since input is available only in English, such models may not be as useful for the B2C segment.

Additionally, there is the question of data security. With AI collecting more and more personal information and actively tracking it to shape conversations and recommendations, how safe is it for such data to sit on foreign servers over which we have no control?

Development of LLMs is not an easy task; it requires great engineering innovation and effort.
This takes a lot of time and investment. The typical starting capital requirement of an LLM makes it difficult for early stage VCs to fund such startups. Furthermore, these new startups would be semi-competitors of well-established, proven LLMs like ChatGPT & Llama. Early stage investment in a foundational LLM does not make sense unless the following criteria are met:
A team with an exceptional track record.
Strong leadership capability to attract the best talent pool.
A technological innovation that brings down development & training time significantly.
Strong interest from the entrepreneur community in participating in such an ecosystem.