Crawling the web for Indic data
Setu (AI4B)
Vendor-based: digitisation of published data
How do we choose the right kind of vendors?
What are the costs and incentives?
Volunteer-based: data creation camps and activities (e.g., in colleges)
How to create engagement to bring an audience?
How do you set up the process to ensure quality?
Textualising YouTube, news, and other audio-visual data sources
How do we choose the right existing data, and what are the metrics for doing so?
Can we use the Same Language Subtitling effort to increase data availability?
Government and public-good use cases to generate data
What kind of data can the use case generate?
How do we create an architecture that allows the data to be shared back into the public domain while preserving user privacy?
Public-private partnerships with licensing
Can we get the government to share royalty-free access to all data from Doordarshan, AIR, and government publishing houses?