Latest AI News India

BharatGen: IndiaAI Initiative Launches Project for Multilingual LLMs in Indian Languages, Bridging the AI Divide for Bharat’s Voices

Introduction

Imagine a farmer in rural Odisha using a voice-based AI tool to check crop prices, only to find that the app understands only English. A grandmother in Tamil Nadu struggles with a telehealth service that replies in Hindi. A student in Assam wants to learn coding online, but all the tutorials are in unfamiliar languages. These are not hypothetical scenarios; they reflect the daily struggles of a country where 96% of the population doesn’t speak English fluently, yet most AI tools cater to the remaining 4%.

This is the gap India’s National AI Mission aims to bridge with its groundbreaking project “BharatGen” (Language AI). Announced on the IndiaAI portal, this initiative focuses on developing multilingual Large Language Models (LLMs) tailored to India’s linguistic diversity.

Why BharatGen Matters: Overcoming the AI-Language Barrier

India is home to the world’s second-largest online population (900+ million users), and 88% of its internet consumption occurs in regional languages. Yet, as of 2024, Hindi, spoken by 600 million people, has fewer AI training resources than Icelandic, a language spoken by only 400,000 people.

Dr. Anika Rao, lead researcher at IndiaAI’s NLP lab, explains:

“This project isn’t just about translation; it’s about enabling governance, education, and entrepreneurship in languages that have existed for millennia.”

How ‘BharatGen’ Plans to Transform Indian Language AI

BharatGen’s first phase focuses on 12 major languages: Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Odia, Malayalam, Punjabi, Assamese, and Urdu. The real innovation, however, lies in its hybrid LLM architecture, which is designed to switch fluidly between languages, dialects, and even code-mixed speech such as Hinglish (Hindi and English) or Tanglish (Tamil and English).

Key Innovations of BharatGen:

  1. Crowdsourced Datasets: Partnering with Koo, ShareChat, and local platforms to collect colloquial text and voice samples from tier-2/3 cities.
  2. Ethical Sourcing: Unlike Western models criticized for data scraping, BharatGen collaborates with authors, poets, and publishers to license content.
  3. Bhasha Compute Grid: By leveraging India’s National Supercomputing Mission, BharatGen can train a Hindi LLM locally for one-tenth the cost of outsourcing.

Challenges in Building AI for Indian Languages

Developing AI for India’s diverse languages presents unique challenges:

  • Dialects and Regional Variations: A model trained on Delhi’s Hindi might fail in Chhattisgarh, where “rotī” (bread) becomes “rotiya.”
  • Low-Resource Languages: Bhojpuri, spoken by 50 million people, has limited digital content. The team is digitizing folk songs, panchayat records, and WhatsApp voice notes.
  • Bias in AI: Early models showed biases, such as associating Sanskrit only with religion. AI datasets are now being audited for caste, gender, and regional stereotypes.

Real-World Applications of BharatGen

BharatGen’s impact spans multiple sectors:

  • Agriculture: AI assistants for farmers in Marathi or Telugu, trained on local soil and pest data.
  • Legal Tech: Translating British-era laws into vernacular languages for rural lawyers.
  • Healthcare: Mental health chatbots in Malayalam to help address Kerala’s rising rates of depression.

Startups are also adopting this technology. Kochi-based Vani AI developed a Malayalam chatbot that alerts fisherfolk to monsoon patterns in their own coastal slang. Founder Rajeev Nair shares:

“Earlier, alerts were in English or formal Malayalam. Now, the chatbot says, ‘Maushi, kadal kadichu varum!’ (Sister, the sea will rage!)”

The Road Ahead: Democratizing AI for the Next Billion Users

Even with a ₹2,000 crore budget, some critics argue that BharatGen is overly ambitious. IndiaAI director Priya Krishnan counters:

“This isn’t just tech for the sake of tech; it’s about democratizing AI before the next billion users come online.”

By 2026, BharatGen aims to support 22 languages, matching the 22 languages listed in the Eighth Schedule of India’s Constitution. But the true test will be cultural acceptance. As Tamil novelist Perumal Murugan quipped at the launch event:

“If AI can write a villupaattu (folk song) that makes my grandmother smile, that’s intelligence.”

Thoughts of the Team

While the world focuses on making AI sound human, India is making AI listen to humans. Whether it serves a Tamil farmer or a Gujarati grandmother, BharatGen isn’t just building models; it’s rebuilding dignity. That makes it the most Indian AI project yet: messy, pluralistic, and unapologetically ambitious.

For more details, please visit the BharatGen page on the Government’s official IndiaAI website.

You can also watch this YouTube video from the channel Galaxy Classes. We especially thank the channel’s owner for creating such an informative video.

 

Written by Rajendra Singh Rathore & Team


I am a technology lover and want to spread my AI-related knowledge to this beautiful world.