Beyond English: Advancing Indic Language AI with Sarvam-M and Bulbul-v2
Kurian Benoy,
ML Engineer, Sarvam
About Me
  • ML Engineer @ Sarvam
  • Volunteer @ Swathanthra Malayalam Computing
  • Loves walking and enjoys taking part in marathons or any other sport, be it football, basketball, or pickleball
  • Bird watching is my hobby (PS: Sarvam models are named after birds because of me)
About Sarvam
  • At Sarvam, we are on a mission to make generative AI real for Bharat.
  • Sarvam is dedicated to building the bedrock of Sovereign AI for India.
  • Co-founders: Dr Vivek Raghavan and Dr Pratyush Kumar
  • India has 22 scheduled languages, 1000+ dialects and 130 crore people
  • Towards this end, we see merit in a ‘Sovereign AI Ecosystem’ in India, whose imperative is not to decouple from the rest of the world but to provide strategic autonomy in building foundational components and tailoring them to the country’s unique needs
  • In addition to the final foundational models trained from scratch, starting this week we will also have weekly drops of open-source models along with technical reports detailing our research findings.
Indian language problem
  • Lack of high-quality, authentic AI models for Indic languages.
  • Current models often lack cultural nuance, accurate pronunciation, or speed.
  • The need for "voices that feel like India" and AI that understands India.
Sarvam-M:
  • It is available under the Apache 2.0 license.
  • It's not our flagship LLM
10+ Indic Languages
Supports Hindi, Tamil, Telugu, and more.
Post-training techniques
20-30% Better Performance
Significantly outperforms open-source models of the same size on Indic language and coding benchmarks
Fine-tuned on top of Mistral
Supports both reasoning and non-reasoning modes.
Using the Wikipedia grounding feature significantly increases accuracy
Sarvam-M results quickly
Sarvam-M was all about post-training
Sarvam-M: Key points from our blogpost
  • Supervised Finetuning (SFT):
  • Prompt Curation: Diverse, high-quality prompts (11.5M -> 3.7M) with quality/hardness scoring.
  • Character Training: Debiasing completions, re-biasing towards culturally relevant outputs. (Show an example, if appropriate and quick).
  • Multi-lingual focus: 10 Indian languages (Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu) and English.
  • Language Forms: Native script, Code-mixed, Romanised.
  • Reinforcement Learning with Verifiable Rewards (RLVR):
  • Why RLVR? Improved scores in math, programming, instruction following.
  • Task-wise curriculum: Tailored learning for different domains (GSM8K, MATH, Code).
  • Reward Engineering: Binary and partial rewards for complex tasks (e.g., code execution success, translation quality).
  • Inference Optimization:
  • Efficient deployment: FP8 quantization for smaller model size & computational efficiency.
  • Lookahead Decoding: Faster responses.
  • Built for scale & cost-effectiveness.
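The difference between binary and partial rewards can be illustrated with a toy example. This is only a sketch under simple assumptions, not the reward code used to train Sarvam-M: `binary_reward`, `partial_reward`, and `candidate` are hypothetical names, and real code rewards would run model-generated programs in a sandbox.

```python
from typing import Callable, Sequence, Tuple


def binary_reward(predicted: str, expected: str) -> float:
    """Return 1.0 only when the final answer matches exactly, else 0.0."""
    return 1.0 if predicted.strip() == expected.strip() else 0.0


def partial_reward(
    solution: Callable[[int], int],
    test_cases: Sequence[Tuple[int, int]],
) -> float:
    """Return the fraction of test cases that a candidate solution passes."""
    if not test_cases:
        return 0.0
    passed = 0
    for arg, expected in test_cases:
        try:
            if solution(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate earns no credit for this case.
            pass
    return passed / len(test_cases)


# Hypothetical candidate: squares its input but crashes on negative numbers.
def candidate(x: int) -> int:
    if x < 0:
        raise ValueError("negative input")
    return x * x


cases = [(2, 4), (3, 9), (-1, 1), (0, 0)]
print(binary_reward(" 42 ", "42"))       # 1.0
print(partial_reward(candidate, cases))  # 0.75
```

A binary reward suits tasks with a single checkable answer (a math result), while a partial reward still gives a learning signal when a program passes only some of its tests.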
Sarvam-M highlights
  • Key Results:
  • +20% avg. improvement on Indian language benchmarks.
  • +21.6% on math, +17.6% on programming benchmarks.
  • Outperforms Llama-4 Scout and comparable to larger models in many Indic benchmarks.
  • Knowledge Grounding with Wikipedia (RAG):
  • Enhances factual accuracy: compare Sarvam-M alone against Sarvam-M + Wikipedia.
With grounding, Sarvam-M is on par with frontier models
  • With this feature, Sarvam-M gets to 71% on OpenAI’s SimpleQA. o3, which is >10x more expensive, is at 49%.
  • Our founder, Vivek Raghavan, also tested it on complex QA and was impressed with the results.
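The general idea behind knowledge grounding can be sketched in a few lines: retrieve relevant passages, then place them in the prompt so the model answers from them. This is a toy illustration, not how Sarvam's Wikipedia grounding is implemented. The passage list, the word-overlap ranking, and the helper names are all assumptions made for the example.

```python
import re
from typing import Dict, List, Set

# Tiny stand-in for a Wikipedia index; a real system would query a search backend.
PASSAGES = [
    "Thiruvananthapuram is the capital of the Indian state of Kerala.",
    "The bulbul is a family of medium-sized passerine songbirds.",
    "Bengaluru is the capital of the Indian state of Karnataka.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: List[str], top_k: int = 1) -> List[str]:
    """Rank passages by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(question_words & tokenize(p)),
        reverse=True,
    )
    return ranked[:top_k]


def build_grounded_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Put the retrieved passages into the system prompt as context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages))
    system = "Answer using only the context below.\n" + context
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_grounded_messages("What is the capital of Kerala?", PASSAGES)
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one passed to a chat completions call, so the grounded prompt can be sent to any chat model.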
How to use Sarvam-M?
  • Download model weights from Hugging Face
    huggingface.co/sarvamai/sarvam-m
    (Requires roughly 2 H100 GPUs to run without quantization)
from openai import OpenAI

base_url = "https://api.sarvam.ai/v1"
model_name = "sarvam-m"
api_key = "Your-API-Key"  # get it from https://dashboard.sarvam.ai/

client = OpenAI(
    base_url=base_url,
    api_key=api_key,
).with_options(max_retries=1)

messages = [
    {"role": "system", "content": "You're a helpful AI assistant"},
    {"role": "user", "content": "Explain quantum computing in simple terms"},
]

response1 = client.chat.completions.create(
    model=model_name,
    messages=messages,
    reasoning_effort="medium",  # Enable thinking mode. `None` for disable.
    max_completion_tokens=4096,
)
print("First response:", response1.choices[0].message.content)
  • Playground
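Since Sarvam-M supports both reasoning and non-reasoning modes, it can help to keep the toggle in one place. The sketch below relies only on the comment in the API example above (`reasoning_effort="medium"` enables thinking mode, `None` disables it). `build_chat_kwargs` is a hypothetical helper, not part of any SDK.

```python
from typing import Any, Dict, List, Optional


def build_chat_kwargs(
    messages: List[Dict[str, str]],
    think: bool,
    model_name: str = "sarvam-m",
    max_completion_tokens: int = 4096,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.chat.completions.create()."""
    # Per the API example: "medium" enables thinking mode, None disables it.
    reasoning_effort: Optional[str] = "medium" if think else None
    return {
        "model": model_name,
        "messages": messages,
        "reasoning_effort": reasoning_effort,
        "max_completion_tokens": max_completion_tokens,
    }


messages = [{"role": "user", "content": "What is 17 * 23?"}]
thinking_kwargs = build_chat_kwargs(messages, think=True)
direct_kwargs = build_chat_kwargs(messages, think=False)
print(thinking_kwargs["reasoning_effort"], direct_kwargs["reasoning_effort"])

# With the `client` from the API example (needs an API key and network access):
# response = client.chat.completions.create(**thinking_kwargs)
```

Thinking mode is likely worth the extra tokens for math or coding questions, while short conversational replies can usually skip it.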
Interesting Articles about Sarvam-M
  • Sarvam-M has been integrated into open-source projects like OpenRouter and Mem0.

alokbishoyi.com

What I Learned Testing Sarvam's AI on 64 Controversial Questions

I've been thinking about AI bias lately. Not the usual "models are biased" complaints, but something more specific: what happens when you force an AI to take positions on genuinely controversial topics?

Bulbul-v2: Our authentic voice engine
2-3x
Faster Inference
11
Supports 10 Indian languages + English
Rs 15
Per 10K characters. We are one of the cheapest providers, with pay-as-you-go plans. You can start building for free.
7
4 female voices and 3 male voices, each of which can speak all supported languages.
100%
Authentic accents and pronunciations
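A quick way to budget a project is to turn the price into a small estimator. This sketch assumes the Rs 15 figure applies per 10,000 characters of input text, which is an assumption to verify against the current pricing page. `estimate_tts_cost` is a hypothetical helper.

```python
# Rs 15 is from the slide; the per-10,000-characters unit is an assumption.
RUPEES_PER_10K_CHARS = 15


def estimate_tts_cost(text: str) -> float:
    """Estimate the cost in rupees of synthesizing `text`, billed per character."""
    return len(text) * RUPEES_PER_10K_CHARS / 10_000


script = "Welcome to Sarvam AI!" * 1000  # 21 characters x 1000 = 21,000 characters
print(estimate_tts_cost(script))  # 31.5
```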
Bulbul-v2 Video
How to use Bulbul-v2?
API
  1. pip install sarvamai
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI(api_subscription_key="YOUR_API_SUBSCRIPTION_KEY")

# Convert text to speech
audio = client.text_to_speech.convert(
    target_language_code="en-IN",
    text="Welcome to Sarvam AI!",
    model="bulbul:v2",
    speaker="anushka",
)
save(audio, "output1.wav")
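Because the same speaker can be used across languages, one natural extension is to generate the same greeting in several languages. The sketch below is an illustration only: `synthesize_all` is a hypothetical helper, and the `hi-IN` code is assumed to follow the same pattern as `en-IN` (check the API docs for the supported codes). The conversion function is passed in, so the loop can be tried without an API key.

```python
from typing import Any, Callable, Dict

# Only "en-IN" appears in the API example; "hi-IN" is assumed to follow the
# same pattern and should be checked against the docs.
GREETINGS: Dict[str, str] = {
    "en-IN": "Welcome to Sarvam AI!",
    "hi-IN": "सर्वम एआई में आपका स्वागत है!",
}


def synthesize_all(convert: Callable[..., Any], speaker: str = "anushka") -> Dict[str, Any]:
    """Call `convert` once per language and collect the results by language code."""
    results: Dict[str, Any] = {}
    for code, text in GREETINGS.items():
        results[code] = convert(
            target_language_code=code,
            text=text,
            model="bulbul:v2",
            speaker=speaker,
        )
    return results


# Dry run with a stand-in that just echoes its arguments (no API key needed).
dry_run = synthesize_all(lambda **kwargs: kwargs)
print(sorted(dry_run))  # ['en-IN', 'hi-IN']

# With the real client and `save` from the API example:
# audio_by_language = synthesize_all(client.text_to_speech.convert)
# for code, audio in audio_by_language.items():
#     save(audio, f"output_{code}.wav")
```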
Use through Playground
Cool applications built with this
Multilingual Debugger
Other cool apps
  • Mom's kitchen helper, which uses Sarvam-M to suggest good recipes from whatever ingredients you have.
  • Form Saathi: many people, especially gig workers, don’t know what to fill in when forms are not in their own language. This demo shows how platforms like Zomato, Swiggy, and Urban Company can use Sarvam to make forms voice-first and language-friendly.
  • Indic language learning apps made by Ravi Theja (IndicLinguo)
  • Learn Kannada one phrase at a time, and Learn Hindi
  • Thulika webapp (For tracking weight loss and tracking calories)
Call to Action
Start Building Today
Leverage Sarvam-M and Bulbul-v2 for your next project.

dashboard.sarvam.ai

Sarvam API Dashboard

Get help and support for the Sarvam API Dashboard

Sarvam API Docs

👋 Welcome to Sarvam AI Docs | Sarvam API Docs

GitHub

GitHub - sarvamai/sarvam-ai-cookbook: Open Source Sarvam AI Cookbook

Join Our Community
Connect with the developer community on Discord and get support.
Visit sarvam.ai
Read our blogs and mission statements to understand what we are planning to do
Contact Us
Discuss tailored enterprise solutions. Feel free to email us at: developer@sarvam.ai
Interested in research collaboration? Reach out at jointhecircle@sarvam.ai
I know I haven't talked about agents
Time for Q&A
  • Time for questions
  • I may not be answering spicy questions like: Is competitor X better than you?
kurianbenoy2
kurianbenoy
kurianbenoy.com
Thank you