About Us

We are a language‑ and speech‑focused AI company dedicated to the Nepali language and the diverse languages of Nepal. Our team of linguists, AI scientists, technologists, and community advocates is committed to preserving and advancing the rich linguistic heritage of Nepal.

Our Story

We were founded in 2020, just before the COVID‑19 pandemic, when a group of passionate AI scientists and Nepali engineers came together with a shared vision: to make technology truly speak our languages. We saw early on that Nepali and many other languages of Nepal were underrepresented in the digital world, and we believed AI could change that.

When the pandemic struck, our mission gained urgency. We launched our first chatbot to help communities access reliable information in Nepali during those challenging times. That milestone proved what is possible when local expertise and advanced technology work hand in hand.

From that foundation, our work quickly expanded. We began building Language and Speech AI, Conversational AI, and tools designed to connect people through their mother tongues.

Today, we are a team of linguists, AI scientists, technologists, and community advocates dedicated to localizing AI for Nepal. Our goal is to be recognized as the company devoted to Nepali and all the languages of Nepal, creating a future where every language is heard, valued, and thriving in the digital age.

Our Journey

The Sabdakunja Ecosystem

A coordinated suite of tools that collect, refine, and protect Nepali language data—from raw voices and web text to high-quality, AI-ready datasets.

01

Audio Collection

Awaj-Sangraha

Sangraha means collection or gathering. This tool crowdsources the “Voice of Nepal”.

02

Text Collection

Lipi-Khoj

Lipi (Script) + Khoj (Search). The engine that hunts for written Nepali text across the web.

03

Audio Annotation

Shruti-Lekhan

Shruti (Listening) + Lekhan (Writing). The workbench for transcribing and labeling speech.

04

Text Annotation

Sabda-Sadhana

Sadhana means disciplined practice and refinement. The tool for perfecting text and translation.

05

Storage

Data-Kunda

Kunda is a sacred reservoir. The deep, secure repository where all Nepali data resides.

Process Explanation: The Flow of Intelligence

From crowdsourced voices and discovered text to powerful AI models and real-world applications, Sabdakunja turns raw language into impact in five connected stages.

Stage 1: Gathering the Source

Sangraha & Khoj

Awaj-Sangraha captures diverse oral traditions and dialects, while Lipi-Khoj scans the digital landscape to archive written Nepali text—together filling the Data-Kunda.

Stage 2: Cultivating the Data

Lekhan & Sadhana

Through Shruti-Lekhan we give AI “ears” by turning recordings into labeled datasets, while Sabda-Sadhana lets linguistic experts refine grammar, context, intent, and emotion.

Stage 3: Forging the Model

Training

Refined data from the Data-Kunda is used to teach neural networks the relationship between sound and script, and between Nepali sentences and their translations.

Stage 4: Bridging the Gap

API & Integration

The trained intelligence is packaged into Sabdakunja Connect APIs—bridges that allow external apps to talk to our AI models in real time.

Stage 5: Local Impact

Application

The result is a localized digital ecosystem where technology speaks the language of the people, ensuring no Nepali is left behind in the AI revolution.

Timeline of Milestones

2020

Registration & Conversational AI

Registered the company and began work in Conversational AI

2022

Core Technology Development

Diyo.AI began creating Translation and Transliteration APIs

2023

Speech Technology Development

Began work in Speech Technology

2024

Expansion in Other Languages of Nepal

Advancing in the fields of Language Documentation and Computational Linguistics

Join Our Mission

Help us preserve the linguistic heritage of the Himalayan region for future generations

Join Us