About Us
We are a language‑ and speech‑focused AI company dedicated to the Nepali language and the diverse languages of Nepal. Our team of linguists, AI scientists, technologists, and community advocates is committed to preserving and advancing the rich linguistic heritage of Nepal.
Our Story
In 2020, just before the COVID-19 pandemic, a group of passionate AI scientists and Nepali engineers came together with a shared vision: to make technology truly speak our languages. We saw early on that Nepali and many other languages of Nepal were underrepresented in the digital world, and we believed AI could change that.
When the pandemic struck, our mission gained urgency. We launched our first chatbot to help communities access reliable information in Nepali during those challenging times. That milestone proved what was possible when local expertise and advanced technology work hand in hand.
From that foundation, our work quickly expanded. We began building Language and Speech AI, Conversational AI, and tools designed to connect people through their mother tongues.
Today, we are a team of linguists, AI scientists, technologists, and community advocates dedicated to localizing AI for Nepal. Our goal is to be recognized as the company devoted to Nepali and all the languages of Nepal—creating a future where every language is heard, valued, and thrives in the digital age.
Our Journey
The Sabdakunja Ecosystem
A coordinated suite of tools that collect, refine, and protect Nepali language data—from raw voices and web text to high-quality, AI-ready datasets.
Audio Collection
Awaj-Sangraha
Sangraha means collection or gathering. This tool crowdsources the “Voice of Nepal”.
Text Collection
Lipi-Khoj
Lipi (Script) + Khoj (Search). The engine that hunts for written Nepali text across the web.
Audio Annotation
Shruti-Lekhan
Shruti (Listening) + Lekhan (Writing). The workbench for transcribing and labeling speech.
Text Annotation
Sabda-Sadhana
Sadhana means disciplined practice and refinement. The tool for perfecting text and translation.
Storage
Data-Kunda
Kunda is a sacred reservoir. The deep, secure repository where all Nepali data resides.
Process Explanation: The Flow of Intelligence
From crowdsourced voices and discovered text to powerful AI models and real-world applications, Sabdakunja turns raw language into impact in five connected stages.
Sangraha & Khoj
Awaj-Sangraha captures diverse oral traditions and dialects, while Lipi-Khoj scans the digital landscape to archive written Nepali text—together filling the Data-Kunda.
Lekhan & Sadhana
Through Shruti-Lekhan we give AI “ears” by turning recordings into labeled datasets, while Sabda-Sadhana lets linguistic experts refine grammar, context, intent, and emotion.
Training
Refined data from the Data-Kunda is used to teach neural networks the relationship between sound and script, and between Nepali sentences and their translations.
API & Integration
The trained intelligence is packaged into Sabdakunja Connect APIs—bridges that allow external apps to talk to our AI models in real time.
Application
The result is a localized digital ecosystem where technology speaks the language of the people, ensuring no Nepali is left behind in the AI revolution.
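For readers who think in code, the five stages above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the `Record` type, the function names, the sample file name, and the placeholder "training" step are all hypothetical and do not describe the actual Sabdakunja implementation or its APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One item in the (hypothetical) Data-Kunda store."""
    kind: str                      # "audio" or "text"
    source: str                    # tool that produced the record
    content: str                   # file name or raw text
    labels: dict = field(default_factory=dict)


def collect():
    """Stage 1 (Sangraha & Khoj): stand-in for crowdsourced audio and discovered text."""
    return [
        Record("audio", "Awaj-Sangraha", "clip_001.wav"),  # hypothetical file name
        Record("text", "Lipi-Khoj", "नमस्ते संसार"),
    ]


def annotate(records):
    """Stage 2 (Lekhan & Sadhana): attach placeholder labels to each record."""
    for record in records:
        if record.kind == "audio":
            record.labels["transcript"] = "<transcript placeholder>"
        else:
            record.labels["reviewed"] = True
    return records


def train(records):
    """Stage 3 (Training): placeholder that only counts examples per kind."""
    model = {"audio": 0, "text": 0}
    for record in records:
        model[record.kind] += 1
    return model


def serve(model):
    """Stage 4 (API & Integration): wrap the 'model' in a callable endpoint."""
    def endpoint(kind):
        return {"kind": kind, "training_examples": model.get(kind, 0)}
    return endpoint


def run_pipeline():
    """Stage 5 (Application): an app calling the endpoint built from stages 1 to 4."""
    data_kunda = annotate(collect())
    api = serve(train(data_kunda))
    return api("audio")
```

Calling `run_pipeline()` walks one audio clip and one text snippet through every stage. In a real system each placeholder would be replaced by the corresponding tool or trained model.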
Timeline of Milestones
Registration & Conversational AI
Registered and began work in Conversational AI
Core Technology Development
Diyo.AI began creating Translation and Transliteration APIs
Speech Technology Development
Began work in Speech Technology
Expansion into Other Languages of Nepal
Advancing in the fields of language documentation and computational linguistics
Join Our Mission
Help us preserve the linguistic heritage of the Himalayan region for future generations.