If the language you work in isn't on the big commercial platforms, you know the struggle. We offer speech-to-text for over 300 languages the industry tends to overlook. It's affordable, it's quick, and it works without fuss. The transcripts won't be perfect, and they're not a substitute for a professional translator. But they give you a solid starting point you can work from, even while you're still in the field.
Languages covered & growing
Transcription + translation included
Free trial, no card needed
Three steps from audio to a usable transcript. No account setup, no monthly commitment.
MP3, WAV, M4A, or AAC, up to 150 minutes per file. Drag and drop or select from your device.
Pick the source language and, optionally, a target language for translation. Our models handle 300+ languages today.
Your transcript is delivered by email and in your dashboard. Choose DOCX, Markdown (for easy AI or LLM processing), or plain text.
Explore all languages →
Over 7,000 languages are spoken in the world today, yet commercial transcription and translation
services cover fewer than 150 of them. The communities that speak the rest are not small, and their
voices matter just as much. They are health workers, farmers, teachers, and community leaders: people whose
knowledge and experience deserve to be documented and shared.
We started with 300+ languages across the Pacific, Africa, South Asia, and Southeast Asia.
The service is meant to help, not to replace. Whether you're a researcher, a programme team,
or a human translator looking for a rough draft to work from, we hope this makes the job a little easier.
Accuracy ratings are based on the published Character Error Rates (CER) of Meta's Omnilingual ASR 7B LLM-ASR model. CER is the number of character-level edits (insertions, deletions, and substitutions) needed to turn a transcript into a reference transcription, divided by the length of the reference, so lower is better. "Good" means the model achieves a CER at or below 0.5% on Meta's test sets, a strong benchmark for practical use. "Medium" reflects a CER above 0.5%, where transcripts need light editing, and "Needs work" covers higher CER ranges, where transcripts are best suited to keyword extraction. Important: Meta's test sets, while rigorous, may not fully represent real-world recording conditions, speaker diversity, or regional dialects. We are building our own evaluation benchmarks from community audio samples and will publish more representative accuracy data as we grow. Model accuracy improves over time through fine-tuning on curated data.
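For readers who want to check a transcript against their own reference, here is a minimal sketch of how a Character Error Rate is conventionally computed: the character-level edit distance divided by the reference length. It is an illustration only, not the scoring code behind the published figures. The function names are hypothetical, and the only threshold used is the 0.5% "Good" cutoff stated above.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction (0.01 means 1%). The reference must be non-empty."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# The "Good" cutoff quoted in the text above: CER at or below 0.5%.
GOOD_MAX_CER = 0.005


def meets_good_threshold(cer: float) -> bool:
    """True when a CER value falls in the 'Good' band described above."""
    return cer <= GOOD_MAX_CER
```

As a usage example, `character_error_rate("habari", "habori")` counts one substituted character out of six, a CER of roughly 17%. Published figures are typically computed over whole test sets, often after text normalisation, so a single short clip will not reproduce them.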
A quick primer on the terms you'll see, so you know what you're getting.
Converting spoken audio into written text in the same language. A 30-minute interview in Swahili becomes a Swahili transcript.
Converting the transcript from one language into another. Take that Swahili transcript and get it in English.
Accuracy varies by language pair. Tok Pisin → English is stronger than Tok Pisin → French, simply because more training data exists for that pair. Each step (transcription → translation) adds its own margin for error, so mistakes in the transcript carry over into the translation.
A few examples of where this kind of service makes a real difference.
Health researchers conduct interviews with community health workers in rural Mozambique. Recordings are transcribed and translated, turning hours of spoken insight into data that can be analysed for programme evaluation.
Programme teams collect Most Significant Change (MSC) stories from communities in Tok Pisin. Transcription captures the narrative evidence about what changed and why, which feeds into evaluations, donor reports, and learning cycles.
Research teams run focus groups and key informant interviews across Ethiopian communities in Amharic, Oromo, and Somali. Transcription and translation speed up the research cycle without losing depth.
No subscriptions, no commitments. Just per-minute pricing and a free trial to get you started.
Transcription in your source language plus translation to English or another target language. All in one price, no separate tiers.
NGOs and researchers work with sensitive content. We keep that in mind at every step.
Your audio files are processed on our own servers and are never sent to third-party APIs.
We do not use customer recordings, transcripts, or personal information to train or improve our models. Your data stays yours — always.
Audio files are deleted after processing. Transcripts are retained in your account until you remove them.
Accuracy depends heavily on recording conditions. We'd rather be upfront about that than overpromise. See our accuracy guide for tips on getting the best transcript.
Most technology builds for the centre: well-resourced languages, stable connectivity, established markets. The last mile is everything outside that radius.
It's the thousands of languages with no transcription tools. The community health workers whose field reports go undocumented. The programme evaluators who rely on oral stories because written records don't exist in the languages they work in.
We're building speech-to-text for these languages, starting with the strongest foundations we can find and improving through continuous fine-tuning, in partnership with the communities we serve.
Powered by Meta Omnilingual ASR (Apache 2.0). Models are improved over time through our own fine-tuning work.
Start with 20 free minutes. No credit card, no commitment.
Get started free