Speech recognition: speech to text
Automated speech transcription service for Estonian speech and a user interface for transcription editing.
How does it work?
Tekstiks.ee is a public speech recognition service of TalTech's Laboratory of Language Technology. The system demonstrates technologies and models developed in this lab. These currently achieve state-of-the-art Estonian speech recognition results, even compared to commercial alternatives. The system is fully automated and can process multiple files in parallel. A queue may form and cause delays, especially during business hours. On average, processing takes about half the length of the recording.

1. Upload a speech recording in Estonian
Most popular audio and video formats are supported. The maximum file size is 500 MB.
2. Wait for the speech recognition to complete
The system, trained using machine learning methods, searches for Estonian speech segments and tries to differentiate between speakers. It then transcribes the speech segments into text and finally adds punctuation. Many Estonian celebrities and radio personalities can also be identified by name.
3. Correct speech recognition mistakes
Editing the transcription is interactive. The integrated audio player and the text are kept in sync, and the word currently being played is highlighted to simplify manual editing.
4. Download the result
Download the transcription. Currently, the DOCX format is supported.
Recommendations
- The speech in the audio file should be of the best possible quality, i.e. recorded with a microphone close to the speaker's mouth in a noise-free environment. The audio should have at least 16-bit samples and a sampling rate of at least 16 kHz, preferably in WAV format.
- Because the maximum upload size is 500 MB, longer WAV files can be encoded in MP3 or Ogg format; a bitrate of at least 128 kbit/s is recommended. Converting stereo to mono also reduces file size (the conversion is done anyway during recognition).
- The system does not work well with audio files longer than two hours. With such files, recognition may fail and no result will be returned. We recommend splitting long files before uploading.
- NB! Because the recognition server has limited resources, please do not upload more than 10 recordings per day. Otherwise, all users will face a long queue. If you need to transcribe a large number of files (e.g. an entire audio archive), please contact us.
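The recommendations above can be checked before uploading. The following is a minimal sketch, not part of Tekstiks.ee: it uses Python's standard wave module to inspect a PCM WAV file, estimates how much uncompressed audio fits under the upload limit, and builds (without running) ffmpeg commands for converting to mono and splitting long recordings. The function names are hypothetical, the use of ffmpeg is an assumption (the service does not prescribe a tool), and "500 MB" is read here as decimal megabytes.

```python
import wave

MAX_UPLOAD_BYTES = 500 * 1000 * 1000  # assumption: "500 MB" taken as decimal megabytes
MAX_DURATION_S = 2 * 60 * 60          # two-hour guideline from the recommendations


def check_wav(path):
    """Return warnings for a PCM WAV file; an empty list means nothing was flagged."""
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    warnings = []
    if bits < 16:
        warnings.append(f"{bits}-bit samples; at least 16-bit is recommended")
    if rate < 16000:
        warnings.append(f"{rate} Hz sampling rate; at least 16 kHz is recommended")
    if channels > 1:
        warnings.append("more than one channel; converting to mono reduces file size")
    if duration > MAX_DURATION_S:
        warnings.append("longer than two hours; consider splitting the file")
    return warnings


def pcm_seconds_within_limit(rate, bits, channels, limit=MAX_UPLOAD_BYTES):
    """Approximate seconds of uncompressed PCM that fit in `limit` bytes (header ignored)."""
    return limit / (rate * (bits // 8) * channels)


def ffmpeg_mono_command(src, dst, bitrate="128k"):
    """Build an ffmpeg command that downmixes to mono and encodes at `bitrate`."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-b:a", bitrate, dst]


def ffmpeg_split_command(src, pattern, segment_seconds=3600):
    """Build an ffmpeg command that cuts `src` into pieces without re-encoding.

    `pattern` is an output template such as "part_%03d.wav".
    """
    return ["ffmpeg", "-i", src, "-f", "segment",
            "-segment_time", str(segment_seconds), "-c", "copy", pattern]
```

Under these assumptions, pcm_seconds_within_limit(16000, 16, 1) gives 15625.0 seconds, so roughly 4.3 hours of 16 kHz, 16-bit mono WAV fit under the limit, whereas 44.1 kHz, 16-bit stereo reaches it after about 47 minutes. The command lists can be passed to subprocess.run if ffmpeg is installed.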
Citing
If you use this system for research, please cite the following article in your publications: Olev, Aivo; Alumäe, Tanel (2024). Open source platform for Estonian speech transcription. Language Resources and Evaluation, 1–18. DOI: 10.1007/s10579-024-09777-1.
Open source
Tekstiks.ee is based on open-source components that are easy to set up yourself. The recognition system can also be run inside a Docker container (recommended).
- The speech recognition system. It runs as a step-by-step process whose progress can be observed, and it can be executed from the command line. https://github.com/taltechnlp/est-asr-pipeline
- A web server solution that lets you create a simple API for using speech recognition. It also supports returning real-time processing progress information. https://github.com/taltechnlp/est-asr-backend
