Speech recognition: speech to text

Automated speech transcription service for Estonian speech and a user interface for transcription editing.

How does it work?

Tekstiks.ee is a public speech recognition service of TalTech's Laboratory of Language Technology. The system demonstrates the technologies and models developed in the lab, which currently achieve state-of-the-art results for Estonian speech recognition, even compared to commercial alternatives. The system is fully automated and can process multiple files in parallel. A queue may still form and cause delays, especially during business hours. On average, processing takes about half the length of the recording.

Screenshot of the application

1. Upload a speech recording in Estonian

Most popular audio and video formats are supported. The maximum file size is 500 MB.

2. Wait for the speech recognition to complete

The system, trained using machine learning methods, searches for Estonian speech segments and tries to differentiate between multiple speakers. It then transcribes the speech segments into text and finally adds punctuation. Many Estonian celebrities and radio personalities can also be identified by name.

3. Correct speech recognition mistakes

Editing the transcription is interactive. The integrated audio player and the text are kept in sync, and the word currently being played is highlighted to simplify manual editing.

4. Download the result

Download the transcription. Currently, the DOCX format is supported.

Recommendations

  • The speech in the audio file should be of the best possible quality, i.e. recorded with a microphone close to the mouth in a noise-free environment. The audio should be at least 16-bit and sampled at 16 kHz or higher, preferably in WAV format.
  • Because the maximum upload size is 500 MB, longer WAV files can be encoded in MP3 or Ogg format, but a bitrate of at least 128 kbit/s is recommended. Converting stereo to mono also reduces file size (this is done during recognition anyway).
  • The system does not work well with audio files longer than two hours. With such files, recognition may fail and no result will be returned. We recommend splitting long files before uploading.
  • NB! Because the recognition server has limited resources, please do not upload more than 10 recordings per day. Otherwise, a long queue will form for all users. If you need to transcribe a large number of files (e.g. an entire audio archive), please contact us.
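
The recommendations above can be checked locally before uploading. The following is a minimal sketch, not part of Tekstiks.ee itself: it uses only Python's standard `wave` module, so it handles uncompressed WAV files only. The function names `check_wav` and `split_wav`, the output file naming, and the reading of "500 MB" as 500,000,000 bytes are assumptions made for illustration.

```python
import os
import wave

# Thresholds taken from the recommendations; decimal megabytes are an assumption.
MAX_UPLOAD_BYTES = 500 * 1000 * 1000
MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2          # 2 bytes per sample = 16-bit audio
MAX_DURATION_S = 2 * 60 * 60        # two hours


def check_wav(path):
    """Return a list of warnings for a WAV file; an empty list means none found."""
    warnings = []
    if os.path.getsize(path) > MAX_UPLOAD_BYTES:
        warnings.append("file is larger than 500 MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration_s = wav.getnframes() / rate
    if rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate {rate} Hz is below 16 kHz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        warnings.append(f"sample width {8 * width}-bit is below 16-bit")
    if channels > 1:
        warnings.append("more than one channel; mono would reduce file size")
    if duration_s > MAX_DURATION_S:
        warnings.append("longer than two hours; consider splitting")
    return warnings


def split_wav(path, out_dir, chunk_seconds=3600):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Each chunk is read into memory in full, and cuts are made at fixed
    times, so a cut may fall in the middle of a word.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"part_{index:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)
                dst.writeframes(frames)  # header frame count is corrected on close
            out_paths.append(out_path)
            index += 1
    return out_paths
```

For example, `check_wav("interview.wav")` (a hypothetical file name) returns a list of human-readable warnings, and `split_wav("interview.wav", "parts")` writes one-hour pieces into an existing `parts` directory. Compressed formats such as MP3 or Ogg would need an external tool or library instead.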

Citing

If you use this system for research, please cite the article below in your publications (available here): Olev, Aivo; Alumäe, Tanel (2024). Open source platform for Estonian speech transcription. Language Resources and Evaluation, 1–18. DOI: 10.1007/s10579-024-09777-1.

Open source

Tekstiks.ee is built on freely available open-source software that is easy to set up yourself. The recognition system can also be run inside a Docker container (recommended).

Finnish speech recognition

Finnish speech recognition has been developed in cooperation with the Finnish Language Bank and Aalto University. Speech recordings are processed by a service hosted by the Finnish Language Bank.
