Preprocessing script for Estonian text-to-speech applications

Converts Estonian text into a form suitable for speech synthesis. This includes expanding numbers, symbols and abbreviations into words, taking the other elements of the sentence into account and declining the words so that they agree grammatically.

The script follows the rules of Estonian orthography. Internationally used forms are only converted when they don't conflict with any Estonian use cases.

Ranges use a dash (not a hyphen).
Long numbers are grouped by spaces (not commas or dots).
A dash between two numbers is read as a minus when it is separated from them by spaces; otherwise it is read as a range.
Decimal fractions use commas (not dots).
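
The conventions above can be illustrated with a small standalone sketch. It is not the library's implementation: `classify_dash` and `parse_estonian_number` are hypothetical helpers written only to show how the dash and number-format rules could be read, and they assume the dash is an en dash (U+2013).

```python
import re
from typing import Optional

# En dash (U+2013) between two digits, optionally padded with whitespace.
DASH_BETWEEN_NUMBERS = re.compile(r"\d(\s*)\u2013(\s*)\d")


def classify_dash(text: str) -> Optional[str]:
    """Return 'minus', 'range', or None if no number-dash-number pattern is found."""
    match = DASH_BETWEEN_NUMBERS.search(text)
    if match is None:
        return None
    left_gap, right_gap = match.groups()
    # Assumption: spaces on both sides mean subtraction; anything else is a range.
    return "minus" if left_gap and right_gap else "range"


def parse_estonian_number(text: str) -> float:
    """Parse a number that groups digits with spaces and uses a decimal comma."""
    # Remove the grouping spaces, then swap the decimal comma for a dot.
    return float(text.replace(" ", "").replace(",", "."))
```

Under these assumptions, `classify_dash("5 \u2013 3")` gives `'minus'`, `classify_dash("5\u20133")` gives `'range'`, and `parse_estonian_number("1 234 567,5")` gives `1234567.5`.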

Requirements:

Python (>= 3.10)
EstNLTK (>= 1.7.0)

Usage

Install the latest release version as a Python library with the required dependencies:

pip install git+https://github.com/TartuNLP/tts_preprocess_et.git

Alternatively, you can pin a specific release version or commit hash to ensure reproducibility. For example:

pip install git+https://github.com/TartuNLP/tts_preprocess_et.git@<release-tag>

pip install git+https://github.com/TartuNLP/tts_preprocess_et.git@698dcbf

Usage:

from tts_preprocess_et.convert import convert_sentence
convert_sentence("1, 2, 3!")

Output: 'üks, kaks, kolm!'

Usage with accessibility mode:

from tts_preprocess_et.convert import convert_sentence
convert_sentence("1, 2, 3!", accessibility=True)

Output: 'üks, kaks, kolm hüüumärk'
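
When several sentences need converting, a thin wrapper can apply the converter line by line. The following is a minimal sketch: `convert_lines` is a hypothetical helper, not part of the package, and it takes the converter as an argument so that it does not depend on the library being installed.

```python
from typing import Callable, Iterable, List


def convert_lines(
    lines: Iterable[str],
    converter: Callable[..., str],
    accessibility: bool = False,
) -> List[str]:
    """Apply a sentence converter to every non-empty line."""
    results = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if accessibility:
            # Pass the flag only when requested, mirroring the two calls shown above.
            results.append(converter(stripped, accessibility=True))
        else:
            results.append(converter(stripped))
    return results
```

With the package installed, `convert_lines(["1, 2, 3!"], convert_sentence)` would pass each line to `convert_sentence` unchanged.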

Features

Accessibility mode:

Differentiating capital letters in alphanumeric codes. Example: 2KMc7hy → kaks, suur-täht-kaa, suur-täht-emm, tsee, seitse, haa, igrek
Reading out exclamation and question marks. Example: Appi! → Appi hüüumärk
Reading out bracket endings. Example: 2. koht (hõbe) → Teine koht, sulgudes hõbe, sulu lõpp
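
The capital-letter rule can be sketched as follows. This is an illustration only, not the library's code: `spell_code` is a hypothetical function, and its lookup tables contain just the characters that appear in the example above.

```python
# Illustrative subset only: spoken forms taken from the 2KMc7hy example.
LETTER_NAMES = {"k": "kaa", "m": "emm", "c": "tsee", "h": "haa", "y": "igrek"}
DIGIT_NAMES = {"2": "kaks", "7": "seitse"}
CAPITAL_PREFIX = "suur-täht-"


def spell_code(code: str) -> str:
    """Spell an alphanumeric code character by character, marking capitals."""
    parts = []
    for ch in code:
        if ch in DIGIT_NAMES:
            parts.append(DIGIT_NAMES[ch])
        elif ch.lower() in LETTER_NAMES:
            name = LETTER_NAMES[ch.lower()]
            # Capital letters get the prefix so listeners can tell them apart.
            parts.append(CAPITAL_PREFIX + name if ch.isupper() else name)
        else:
            raise ValueError(f"no spoken form defined for {ch!r} in this sketch")
    return ", ".join(parts)


print(spell_code("2KMc7hy"))
# kaks, suur-täht-kaa, suur-täht-emm, tsee, seitse, haa, igrek
```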
