
Support text normalization #55

Merged (5 commits), Nov 4, 2023


csukuangfj
Collaborator

TODOs

  • Add documentation on how to use it
  • Add examples showing how to generate rule.fst

Example

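```python
#!/usr/bin/env python3

import kaldifst

# Path to the rule FST (attached below as rule.fst.zip; unzip it first)
rule = "./rule.fst"
normalizer = kaldifst.TextNormalizer(rule)

# "3 years ago, the total population of China was 1411778724 people"
text = "3年前中国总人口为1411778724 人"
out = normalizer(text)
print(out)
```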

The output is given below (the Arabic numerals 3 and 1411778724 are rewritten as Chinese numerals):

三年前中国总人口为十四亿一千一百七十七万八千七百二十四 人
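To make the rule in this example concrete: it replaces each run of Arabic digits with its reading in Chinese numerals. The plain-Python sketch below reproduces that mapping for this input. It is only an illustration of what the attached rule does here, not how kaldifst works (kaldifst applies whatever is compiled into rule.fst); the names `to_chinese` and `normalize_numbers` are hypothetical, and the sketch only handles non-negative integers below 10^12.

```python
import re

DIGITS = "零一二三四五六七八九"
SMALL_UNITS = ["", "十", "百", "千"]  # place values inside a 4-digit group
BIG_UNITS = ["", "万", "亿"]          # 10^0, 10^4, 10^8


def _group_to_chinese(g: int) -> str:
    """Read a group 0 < g < 10000, e.g. 1177 -> 一千一百七十七."""
    out = []
    pending_zero = False
    for pos in range(3, -1, -1):
        d = (g // 10**pos) % 10
        if d == 0:
            # A zero is only spoken if a non-zero digit follows it.
            if out:
                pending_zero = True
        else:
            if pending_zero:
                out.append("零")
                pending_zero = False
            out.append(DIGITS[d] + SMALL_UNITS[pos])
    return "".join(out)


def to_chinese(n: int) -> str:
    """Read a non-negative integer below 10^12 in Chinese numerals."""
    if n == 0:
        return "零"
    if not 0 < n < 10**12:
        raise ValueError(f"unsupported number: {n}")
    groups = []  # least significant 4-digit group first
    while n:
        groups.append(n % 10000)
        n //= 10000
    out = []
    need_zero = False
    for idx in range(len(groups) - 1, -1, -1):
        g = groups[idx]
        if g == 0:
            need_zero = True  # e.g. 100000001 -> 一亿零一
            continue
        # Speak 零 if this group has leading zeros or a whole group was skipped.
        if out and (need_zero or g < 1000):
            out.append("零")
        out.append(_group_to_chinese(g) + BIG_UNITS[idx])
        need_zero = False
    s = "".join(out)
    # Conventional reading drops a leading 一 before 十 (十四亿, not 一十四亿).
    return s[1:] if s.startswith("一十") else s


def normalize_numbers(text: str) -> str:
    """Replace every run of ASCII digits with its Chinese reading."""
    return re.sub(r"[0-9]+", lambda m: to_chinese(int(m.group())), text)


print(normalize_numbers("3年前中国总人口为1411778724 人"))
# → 三年前中国总人口为十四亿一千一百七十七万八千七百二十四 人
```

A real rule set also has to decide, from context, when digits should not be read as a cardinal number (years, phone numbers, decimals); this sketch ignores that.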

The rule.fst used in the above example is attached below.
rule.fst.zip


Note: The example above uses Chinese numbers, but the implementation is not specific to Chinese: the normalization it performs is defined by the rules compiled into rule.fst.
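FST-based normalizers usually get this kind of generality by composing the input string with the rule transducer and keeping the lowest-cost path, so the engine itself knows nothing about the language. The toy pure-Python stand-in below illustrates that idea; it is not kaldifst's code. It assumes the rule is a closure of identity (copy) arcs plus weighted rewrite pairs, and the names `normalize`, `Rule`, `COPY_COST` and the example rule lists are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical rule format: (input substring, output substring, cost).
Rule = Tuple[str, str, float]

# Cost of copying one character unchanged (the identity "sigma" arc).
COPY_COST = 1.0


def normalize(text: str, rules: List[Rule]) -> str:
    """Return the output of the cheapest path through a toy rewrite transducer.

    The transducer is assumed to be the closure of (a) identity arcs that copy
    any single character and (b) weighted rewrite arcs from `rules`. A Viterbi
    pass over the positions of `text` plays the role of "compose with the
    input, then take the shortest path".
    """
    n = len(text)
    inf = float("inf")
    # best[i] = (cost, output) of the cheapest path that consumes text[:i]
    best: List[Tuple[float, str]] = [(inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for i in range(n):
        cost, out = best[i]
        # Identity arc: copy text[i] unchanged.
        if cost + COPY_COST < best[i + 1][0]:
            best[i + 1] = (cost + COPY_COST, out + text[i])
        # Rewrite arcs: every rule whose input matches at position i.
        for src, dst, w in rules:
            if src and text.startswith(src, i):
                j = i + len(src)
                if cost + w < best[j][0]:
                    best[j] = (cost + w, out + dst)
    return best[n][1]


# The engine stays the same; only the rule data changes.
english_rules = [("km", "kilometers", 0.5), ("&", "and", 0.5)]
chinese_rules = [("3", "三", 0.5)]
print(normalize("5 km & 3 km", english_rules))  # 5 kilometers and 3 kilometers
print(normalize("3年前", chinese_rules))         # 三年前
```

Unlike this toy, a real rule FST can encode left and right context for each rewrite; here "km" would also be rewritten inside a word such as "bookmark".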

csukuangfj merged commit 0872dbc into k2-fsa:master on Nov 4, 2023
30 checks passed
csukuangfj deleted the text-normalization branch on November 4, 2023 13:40