Best engines to choose from #816
-
Hello! Thanks for your hard work on this library, it looks really interesting. I'm learning the Thai language and I thought of using your library to automate flashcard creation through word tokenization and transliteration. Basically, I want to input any Thai word or sentence and receive the correct transliteration with tone marks. There are two methods that can help me achieve this.
But the issue is that both methods support multiple engines, and as a non-native speaker it is hard for me to judge their output. Do you have any recommendations on which engines are better for my specific case? Resource usage is not important to me, so any neural network model will work too; at this stage I only care about the correctness of the output. Thank you!
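Concretely, here is the kind of pipeline I have in mind (a sketch only; I'm assuming the library exposes `word_tokenize` and `romanize` with these engine names, based on my reading of the docs, so please correct me if the signatures are wrong):

```python
# Hedged sketch: tokenize a Thai string, then romanize each token.
# Assumes pythainlp is installed and that word_tokenize/romanize accept
# an `engine` keyword, as I understand from the project documentation.
def flashcard_entry(text):
    """Return (token, romanization) pairs for a Thai string."""
    from pythainlp.tokenize import word_tokenize
    from pythainlp.transliterate import romanize

    tokens = word_tokenize(text, engine="newmm")
    return [(tok, romanize(tok, engine="thai2rom")) for tok in tokens]

try:
    print(flashcard_entry("สวัสดีครับ"))
except ImportError:
    print("pythainlp is not installed")
```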
Replies: 3 comments
-
Hello! Thai does not have an official word segmentation (word tokenization) standard from the body that plans and regulates the Thai language, so segmentation depends on the standard each tool follows. If you have the resources, you should use a deep-learning-based engine, but it can suffer from out-of-domain problems (see Handling Cross- and Out-of-Domain Samples in Thai Word Segmentation). If your text is out of domain for the deep learning models and you don't have the resources to hire a Thai linguist, you can use newmm and improve its Thai dictionary.
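To illustrate why improving the dictionary helps, here is a minimal pure-Python sketch of dictionary-based longest matching, the general idea behind dictionary engines such as newmm (this is a toy for illustration, not newmm's actual maximal-matching algorithm):

```python
# Toy dictionary-based segmenter: at each position, greedily take the
# longest dictionary word, falling back to a single character.
def longest_match_tokenize(text, dictionary):
    tokens = []
    i = 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # try longest substring first
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit it on its own
        tokens.append(match)
        i += len(match)
    return tokens

# The ambiguous string "ตากลม" segments differently depending on the
# dictionary, which is why curating the word list changes the output.
print(longest_match_tokenize("ตากลม", {"ตาก", "ลม", "ตา", "กลม"}))  # ['ตาก', 'ลม']
print(longest_match_tokenize("ตากลม", {"ตา", "กลม"}))               # ['ตา', 'กลม']
```

The point is that a segmentation you consider wrong can often be fixed simply by adding or removing entries in the custom dictionary, with no model retraining.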
-
For transliteration, you should use a machine-learning-based model to get the best results (thai2rom, thai2rom_onnx, or tltk).
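A quick way to judge the engines for a given word is to romanize it with each and compare the outputs by eye (a hedged sketch, assuming `romanize` takes an `engine` keyword; including the rule-based `royin` default for contrast is my own assumption):

```python
# Hedged sketch: romanize one word with several engines side by side.
# Engine names are taken from this reply plus "royin", which I believe
# is pythainlp's rule-based default (assumption on my part).
def compare_romanizations(word, engines=("royin", "thai2rom", "tltk")):
    from pythainlp.transliterate import romanize
    return {eng: romanize(word, engine=eng) for eng in engines}

try:
    print(compare_romanizations("แมว"))
except ImportError:
    print("pythainlp is not installed")
```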
-
@wannaphong thank you for the fast response! I'm familiar with running deep learning models, and I will check the options you provided.