This repository has been archived by the owner on Jun 15, 2024. It is now read-only.

Relation among CLD2 Score and CLD3 Accuracy #24

Open
loretoparisi opened this issue Mar 14, 2019 · 2 comments

Comments

loretoparisi commented Mar 14, 2019

In my project I have to port the language detector from CLD2 to CLD3. CLD2 reports a Score and a Percentage for each language detected in the text. Internally, the Score is calculated in some way from a probability that, as far as I understand, is not exposed. My assumption was that it is derived from the textBytes field (the size in bytes of the text) together with the accuracy and distribution of each label in the text, something like Acc = 1 - textBytes/Score.
In CLD2, the function that normalizes these scores is:

normalized_score3[2] = GetNormalizedScore(language3[2],
                                                  ULScript_Common,
                                                  bytecount3,
                                                  doc_tote->Score(2));
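
For context, here is a minimal sketch of what that normalization appears to do, assuming GetNormalizedScore in CLD2's compact_lang_det_impl.cc simply scales the raw score to a per-1024-bytes value and ignores its language and script arguments. That reading should be checked against the CLD2 source. The name NormalizedScoreSketch is hypothetical and is not part of CLD2.

```cpp
#include <cstdint>

// Hypothetical helper, not part of CLD2.
// Assumption: CLD2's GetNormalizedScore scales the raw score so that it is
// expressed per 1024 bytes of text, and returns 0 when there are no bytes.
// The language and script arguments are left out here because they are
// assumed not to influence the result.
int NormalizedScoreSketch(int bytecount, int score) {
  if (bytecount <= 0) {
    return 0;  // no text attributed to this language
  }
  // Use a 64-bit intermediate so that score * 1024 cannot overflow int.
  const std::int64_t scaled = static_cast<std::int64_t>(score) * 1024;
  return static_cast<int>(scaled / bytecount);
}
```

If that reading is right, the normalized score is a density (score per kilobyte), not a probability, so it has no fixed upper bound and cannot be compared directly with the 0 to 1 probability field that CLD3 returns in its result.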

That said, since I need to upgrade to CLD3, at some point I have to convert the CLD2 Score to a CLD3 accuracy value. Any hints on how to achieve that?

Here for reference:
dachev/node-cld#52

@loretoparisi (Author)

@jasonriesa any help on this?
Thank you
