parameters minFreqIndiv and nbStopWords #16

Open
erwanm opened this issue Nov 7, 2017 · 0 comments


erwanm commented Nov 7, 2017

Issue migrated from the original private GitLab repo.

In the new implementation, it is much more convenient to list every parameter combination as its own obs type, e.g.

word.T.mf2.sw50
word.T.mf3.sw50
word.T.mf5.sw50
word.T.mf2.sw100
word.T.mf3.sw100
word.T.mf5.sw100
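A list like the one above can be produced mechanically as a Cartesian product of the parameter values. The following is only a minimal sketch: it assumes that `mf` stands for `minFreqIndiv` and `sw` for `nbStopWords`, and the function `expand_obs_types` is a hypothetical helper, not part of the actual code base.

```python
from itertools import product


def expand_obs_types(base, min_freqs, nb_stop_words):
    """Return one obs type name per (nbStopWords, minFreqIndiv) pair.

    Hypothetical helper: the naming scheme <base>.mf<N>.sw<M> is copied
    from the example list, the rest is an assumption.
    """
    # nbStopWords is the outer loop so the order matches the list above.
    return [
        f"{base}.mf{mf}.sw{sw}"
        for sw, mf in product(nb_stop_words, min_freqs)
    ]


variants = expand_obs_types("word.T", [2, 3, 5], [50, 100])
print("\n".join(variants))
```

With three values for one parameter and two for the other, this yields six variants of the single basic obs type `word.T`, all of which become candidates for selection.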

The disadvantage is potential redundancy: if a basic obs type X, say POS trigrams, is very useful, the genetic algorithm will select it multiple times through its different variants, possibly with little information gain between the variants.

In the original prototype the options were independent: the config file held only one value for each option, hence only one variant of each basic obs type could be selected. This could be implemented in the new version through a more complex config generation mechanism.
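One way such a config generation mechanism might look is sketched below. This is an illustration under assumptions, not the prototype's actual behavior: the function `single_variant_configs`, the dictionary keys, and the obs type name `pos.TTT` are all hypothetical. Each generated config fixes one value per option, so every basic obs type appears with exactly one variant.

```python
from itertools import product


def single_variant_configs(base_types, min_freqs, nb_stop_words):
    """Yield configs where each basic obs type has exactly one variant.

    Hypothetical sketch: every config fixes a single minFreqIndiv and a
    single nbStopWords value, then applies them to all basic obs types.
    """
    for mf, sw in product(min_freqs, nb_stop_words):
        yield {
            "minFreqIndiv": mf,
            "nbStopWords": sw,
            "obsTypes": [f"{base}.mf{mf}.sw{sw}" for base in base_types],
        }


# "pos.TTT" is a made-up name standing in for a POS trigram obs type.
configs = list(single_variant_configs(["word.T", "pos.TTT"], [2, 3, 5], [50, 100]))
for config in configs:
    print(config)
```

The redundancy problem disappears within a single config, at the cost of the option values now being chosen per config rather than per obs type.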

Since it is unclear to me at this point what the best solution is, the first option (listing all combinations together, which is the easiest) is currently used.
