Sentence Length Histogram Job #555

Merged 3 commits on Dec 2, 2024

Conversation

christophmluscher
Contributor

Job to generate a histogram of the sentence lengths
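
To make the description concrete, here is a minimal sketch of how a sentence length histogram could be computed. It is not the code merged in this PR: the function names `sentence_length_histogram` and `render_histogram`, the `bin_size` parameter, and the choice to measure length in whitespace-separated tokens are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def sentence_length_histogram(lines: Iterable[str], bin_size: int = 1) -> Dict[int, int]:
    """Map the start of each length bin to the number of sentences in it.

    Assumption: one sentence per line, length = number of whitespace-separated tokens.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be >= 1")
    counts: Counter = Counter()
    for line in lines:
        num_tokens = len(line.split())
        # Round the length down to the start of its bin.
        bin_start = (num_tokens // bin_size) * bin_size
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


def render_histogram(hist: Dict[int, int], bin_size: int = 1) -> str:
    """Render the histogram as plain text, one row per non-empty bin."""
    rows = []
    for bin_start, count in hist.items():
        if bin_size == 1:
            label = str(bin_start)
        else:
            label = f"{bin_start}-{bin_start + bin_size - 1}"
        rows.append(f"{label:>7} | {'#' * count} ({count})")
    return "\n".join(rows)


if __name__ == "__main__":
    corpus = ["a b c", "d e", "f g h", ""]
    print(render_histogram(sentence_length_histogram(corpus)))
```

Exposing `bin_size` as a parameter is one way to let callers adjust the binning without changing the counting logic.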

Review thread on text/info.py (outdated, resolved)
Co-authored-by: michelwi <[email protected]>
Contributor

@JackTemaki left a comment

I feel the job is very specific, and users might want to adjust the plot (e.g. use different bin sizes). One extension idea: you could print the removed lines in a different color or so.

But technically it seems fine, so I guess I can approve.

@christophmluscher
Contributor Author

I would let future users extend this if they want further functionality :D

@christophmluscher christophmluscher merged commit 23704ff into main Dec 2, 2024
4 checks passed
@christophmluscher christophmluscher deleted the CML-sentence-length-job branch December 2, 2024 11:51
christophmluscher added a commit that referenced this pull request Dec 2, 2024
* add sentence length histogram job

* improvements

* Fix max num check

Co-authored-by: michelwi <[email protected]>

---------

Co-authored-by: michelwi <[email protected]>
3 participants