Issues: NVIDIA/NeMo-Curator

Issues list

Graceful handling when no LSH duplicates found. (label: duplicate)
#381 opened Nov 19, 2024 by davzoku
Update columns documentation (label: documentation)
#378 opened Nov 18, 2024 by sarahyurick
Use CrossFit for TokenizerFertilityFilter (label: enhancement)
#377 opened Nov 15, 2024 by sarahyurick
Add GPU test with NeMo 2.0
#376 opened Nov 15, 2024 by sarahyurick
[IMP] Decrease Merge Peak Memory Usage of ConnectedComponents (label: bug)
#375 opened Nov 15, 2024 by VibhuJawa
Zyda2 tutorial - key error when running compute_counts script (label: bug)
#345 opened Nov 5, 2024 by ronjer30
Zyda2 tutorial - TypeError when initializing Dask CPU cluster (label: bug)
#344 opened Nov 5, 2024 by ronjer30
Deprecate max_text_bytes_per_part (label: enhancement)
#331 opened Oct 28, 2024 by sarahyurick
Improve Pytorch Model Performance (label: enhancement)
#329 opened Oct 28, 2024 by VibhuJawa
3 tasks
Resuming the job on slurm after it gets cancelled. (label: enhancement)
#297 opened Oct 11, 2024 by uahmed93
Unmanaged memory is high and frozen execution (label: bug)
#295 opened Oct 11, 2024 by pappagari
Improve NeMo Curator Experience for Pytorch Models (with crossfit) (label: enhancement)
#288 opened Oct 9, 2024 by VibhuJawa
3 tasks done
Check Pytorch cuda context is valid across GPUs (label: bug)
#284 opened Oct 8, 2024 by VibhuJawa
Semantic Dedup doesn't work with UCX (label: bug)
#283 opened Oct 8, 2024 by praateekmahajan
GitHub workflows improvements
#259 opened Sep 24, 2024 by sarahyurick
2 of 5 tasks
Translation example with ctranslate2's Translator. (label: enhancement)
#246 opened Sep 16, 2024 by uahmed93
Grammar and punctuation nits in Jupyter Notebooks (labels: documentation, good first issue)
#228 opened Sep 4, 2024 by sarahyurick
10 tasks