Zyda2 tutorial - key error when running compute_counts script #345

Open
ronjer30 opened this issue Nov 5, 2024 · 0 comments
Labels
bug Something isn't working

ronjer30 (Contributor) commented Nov 5, 2024

Describe the bug
Running the 2_compute_counts.py script fails with the error Exception: 'KeyError("[\'size\'] not in index")'

Steps/Code to reproduce bug

  1. Follow the steps in the tutorial.
  2. Run python3 2_dupes_removal/2_compute_counts.py
  3. The script fails with the following error:
NeMo-Curator/tutorials/zyda2-tutorial/2_dupes_removal/2_compute_counts.py", line 55, in group_partition
    return result[
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['size'] not in index"

Expected behavior
The script runs successfully and the size column is computed correctly.

Environment overview

Environment location: Slurm
Method of NeMo-Curator install: docker container, dev image from nvcr.io/nvidia/nemo:dev

Additional context
Adding the line sizes = sizes.rename(columns={0: 'size'}) immediately after sizes = partition.groupby("group").size().reset_index() appears to rename the count column from 0 to 'size' and fixes the error.
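
For reference, a minimal sketch of the behavior described above, using plain pandas. The small partition DataFrame is a hypothetical stand-in, not the tutorial's data, and passing name="size" to reset_index is an alternative I am assuming would work equally well in the script; it has not been tested against 2_compute_counts.py.

```python
import pandas as pd

# Hypothetical stand-in for one partition; only the "group" column matters here.
partition = pd.DataFrame({"group": ["a", "a", "b", "c", "c", "c"]})

# groupby(...).size() returns an unnamed Series, so reset_index() labels
# the count column with the integer 0 rather than the string "size".
sizes = partition.groupby("group").size().reset_index()
unnamed_columns = list(sizes.columns)

# Workaround from this issue: rename column 0 to "size" after the fact.
renamed = sizes.rename(columns={0: "size"})

# Alternative (assumption, not from the tutorial): name the column directly.
named = partition.groupby("group").size().reset_index(name="size")

print(unnamed_columns)
print(list(renamed.columns))
print(named)
```

Either form should let a later selection of the 'size' column succeed instead of raising KeyError.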
