NeMo-Curator/tutorials/zyda2-tutorial/2_dupes_removal/2_compute_counts.py", line 55, in group_partition
return result[
File "/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py", line 6252, in _raise_if_missing
raise KeyError(f"{not_found} not in index")
KeyError: "['size'] not in index"
Describe the bug
Running the 2_compute_counts.py script fails with the following error:
Exception: 'KeyError("[\'size\'] not in index")'
Steps/Code to reproduce bug
python3 2_dupes_removal/2_compute_counts.py
Expected behavior
Successful run with size calculated correctly.
Environment overview (please complete the following information)
Environment location: Slurm
Method of NeMo-Curator install: docker container, dev image from nvcr.io/nvidia/nemo:dev
Additional context
Adding this line
sizes = sizes.rename(columns={0: 'size'})
after
sizes = partition.groupby("group").size().reset_index()
appears to rename the column correctly and fixes the error.
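For anyone trying to reproduce this outside the tutorial, here is a minimal sketch of what seems to be going on, using plain pandas. The partition DataFrame and its values are made up for illustration; only the groupby/size/reset_index/rename calls come from the script and the workaround above. Passing name="size" to reset_index is an alternative I would expect to behave the same way, not something taken from the tutorial code.

```python
import pandas as pd

# Hypothetical stand-in for one partition handled by group_partition.
partition = pd.DataFrame({"group": ["a", "a", "b"], "id": [1, 2, 3]})

# groupby(...).size() returns an unnamed Series, so reset_index() labels
# the counts column with the integer 0 instead of "size".
sizes = partition.groupby("group").size().reset_index()
print(list(sizes.columns))

# Workaround from this issue: rename the 0 column to "size".
renamed = sizes.rename(columns={0: "size"})

# Possible alternative: name the counts column while resetting the index.
named = partition.groupby("group").size().reset_index(name="size")

# Selecting "size" now works in both cases.
print(renamed[["group", "size"]])
print(named[["group", "size"]])
```

Selecting sizes[["group", "size"]] before the rename raises the same KeyError as in the traceback, since the frame has no column called "size" at that point.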