Commit
implement DatasetDict.shuffle
ArneBinder committed Sep 9, 2024
1 parent e1db8f3 commit ba1168d
Showing 1 changed file with 9 additions and 0 deletions.
src/pie_datasets/core/dataset_dict.py: 9 changes (9 additions, 0 deletions)
```diff
@@ -694,6 +694,15 @@ def cast_document_type(
         )
         return result
 
+    def shuffle(self, **kwargs):
+        result = DatasetDict.from_hf(super().shuffle(**kwargs), document_type=self.document_type)
+
+        # TODO: integrate into DatasetDict.from_hf
+        for split_name, split in result.items():
+            split.document_converters = self[split_name].document_converters
+
+        return result
+
 
 def load_dataset(*args, **kwargs) -> Union[DatasetDict, Dataset, IterableDataset]:
     dataset_or_dataset_dict = datasets.load_dataset(*args, **kwargs)
```

Codecov (codecov/patch) reported that added lines 698, 701-702 and 704 of src/pie_datasets/core/dataset_dict.py are not covered by tests.
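The override follows a wrap-and-restore pattern: it calls the parent `shuffle`, re-wraps the result with `from_hf` to restore the `document_type`, and then copies each split's `document_converters` back from the original. Presumably this is needed because the parent method returns a plain container that drops the subclass and its per-split attributes. The self-contained toy below illustrates that pattern under exactly this assumption. It is a sketch, not pie_datasets or Hugging Face code: `PlainDatasetDict`, `ToySplit`, `ToyDatasetDict`, `from_plain` and the `to_upper` converter are all hypothetical stand-ins.

```python
import random
from typing import Any, Callable, Dict, List, Optional


class PlainDatasetDict(dict):
    """Hypothetical stand-in for the parent class.

    Its shuffle() returns a *plain* PlainDatasetDict of plain lists, so the
    subclass type and any per-split attributes are lost.
    """

    def shuffle(self, seed: Optional[int] = None) -> "PlainDatasetDict":
        result = PlainDatasetDict()
        for name, docs in self.items():
            shuffled = list(docs)
            # Fresh RNG per split, so the same seed always gives the same order.
            random.Random(seed).shuffle(shuffled)
            result[name] = shuffled
        return result


class ToySplit(list):
    """Hypothetical split: a list of documents plus converter metadata."""

    def __init__(
        self,
        docs,
        document_type: str,
        document_converters: Optional[Dict[str, Callable]] = None,
    ):
        super().__init__(docs)
        self.document_type = document_type
        self.document_converters = dict(document_converters or {})


class ToyDatasetDict(PlainDatasetDict):
    def __init__(self, splits: Dict[str, ToySplit], document_type: str):
        super().__init__(splits)
        self.document_type = document_type

    @classmethod
    def from_plain(cls, plain: Dict[str, List[Any]], document_type: str) -> "ToyDatasetDict":
        # Hypothetical analogue of from_hf: restores the document type, but it
        # cannot know the original per-split converters.
        return cls(
            {name: ToySplit(docs, document_type) for name, docs in plain.items()},
            document_type=document_type,
        )

    def shuffle(self, **kwargs) -> "ToyDatasetDict":
        result = ToyDatasetDict.from_plain(super().shuffle(**kwargs), document_type=self.document_type)
        # Re-attach each original split's converter mapping (shared, not copied).
        for split_name, split in result.items():
            split.document_converters = self[split_name].document_converters
        return result


def to_upper(doc: str) -> str:  # hypothetical document converter
    return doc.upper()


ds = ToyDatasetDict(
    {
        "train": ToySplit(["a", "b", "c", "d"], "TextDocument", {"upper": to_upper}),
        "test": ToySplit(["x", "y"], "TextDocument"),
    },
    document_type="TextDocument",
)
shuffled = ds.shuffle(seed=42)
print(type(shuffled).__name__, sorted(shuffled["train"]), list(shuffled["train"].document_converters))
```

Like the loop in the commit, the toy assigns the original split's converter mapping rather than a copy, so the source and the shuffled dataset dicts share it. Folding that step into the re-wrapping constructor, as the TODO suggests, would reduce `shuffle` to a single call.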