feat: add from_hf_dataset method #1568

Closed
wants to merge 1 commit into from

SauravMaheshkar
Collaborator

Adds a new helper method to create a LightlyDataset from a HuggingFace Dataset.

Comment on lines +235 to +240
def apply_transform(batch, transform=tranform, key=key):
    assert key in batch.keys(), f"the provided key, {key} does not exist in the dataset"

    batch[key] = [transform(image) for image in batch[key]]

    return batch
Contributor

Not sure if this works with distributed training as local functions cannot be pickled and sent to other processes. We might have to define a transform class that takes the transform and key as input.
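
A minimal sketch of the transform class suggested here, assuming a plain callable object is acceptable wherever the local function was used. The names `ApplyTransform` and `double` are hypothetical and are not part of the Lightly or HuggingFace APIs. Because the class is defined at module level, its instances can be pickled as long as the wrapped transform can be pickled too.

```python
import pickle


class ApplyTransform:
    """Picklable stand-in for the local `apply_transform` function.

    Hypothetical helper: stores the transform and the column key as
    attributes instead of capturing them in a closure.
    """

    def __init__(self, transform, key):
        self.transform = transform
        self.key = key

    def __call__(self, batch):
        # Same check as the original assert, raised as an explicit error.
        if self.key not in batch:
            raise KeyError(
                f"the provided key, {self.key}, does not exist in the dataset"
            )
        batch[self.key] = [self.transform(image) for image in batch[self.key]]
        return batch


def double(value):
    """Placeholder for an image transform (assumption for this demo only)."""
    return value * 2


if __name__ == "__main__":
    apply_transform = ApplyTransform(transform=double, key="image")
    # Round-trip through pickle, as happens when the object is sent
    # to another process.
    restored = pickle.loads(pickle.dumps(apply_transform))
    print(restored({"image": [1, 2, 3], "label": [0, 1, 0]}))
```

An instance like this could presumably be passed to the same place as the local function, for example `datasets.Dataset.set_transform`, although the excerpt above does not show the call site.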

@guarin
Contributor

guarin commented Jul 10, 2024

@SauravMaheshkar is it ok if we close this as we now have the new tutorial from #1569?

@SauravMaheshkar
Collaborator Author

@SauravMaheshkar is it ok if we close this as we now have the new tutorial from #1569?

Yup closed 😄

@SauravMaheshkar SauravMaheshkar deleted the saurav/from-hf-dataset branch July 10, 2024 07:55
2 participants