Describe the bug
At dataset creation, the generated dataset is always the cached version, even after the input file has changed.
To Reproduce
After changing the input file, run `toolkit.py`; it will not create a new dataset with the desired changes.
Expected behavior
Environment:
Ubuntu
This is caused by the Hugging Face `Dataset.from_generator()` method, which checks whether the dataset is already cached before regenerating it. See code.
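The stale-cache effect can be reproduced with the toy model below. It is a simplified sketch, not how `datasets` actually computes its cache fingerprint: here the key is only a hash of the generator function's bytecode plus an optional fingerprint string, so editing the file the generator reads leaves the key unchanged until a hash of the file contents is mixed in. `ToyGeneratorCache`, `file_fingerprint` and `demo` are hypothetical names used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class ToyGeneratorCache:
    """Simplified model of a generator-keyed cache (not the datasets code).

    The key is a hash of the generator function's bytecode plus an optional
    fingerprint string; the contents of any file the generator reads are
    invisible to the key unless they are folded into the fingerprint.
    """

    def __init__(self):
        self._store = {}
        self.builds = 0  # how many times a generator was actually run

    def load(self, gen_fn, fingerprint=""):
        key = hashlib.sha256(gen_fn.__code__.co_code + fingerprint.encode()).hexdigest()
        if key not in self._store:
            self.builds += 1
            self._store[key] = list(gen_fn())
        return self._store[key]


def file_fingerprint(path):
    """SHA-256 of the file's bytes, used as an explicit cache-busting key."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.txt"
        data.write_text("a\nb\n")

        def rows():
            yield from data.read_text().splitlines()

        cache = ToyGeneratorCache()
        first = cache.load(rows)
        data.write_text("a\nb\nc\n")  # edit the source file
        stale = cache.load(rows)  # same key, so the old rows come back
        fresh = cache.load(rows, file_fingerprint(data))  # new key, so it rebuilds
        return first, stale, fresh, cache.builds


if __name__ == "__main__":
    print(demo())
```

Running `demo()` returns `(['a', 'b'], ['a', 'b'], ['a', 'b', 'c'], 2)`: the second load silently returns the old rows, which is the symptom described in the bug report.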
The easiest solution is to pass a `cache_dir` parameter (such as `./dataset_cache`) to each `Ingestor` class, for example here.
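A minimal sketch of that suggestion, assuming an ingestor that reads a JSON Lines file. `JsonlIngestor` and `read_rows` are hypothetical stand-ins for the project's `Ingestor` classes; only `Dataset.from_generator` and its `gen_kwargs` and `cache_dir` parameters come from the `datasets` library. The library is imported inside `to_dataset` so the rest of the class can be used without it.

```python
import json
from typing import Iterator


class JsonlIngestor:
    """Hypothetical ingestor; the project's real Ingestor classes may differ."""

    def __init__(self, path: str, cache_dir: str = "./dataset_cache"):
        self.path = path
        # Per-ingestor cache location; deleting this directory forces a rebuild.
        self.cache_dir = cache_dir

    @staticmethod
    def read_rows(path: str) -> Iterator[dict]:
        # Yield one example per non-empty JSON Lines row.
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def to_dataset(self):
        # Imported lazily so the class works without `datasets` installed.
        from datasets import Dataset

        return Dataset.from_generator(
            JsonlIngestor.read_rows,
            gen_kwargs={"path": self.path},
            cache_dir=self.cache_dir,
        )
```

Usage would look like `JsonlIngestor("data.jsonl").to_dataset()`, where `data.jsonl` is a placeholder path. Keeping the cache in a known, per-project directory is what makes the manual cleanup step practical.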
That way, whenever a local file changes, the user can delete the `./dataset_cache` directory so the dataset is regenerated.
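The manual deletion could also be automated. The sketch below is an assumption about how that might be wired in, not existing project code: `clear_cache_if_changed` and `file_sha256` are hypothetical helpers, and the stamp file `.source_sha256` is an arbitrary name kept inside the cache directory (assumed not to interfere with `datasets`). It compares a SHA-256 of the input file with the hash recorded on the previous run and removes the whole cache directory when they differ; a cache directory that exists without a stamp is treated as stale.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path) -> str:
    """Hash the file in 1 MiB chunks so large inputs are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def clear_cache_if_changed(data_path, cache_dir="./dataset_cache") -> bool:
    """Delete cache_dir if data_path changed since the last call.

    Returns True when the cache was cleared. The stamp file name is a
    hypothetical choice made for this sketch.
    """
    cache = Path(cache_dir)
    stamp = cache / ".source_sha256"
    current = file_sha256(data_path)
    # Stale if the cache exists but has no stamp, or the stamp differs.
    changed = cache.exists() and (not stamp.exists() or stamp.read_text() != current)
    if changed:
        shutil.rmtree(cache)
    cache.mkdir(parents=True, exist_ok=True)
    stamp.write_text(current)
    return changed
```

Calling it just before building the dataset keeps the `cache_dir` approach while removing the need to remember to delete the directory by hand; with several input files, each would need its own hash folded into the stamp.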
Future Enhancement
`no_cache`
`config.data`
benjaminye