(de-)serialize document_type when calling DatasetDict.(to|from)_json() #346

Merged
merged 3 commits into main from serialize_document_type
Sep 17, 2023

ArneBinder
Owner

With this PR, we serialize the document_type when calling DatasetDict.to_json(). The document_type is saved to a file named metadata.json in the output path. When DatasetDict.from_json() is called without a document_type (the parameter is now optional for this method), it tries to load the document_type from that file.
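The mechanism described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the helper names (`write_metadata`, `resolve_document_type`), the `"document_type"` key, and storing the type as a dotted import path are all assumptions, and `types.SimpleNamespace` merely stands in for a real document class.

```python
import importlib
import json
import os
import tempfile
from types import SimpleNamespace
from typing import Optional

# File name taken from the PR description; everything else is hypothetical.
METADATA_FILE_NAME = "metadata.json"


def type_to_str(cls: type) -> str:
    """Encode a class as a dotted import path, e.g. 'types.SimpleNamespace'."""
    return f"{cls.__module__}.{cls.__qualname__}"


def str_to_type(name: str) -> type:
    """Resolve a dotted import path back to the class it names."""
    module_name, _, class_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def write_metadata(path: str, document_type: type) -> None:
    """Store the document type next to the serialized data."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, METADATA_FILE_NAME), "w") as f:
        json.dump({"document_type": type_to_str(document_type)}, f)


def resolve_document_type(path: str, document_type: Optional[type] = None) -> type:
    """Prefer an explicitly passed type; otherwise fall back to the metadata file."""
    if document_type is not None:
        return document_type
    metadata_path = os.path.join(path, METADATA_FILE_NAME)
    if not os.path.exists(metadata_path):
        raise ValueError(
            f"no document_type given and no {METADATA_FILE_NAME} found in {path}"
        )
    with open(metadata_path) as f:
        metadata = json.load(f)
    return str_to_type(metadata["document_type"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        write_metadata(out_dir, SimpleNamespace)
        # No type passed: it is recovered from metadata.json.
        print(resolve_document_type(out_dir))
```

An explicitly passed type takes precedence in this sketch, so callers who already specify a document_type would see no change in behavior.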

@ArneBinder ArneBinder merged commit bc897a5 into main Sep 17, 2023
@ArneBinder ArneBinder deleted the serialize_document_type branch September 17, 2023 19:14
@ArneBinder ArneBinder changed the title (de-)serialize document_type when calling DatasetDict (de-)serialize document_type when calling DatasetDict.(to|from)_json() Sep 17, 2023
ArneBinder added a commit to ArneBinder/pytorch-ie-hydra-template-1 that referenced this pull request Sep 18, 2023
* require pytorch-ie>=0.24.2 because of ArneBinder/pytorch-ie#346

* JsonSerializer.(dump|read)(): split the parameter path into path + optional file_name, and add the optional parameters metadata_file_name and split; save|read document_type to|from the metadata file; allow overwriting any default_kwargs in __call__

* remove mandatory document_type from from_serialized_documents dataset config

* adjust readme
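
A rough sketch of what the serializer change in the commit message above might look like. It is not the template's actual JsonSerializer: the class name `SketchJsonSerializer`, the default file names, the JSON Lines layout, the placement of the metadata file at the top-level path, and representing document_type as a plain string are all assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Optional, Tuple


class SketchJsonSerializer:
    """Hypothetical serializer: path + optional file_name, metadata file, split."""

    def __init__(self, **default_kwargs: Any) -> None:
        # Defaults applied on every call unless overridden in __call__.
        self.default_kwargs = default_kwargs

    @classmethod
    def dump(
        cls,
        documents: List[Dict[str, Any]],
        document_type: str,
        path: str,
        file_name: str = "documents.jsonl",
        metadata_file_name: str = "metadata.json",
        split: Optional[str] = None,
    ) -> None:
        # Assumption: a split becomes a subdirectory of path.
        target_dir = os.path.join(path, split) if split is not None else path
        os.makedirs(target_dir, exist_ok=True)
        with open(os.path.join(target_dir, file_name), "w") as f:
            for doc in documents:
                f.write(json.dumps(doc) + "\n")
        # Assumption: the metadata file lives directly under path.
        with open(os.path.join(path, metadata_file_name), "w") as f:
            json.dump({"document_type": document_type}, f)

    @classmethod
    def read(
        cls,
        path: str,
        file_name: str = "documents.jsonl",
        metadata_file_name: str = "metadata.json",
        split: Optional[str] = None,
        document_type: Optional[str] = None,
    ) -> Tuple[List[Dict[str, Any]], str]:
        if document_type is None:
            # Fall back to the type stored at dump time.
            with open(os.path.join(path, metadata_file_name)) as f:
                document_type = json.load(f)["document_type"]
        source_dir = os.path.join(path, split) if split is not None else path
        with open(os.path.join(source_dir, file_name)) as f:
            documents = [json.loads(line) for line in f if line.strip()]
        return documents, document_type

    def __call__(self, documents: List[Dict[str, Any]], **kwargs: Any) -> None:
        # Keyword arguments passed here override the stored defaults.
        self.dump(documents, **{**self.default_kwargs, **kwargs})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        serializer = SketchJsonSerializer(path=out_dir, document_type="my_module.MyDocument")
        serializer([{"text": "hello"}], split="train")
        print(SketchJsonSerializer.read(path=out_dir, split="train"))
```

Merging `default_kwargs` with the call-time keyword arguments, with the latter winning, is one simple way to let a configured serializer be reused with a different path or split per call.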