UFAL/Cannot download and preview files after migration #453

Closed
milanmajchrak opened this issue Oct 30, 2023 · 2 comments · Fixed by #454
Labels
bug Something isn't working

Comments

@milanmajchrak
Collaborator

milanmajchrak commented Oct 30, 2023

Problem description

Download: downloading a bitstream throws a RuntimeException claiming that the bitstream file has no extensions (e.g., zip, pdf, ...), which is not true.
Preview: the preview content cannot be generated because the bitstream's extensions are empty.

Both errors have the same root cause.
During migration, the bitstream format is not mapped correctly to the bitstream, because the same format has a different bitstream_format_id in CLARIN-DSpace5 and CLARIN-DSpace7.
A possible solution is to look up the bitstream format by its MIME type instead of by its ID.
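To illustrate the proposed fix, here is a minimal sketch of matching formats by MIME type instead of by ID. It assumes (hypothetically) that both format registries are available as lists of dicts with `bitstream_format_id` and `mimetype` keys; the function name and the `unknown_id` fallback are illustrative, not part of the import tool.

```python
"""Sketch: map CLARIN-DSpace5 bitstream format IDs to CLARIN-DSpace7 IDs
by matching MIME types.

Assumption (hypothetical): each registry is a list of dicts with the keys
'bitstream_format_id' and 'mimetype'.
"""
from typing import Dict, List, Optional


def map_format_ids_by_mimetype(
    dspace5_formats: List[dict],
    dspace7_formats: List[dict],
    unknown_id: Optional[int] = None,
) -> Dict[int, Optional[int]]:
    # Index the DSpace7 registry by MIME type. If two formats share a MIME
    # type, the last one in the list wins.
    dspace7_by_mimetype = {
        fmt['mimetype']: fmt['bitstream_format_id'] for fmt in dspace7_formats
    }
    mapping: Dict[int, Optional[int]] = {}
    for fmt in dspace5_formats:
        # Same MIME type -> same format, even though the numeric IDs differ.
        # Formats missing from DSpace7 fall back to unknown_id.
        mapping[fmt['bitstream_format_id']] = dspace7_by_mimetype.get(
            fmt['mimetype'], unknown_id)
    return mapping
```

With made-up IDs, a DSpace5 PDF format with ID 4 maps to a DSpace7 PDF format with ID 3, and a ZIP format that has no DSpace7 counterpart maps to the fallback ID: `map_format_ids_by_mimetype(dspace5, dspace7, unknown_id=1)` returns `{1: 1, 4: 3, 12: 1}` for the data used in the test below. Because the dict is keyed by MIME type, formats that share a MIME type (for example several `application/octet-stream` entries) cannot be told apart this way.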

@milanmajchrak milanmajchrak self-assigned this Oct 30, 2023
@milanmajchrak milanmajchrak added the bug Something isn't working label Oct 30, 2023
@milanmajchrak
Collaborator Author

Python update:

diff --git a/data_pump/bitstream.py b/data_pump/bitstream.py
index 88e0562..d2d6519 100644
--- a/data_pump/bitstream.py
+++ b/data_pump/bitstream.py
@@ -24,9 +24,15 @@ def import_bitstream(metadata_class,
     """
     bitstream_json_name = 'bitstream.json'
     bundle2bitstream_json_name = 'bundle2bitstream.json'
+    bitsteamformat_json_name = 'bitstreamformatregistry.json'
     bitstream_url = 'clarin/import/core/bitstream'
     imported = 0

+    bitstreamformat_json_list = read_json(bitsteamformat_json_name)
+    if not bitstreamformat_json_list:
+        logging.info("Bitstreamformatregistry JSON is empty.")
+        return
+
     # load bundle2bitstream
     bundle2bitstream_json_list = read_json(bundle2bitstream_json_name)
     if bundle2bitstream_json_list:
@@ -66,8 +72,7 @@ def import_bitstream(metadata_class,
             bitstream['bitstream_format_id'] = unknown_format_id_val
         params = {'internal_id': bitstream['internal_id'],
                   'storeNumber': bitstream['store_number'],
-                  'bitstreamFormat': bitstreamformat_id_dict[
-                      bitstream['bitstream_format_id']],
+                  'bitstreamFormat': bitstreamformat_json_list[bitstream['bitstream_format_id']]['mimetype'],
                   'deleted': bitstream['deleted'],
                   'sequenceId': bitstream['sequence_id'],
                   'bundle_id': None,
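The diff above indexes `bitstreamformat_json_list` directly with `bitstream['bitstream_format_id']`, which returns the right row only if each row's position in the list equals its ID. A possible alternative, sketched below under the assumption that every row of `bitstreamformatregistry.json` is a dict with `bitstream_format_id` and `mimetype` keys, is to build a dict keyed by the ID first. The helper names and the `unknown_format_id` parameter are hypothetical.

```python
"""Sketch: resolve a bitstream's MIME type from the exported
bitstreamformatregistry.json rows without relying on row order.

Assumption: every row is a dict with 'bitstream_format_id' and 'mimetype'.
"""
from typing import Dict, List, Optional


def build_mimetype_lookup(format_rows: List[dict]) -> Dict[int, str]:
    # Map each bitstream_format_id to its MIME type, independent of the
    # position of the row in the list.
    return {row['bitstream_format_id']: row['mimetype'] for row in format_rows}


def resolve_mimetype(bitstream: dict,
                     lookup: Dict[int, str],
                     unknown_format_id: int) -> Optional[str]:
    # Fall back to the "unknown" format when the bitstream has no format ID
    # or an ID that is not in the registry.
    format_id = bitstream.get('bitstream_format_id')
    if format_id not in lookup:
        format_id = unknown_format_id
    # Returns None if even the fallback ID is missing from the registry.
    return lookup.get(format_id)
```

For example, with `rows = [{'bitstream_format_id': 1, 'mimetype': 'application/octet-stream'}, {'bitstream_format_id': 4, 'mimetype': 'application/pdf'}]`, the expression `rows[1]` returns the PDF row (ID 4) and `rows[4]` raises `IndexError`, whereas `resolve_mimetype({'bitstream_format_id': 4}, build_mimetype_lookup(rows), 1)` returns `'application/pdf'`.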

@milanmajchrak
Collaborator Author

Python code updated in this commit: dataquest-dev/dspace-import@16db97d
