#18: Allow embed=[] in methods and embed resources in response #20

kshepherd · 2024-06-11T10:53:56Z

@alanorth This is one way of fixing #18 -- I am interested to hear if you think it's the right approach.

We can request linked HAL stuff to be embedded in the parent object using ?embed=... in the REST API call. So this change allows a client to say bitstreams = d.get_bitstreams(bundle=my_bundle, embeds=['format']), and then in the resulting Bitstream objects you can get format out like format = BitstreamFormat(bitstream.embedded['format']) without any additional API calls.

The only thing I wasn't sure about is if we should handle that in the client lib instead of making the actual client do it... this way keeps things relatively simple in the library, we don't have to know anything in the bitstream model because it'll just copy all _embedded JSON into a dict. If we wanted to make that model more 'smart' we'd tell it what kind of embeds to expect, and we could parse and instantiate the BitstreamFormat objects before handing back in the get_bitstreams() return value.

Here is my example script to test it out:

import pprint

from dspace_rest_client.client import DSpaceClient
from dspace_rest_client.models import BitstreamFormat

# Authenticate against the DSpace client
authenticated = d.authenticate()
if not authenticated:
    print('Error logging in! Giving up.')
    exit(1)

print('\nExample of ORIGINAL bundle output with format embedded.\n')
# Get top communities
top_communities = d.get_communities(top=True)
for top_community in top_communities:
    # Get all collections in this community
    collections = d.get_collections(community=top_community)
    for collection in collections:
        # Get all items in this collection - see that the recommended method is a search, scoped to this collection
        # (there is no collection/items endpoint, though there is a /mappedItems endpoint, not yet implemented here)
        items = d.search_objects(query='*:*', scope=collection.uuid, dso_type='item')
        for item in items:
            print(f'{item.name} ({item.uuid})')
            # Get all bundles in this item
            bundles = d.get_bundles(parent=item)
            for bundle in bundles:
                if bundle.name == 'ORIGINAL':
                    print(f'\n\nBUNDLE {bundle.name} ({bundle.uuid})')
                    # Get all bitstreams in this bundle
                    bitstreams = d.get_bitstreams(bundle=bundle, embeds=['format'])
                    for bitstream in bitstreams:
                        print(f'{bitstream.name} ({bitstream.uuid})')
                        if 'format' in bitstream.embedded:
                            format = BitstreamFormat(bitstream.embedded['format'])
                            pprint.pp(format.as_dict())

kshepherd · 2024-06-11T11:31:39Z

(i should note, you can also just bypass the python obj and use the dict directly like bitstream.embedded['format']['mimetype'] -- it's hard to predict with the different ways and context we use libs like this, whether the "knowledge" of how the data looks should be in the client impl, or further up in the library

kshepherd · 2024-10-29T10:27:21Z

@dpk interested in your thoughts on this PR / approaches like it

dpk · 2024-10-29T11:07:07Z

dspace_rest_client/client.py

@@ -16,6 +16,7 @@
 """
 import json
 import logging
+import pprint


This import doesn’t seem to be used

dpk · 2024-10-29T11:28:44Z

dspace_rest_client/client.py

@@ -595,6 +596,8 @@ def get_bitstreams(self, uuid=None, bundle=None, page=0, size=20, sort=None):
        @param size:    Size of results per page (default: 20)
        @return:        list of python Bitstream objects
        """
+        if embeds is None:
+            embeds = []


Is there any reason not to request the format by default? It seems a very weird default of DSpace not to include this information to begin with, but there might be a reason for it. (E.g. if the format embed can be very large, but this doesn’t seem to be the case.)

We could leave the option to pass embeds=[] explicitly if this is useful (for performance? I’m really struggling to understand why it is useful to know about the name of a file, the size of a file, the MD5 hash of a file, but not the format of a file 🤔)

Fair question - it comes down to DSpace modelling.. a bitstream format is itself an addressable object in the database and might need to be accessed or inspected outside the context of a bitstream or file (e.g. managing supported formats and their descriptions, known extensions, etc). So it's a separate resource, and is linked from a bitstream (as well as other places). You're right that it's usually embedded in the UI requests since... why wouldn't you want it.

And the idea of linked resources (where you can either follow the HAL link or ask for the resource to be embedded in the original request in advance) does give either option.

All good questions! With regards to how the backend resources look and work, we can question and change those in the scope of those projects (DSpace/DSpace, DSpace/RestContract, dspace-angular), but for this library I've been simply trying to work with how the REST API does work in practice, and try to play nice with the HAL paradigm

dpk · 2024-11-12T14:08:09Z

This feature should also be added to get_bitstreams_iter once #27 is merged

…t in response _embedded in HALResource new BitstreamFormat model get_bitstreams takes e.g. (embeds=['format'])

optional list of resources to embed instead of just link

kshepherd · 2024-11-20T17:56:30Z

@alanorth @dpk now most get, create, update (any method that ultimately returns a DSpace Object or list of DSpace Objects) can take an embeds list param, which will mean those links resources get embedded in the response JSON instead of linked, so you don't have to go off and do an extra request to see e.g. bitstream format, or community logo, or collection owningCommunity, etc.

for the search objects methods, the embeds param will apply to each object in the list of search results

dpk · 2024-11-21T15:37:46Z

I wonder how plausible it would be to auto-detect object types and wrap them in the correct model object class (for embeds and possibly also dso search)

kshepherd · 2024-11-25T10:39:39Z

@dpk yes, we could also cut down on a lot of duplicate code with better object/type detection in DSO methods (you can see where I am able to do some things with e.g. update_dso since I already have the constructed object from the user and can use type() to later cast the result obj), and expanding out the modelling - I included BitstreamFormat as an example. but for round one of embed I thought it might be easier just to get everything working as dicts for now so it was at least usable, and build out the modelling from there

dpk · 2024-11-25T11:17:56Z

>>> c.search_objects('lily', embeds=['bundle'])[0].embedded
{}
>>> c.search_objects('lily', embeds=['bundles'])[0].embedded
{'bundles': {'_embedded': {'bundles': [...]}}}

Would there be any way to raise an error if someone gives an embedding that doesn’t exist?

Otherwise looks good

kshepherd · 2024-11-25T11:35:20Z

>>> c.search_objects('lily', embeds=['bundle'])[0].embedded
{}
>>> c.search_objects('lily', embeds=['bundles'])[0].embedded
{'bundles': {'_embedded': {'bundles': [...]}}}
Would there be any way to raise an error if someone gives an embedding that doesn’t exist?

Otherwise looks good

A list of valid links/embeds could be included in our models... they're not actually super-clearly documented outside of code but the models in both the java and angular give a list of linked resources..
We could also control those references so instead of strings it's like, embeds=[Item.EMBEDS.BUNDLES, ...] or something? And use those constants for both validation and giving better hints to the user when constructing requests

dpk · 2024-11-25T11:50:31Z

Eh, I think it’s fine as it is. I’ll make an issue to consider doing that as a future improvement.

kshepherd self-assigned this Jun 11, 2024

dpk reviewed Oct 29, 2024

View reviewed changes

kshepherd added 4 commits November 20, 2024 18:43

the-library-code#18: Allow embed=[] in get_bitstreams and embed forma…

fbfbb1b

…t in response _embedded in HALResource new BitstreamFormat model get_bitstreams takes e.g. (embeds=['format'])

Allow 'embed' in most get/create/update methods

cc74c8a

optional list of resources to embed instead of just link

Add optional embeds param to all _iter methods

6b77564

Fix typos

88e0ec5

kshepherd force-pushed the bitstream_format_model branch from 27b9444 to 88e0ec5 Compare November 20, 2024 17:54

kshepherd requested a review from dpk November 20, 2024 17:54

kshepherd changed the title ~~#18: Allow embed=[] in get_bitstreams and embed format in response~~ #18: Allow embed=[] in methods and embed resources in response Nov 21, 2024

dpk approved these changes Nov 25, 2024

View reviewed changes

kshepherd merged commit c888f2d into the-library-code:main Dec 3, 2024

kshepherd mentioned this pull request Dec 11, 2024

Consider making more use of enums/sum type objects to improve ability to catch typos etc. #33

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#18: Allow embed=[] in methods and embed resources in response #20

#18: Allow embed=[] in methods and embed resources in response #20

kshepherd commented Jun 11, 2024

kshepherd commented Jun 11, 2024

kshepherd commented Oct 29, 2024

dpk Oct 29, 2024

dpk Oct 29, 2024

kshepherd Oct 29, 2024

dpk commented Nov 12, 2024

kshepherd commented Nov 20, 2024

dpk commented Nov 21, 2024

kshepherd commented Nov 25, 2024

dpk commented Nov 25, 2024

kshepherd commented Nov 25, 2024

dpk commented Nov 25, 2024

#18: Allow embed=[] in methods and embed resources in response #20

#18: Allow embed=[] in methods and embed resources in response #20

Conversation

kshepherd commented Jun 11, 2024

kshepherd commented Jun 11, 2024

kshepherd commented Oct 29, 2024

dpk Oct 29, 2024

Choose a reason for hiding this comment

dpk Oct 29, 2024

Choose a reason for hiding this comment

kshepherd Oct 29, 2024

Choose a reason for hiding this comment

dpk commented Nov 12, 2024

kshepherd commented Nov 20, 2024

dpk commented Nov 21, 2024

kshepherd commented Nov 25, 2024

dpk commented Nov 25, 2024

kshepherd commented Nov 25, 2024

dpk commented Nov 25, 2024