Container Deep Enumeration Extension

Discussion related to https://github.com/SNIA/CDMI-spec/issues/265.

Use cases

Users of CDMI frequently list the contents of containers and then have to retrieve additional data for each child, for example when displaying a "directory"-style listing.
Users have also requested the ability to filter listings based on characteristics of the children (the contents of various fields).

This overlaps with exports and with query. In the past, our response to this request was to direct users and implementers to the "query" functionality in CDMI. However, since only a subset of CDMI implementations support query, this hasn't been a satisfactory answer for many end users.

The downsides to using exports are as follows:

The user has to create a new object
The user gets everything (all fields, always recursive)

The downsides of using query are as follows:

Query is a fairly heavyweight set of functionality, and is expensive for servers to implement and perform in its full generality
Query is built around long-running operations with persistent storage of results. This is partially addressed by the immediate query CDMI extension.

Example of a request from a customer

> GET /cdmi/2.0.0/MyContainer/?parentURI&children=[objectName,objectID,metadata/cdmi_size] HTTP/1.1
> Host: cloud.example.com
> Accept: application/cdmi-container
< HTTP/1.1 200 OK
< Content-Type: application/cdmi-container
<
< {
<     "parentURI" : "/",
<     "children" : [
<         ["red", "<id>", "8393894"],
<         ["green", "<id>", "433253253"],
<         ["yellow", "<id>", "113253"],
<         ["orange/", "<id>", null],
<         ["purple/", "<id>", null]
<     ]
< }
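Because each child comes back as a positional array, a client has to pair the values with the field names it asked for. The following is a minimal client-side sketch, assuming the response shape shown above; `decode_children`, `REQUESTED_FIELDS`, and `RESPONSE_BODY` are illustrative names, not part of any CDMI API.

```python
import json

# Field names in the order requested with children=[...] in the example above.
REQUESTED_FIELDS = ["objectName", "objectID", "metadata/cdmi_size"]

# Abbreviated response body in the proposed shape ("<id>" left as a placeholder).
RESPONSE_BODY = """
{
    "parentURI": "/",
    "children": [
        ["red", "<id>", "8393894"],
        ["green", "<id>", "433253253"],
        ["orange/", "<id>", null]
    ]
}
"""


def decode_children(body: str, fields: list) -> list:
    """Zip each positional child array with the requested field names."""
    doc = json.loads(body)
    rows = []
    for entry in doc["children"]:
        if len(entry) != len(fields):
            raise ValueError(f"expected {len(fields)} values, got {entry!r}")
        rows.append(dict(zip(fields, entry)))
    return rows


if __name__ == "__main__":
    for child in decode_children(RESPONSE_BODY, REQUESTED_FIELDS):
        # Container names end in "/"; null (None) means the field was not found.
        kind = "container" if child["objectName"].endswith("/") else "data object"
        print(child["objectName"], kind, child["metadata/cdmi_size"])
```

Positional arrays keep the response compact, at the cost of making the client responsible for remembering the requested field order.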

This approach requires several changes to the CDMI specification:

A CDMI server has to support capabilities for extended container listing
A CDMI server has to distinguish between a range and an array of requested field names in the children query parameter
A CDMI server has to handle children=<range>, children=<field list>, or both
A CDMI server has to validate the provided children field list
A CDMI server has to return an array including the requested fields, in the order specified, for each child returned in the children field in the response body
A CDMI server has to return null if a requested field is not found for a given child
A CDMI server has to handle fields within JSON objects, specifically metadata. We need to specify the syntax for this, and whether it introduces new reserved characters for field names (which SNIA defines).
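The server-side requirements above can be sketched as a parse-validate-project pipeline. This is a minimal sketch under several assumptions: the range is written `<start>-<end>` and treated as inclusive, a field list is written `[f1,f2,...]`, nested fields use "/" as the path separator (as in `metadata/cdmi_size`), and `ALLOWED_FIELDS` is a hypothetical validation set. None of these names come from the CDMI specification.

```python
import json
import re
from typing import Any, Union

# Hypothetical set of top-level fields a server accepts in a children field list.
ALLOWED_FIELDS = {"objectName", "objectID", "objectType", "parentURI", "metadata"}

# children=<start>-<end>, e.g. children=0-2
RANGE_RE = re.compile(r"^(\d+)-(\d+)$")


def parse_children_param(value: str) -> Union[range, list]:
    """Tell a range apart from a [field, ...] list, and validate either form."""
    m = RANGE_RE.match(value)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        if start > end:
            raise ValueError(f"invalid range: {value!r}")
        return range(start, end + 1)  # assumed inclusive, like HTTP byte ranges
    if value.startswith("[") and value.endswith("]"):
        fields = [f.strip() for f in value[1:-1].split(",")]
        for f in fields:
            # Only the first path segment is checked against the allowed set.
            if not f or f.split("/", 1)[0] not in ALLOWED_FIELDS:
                raise ValueError(f"unknown field: {f!r}")
        return fields
    raise ValueError(f"children must be a range or a field list: {value!r}")


def lookup(child: dict, path: str) -> Any:
    """Resolve a '/'-separated path such as 'metadata/cdmi_size'; None if absent."""
    node: Any = child
    for part in path.split("/"):  # '/' becomes a reserved character in field names
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def project(children: list, fields: list) -> list:
    """One array per child: values in the requested order, None (JSON null) if missing."""
    return [[lookup(child, f) for f in fields] for child in children]


# Sample children as a server might hold them internally.
CHILDREN = [
    {"objectName": "red", "objectID": "id-1", "metadata": {"cdmi_size": "8393894"}},
    {"objectName": "orange/", "objectID": "id-2", "metadata": {}},
]

if __name__ == "__main__":
    fields = parse_children_param("[objectName,objectID,metadata/cdmi_size]")
    print(json.dumps(project(CHILDREN, fields)))
    # [["red", "id-1", "8393894"], ["orange/", "id-2", null]]
```

When a request carries both forms, a server could call `parse_children_param` once per value, use the range to select children and the field list to project them. The "/" separator in `lookup` illustrates the reserved-character question raised in the last requirement.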

2021-02-26 TWG Discussion

We should survey what other cloud storage interfaces have done, and compare them with other REST API standards.

Gary will put some notes together on this, and put them up on GitHub in the wiki page: https://github.com/SNIA/CDMI-spec/wiki/Container-Deep-Enumeration-Extension.

Discussion of concerns about "one-off" use cases: what about the next one?

This is why we didn't implement this specific approach the last time it was requested. Instead, we asked users to use query for this feature.

2021-03-05 TWG Discussion

We talked with some of the stakeholders asking for this. Findings include:

Metadata listing is more important than recursive listing
Implementing query is viewed as "a lot of work", but the same complexity bundled into container listing isn't
The next ask is for filtering (e.g., only show objects whose names end in ".jpg"), which gets into full query territory

What other Cloud Storage Interfaces do:

S3

S3 provides several parameters that allow a subset of the objects in a bucket to be returned, namely:

"delimiter" - Indicates which character is used to create pseudo-directories e.g. "/"
"prefix" - Equivalent to a string starts-with "nnn" filter on object name
"start-after" - Equivalent to a string greater-than lexical sort filter on object name
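The interaction of these three parameters can be illustrated with a small in-memory model of a flat key space. This is a simplified sketch of the semantics described above, not the S3 implementation: `list_objects` and `KEYS` are hypothetical, and pagination (max-keys, continuation tokens) and URL encoding are ignored.

```python
from typing import Iterable, List, Optional, Tuple


def list_objects(keys: Iterable[str], prefix: str = "", delimiter: Optional[str] = None,
                 start_after: Optional[str] = None) -> Tuple[List[str], List[str]]:
    """Return (contents, common_prefixes) for a flat key space, without pagination."""
    contents: List[str] = []
    common_prefixes: List[str] = []
    for key in sorted(keys):
        if start_after is not None and key <= start_after:
            continue  # start-after: keep only keys lexically greater than the marker
        if not key.startswith(prefix):
            continue  # prefix: a starts-with filter on the key
        if delimiter:
            rest = key[len(prefix):]
            idx = rest.find(delimiter)
            if idx >= 0:
                # Roll the key up into a pseudo-directory ending at the first delimiter.
                cp = prefix + rest[: idx + len(delimiter)]
                if cp not in common_prefixes:
                    common_prefixes.append(cp)
                continue
        contents.append(key)
    return contents, common_prefixes


KEYS = ["photos/2020/a.jpg", "photos/2021/b.jpg", "photos/index.html", "readme.txt"]

if __name__ == "__main__":
    print(list_objects(KEYS, prefix="photos/", delimiter="/"))
    # (['photos/index.html'], ['photos/2020/', 'photos/2021/'])
    print(list_objects(KEYS, start_after="photos/2021/b.jpg"))
    # (['photos/index.html', 'readme.txt'], [])
```

Without a delimiter the listing is effectively recursive, because every key matching the prefix is returned.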

S3 also returns a "Contents" element for each matching object. This is not user-defined metadata, and the client cannot choose which fields are returned; it is a fixed set of system metadata ("ETag", "Key", "LastModified", "Owner", "Size", and "StorageClass").

https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

GCP

GCP also provides prefix, delimiter, and a start-after equivalent ("startOffset", which is inclusive). It adds an "endOffset" parameter, which says where to stop (an exclusive, string less-than lexical filter on object name).
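The difference in boundary handling between GCP's offsets and S3's start-after can be shown with two small filters. This is an illustrative sketch over an in-memory list of names; `gcs_offset_filter`, `s3_start_after`, and `NAMES` are hypothetical helpers, not client-library calls.

```python
from typing import Iterable, List, Optional


def gcs_offset_filter(names: Iterable[str], start_offset: Optional[str] = None,
                      end_offset: Optional[str] = None) -> List[str]:
    """Keep names with start_offset <= name < end_offset (GCS startOffset/endOffset)."""
    return [
        n for n in sorted(names)
        if (start_offset is None or n >= start_offset)
        and (end_offset is None or n < end_offset)
    ]


def s3_start_after(names: Iterable[str], start_after: str) -> List[str]:
    """S3 start-after, for comparison: strictly greater than the marker."""
    return [n for n in sorted(names) if n > start_after]


NAMES = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

if __name__ == "__main__":
    print(gcs_offset_filter(NAMES, start_offset="b.jpg", end_offset="d.jpg"))
    # ['b.jpg', 'c.jpg']
    print(s3_start_after(NAMES, "b.jpg"))
    # ['c.jpg', 'd.jpg']
```

A half-open interval like GCP's makes it easy to split a listing into adjacent, non-overlapping ranges.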

GCP returns the following items for each listed item:

"generation", "metageneration", "contentType", "timeCreated", "updated", "customTime", "timeDeleted", "temporaryHold", "eventBasedHold", "retentionExpirationTime", "storageClass", "timeStorageClassUpdated", "size", "md5Hash", "mediaLink", "contentEncoding", "contentDisposition", "contentLanguage", "cacheControl", "metadata", "acl", "owner", "crc32c", "componentCount", "etag", "customerEncryption", "kmsKeyName"

This is the approach of returning everything in a container listing.

https://cloud.google.com/storage/docs/json_api/v1/objects/list

Azure

Azure List Blobs provides prefix and delimiter, but no "start-after" equivalent.

Azure has an "include" parameter, which allows a client to indicate what additional data it wants returned for each matching blob.

include={snapshots,metadata,uncommittedblobs,copy,deleted,tags,versions}
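To make the Azure approach concrete, the sketch below builds a List Blobs request URL that combines prefix, delimiter, and include. The account and container names are placeholders, `list_blobs_url` is a hypothetical helper, and a real request would also need authorization and an `x-ms-version` header, which are not shown.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# The include values listed above (newer service versions accept additional ones).
INCLUDE_VALUES = {"snapshots", "metadata", "uncommittedblobs", "copy",
                  "deleted", "tags", "versions"}


def list_blobs_url(account: str, container: str, prefix: Optional[str] = None,
                   delimiter: Optional[str] = None, include: Iterable[str] = ()) -> str:
    """Build a List Blobs request URL; authentication headers are not shown."""
    params = {"restype": "container", "comp": "list"}
    if prefix:
        params["prefix"] = prefix
    if delimiter:
        params["delimiter"] = delimiter
    include = list(include)
    unknown = set(include) - INCLUDE_VALUES
    if unknown:
        raise ValueError(f"unsupported include values: {sorted(unknown)}")
    if include:
        params["include"] = ",".join(include)  # comma-separated list of options
    # Keep "," and "/" unescaped so the URL stays readable.
    query = urlencode(params, safe=",/")
    return f"https://{account}.blob.core.windows.net/{container}?{query}"


if __name__ == "__main__":
    print(list_blobs_url("myaccount", "mycontainer", prefix="photos/", delimiter="/",
                         include=["metadata", "tags"]))
```

This opt-in model is the closest of the three to the field-selection idea proposed for CDMI: the client names what it wants, and everything else is omitted.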

https://docs.microsoft.com/en-us/rest/api/storageservices/list-blobs

Analysis

All three cloud object interfaces are based around a flat listing with a delimiter and prefix for filtering. S3 and GCP add a "start-after"/"startOffset".

Amazon has the most limited approach to what information is returned for each object, while GCP always returns everything. Azure allows you to select what is returned (similar to what is proposed for CDMI).

All of these list "recursively" by default, because the namespace is flat. When a "delimiter" is specified, objects whose names contain the delimiter (after the prefix) are omitted from the results and consolidated into a single common-prefix entry per pseudo-directory.

CDMI already provides a formal hierarchy, so the delimiter and prefix are not required. Instead, we need a flag that indicates whether recursive results are required.
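Because the hierarchy is explicit in CDMI, a recursive listing is just a tree walk gated by one flag. The sketch below assumes a hypothetical in-memory model where names ending in "/" are containers; `list_children` and its `recursive` parameter are illustrative only and do not reflect the drafted extension's actual syntax.

```python
from typing import Dict, List, Union

# Hypothetical in-memory model of a CDMI namespace: names ending in "/" map to
# nested containers; other names map to a data object (here, just its size).
Node = Union[int, Dict[str, "Node"]]

TREE: Dict[str, Node] = {
    "red": 8393894,
    "green": 433253253,
    "orange/": {"a.jpg": 10, "b/": {}},
}


def list_children(container: Dict[str, Node], recursive: bool = False,
                  base: str = "") -> List[str]:
    """List child paths; descend into child containers only when recursive is set."""
    out: List[str] = []
    for name in sorted(container):
        path = base + name
        out.append(path)
        child = container[name]
        if recursive and name.endswith("/") and isinstance(child, dict):
            # Depth-first: a container's descendants follow it directly.
            out.extend(list_children(child, recursive=True, base=path))
    return out


if __name__ == "__main__":
    print(list_children(TREE))
    # ['green', 'orange/', 'red']
    print(list_children(TREE, recursive=True))
    # ['green', 'orange/', 'orange/a.jpg', 'orange/b/', 'red']
```

A production server would likely also need a depth limit and pagination, since an unbounded recursive listing of a large container can be expensive.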

We have two choices regarding returning metadata: Return everything, or allow the user to specify what is returned.

2021-03-12 TWG Discussion

Possible directions:

1. Leave the spec as-is, and direct people to use query for this purpose
2. Write a whitepaper describing how to use query for this purpose
3. Include the whitepaper content in the spec as an example for query
4. Write an extension to enhance container listing (easier to implement than query)

Next steps:

Straw poll: Continue with option #4

We are now at the point where we can start writing an extension. Scope will be:

Specifying fields to be returned (see use cases at top of document)
Recursive listing

Extension has been drafted and is now ready for review.

https://github.com/SNIA/CDMI-spec/tree/master/cdmi_extensions/extended_child_listing
