Skip to content

Commit

Permalink
adr: Add ADR for query option to filter flow segment get_urls
Browse files Browse the repository at this point in the history
  • Loading branch information
philipnbbc committed Sep 17, 2024
1 parent 3917b1b commit 4f477c4
Show file tree
Hide file tree
Showing 2 changed files with 80 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@ For more information on how we use ADRs, see [here](./adr/README.md).
| [0018](./adr/0018-restrict-direct-source-modification.md) | Restrict direct Source modification |
| [0019](./adr/0019-consolidate-modified-updated-terms.md) | Rename `modified_by` properties in Source and Flow schemas to `updated_by` |
| [0020](./adr/0020-version-signalling.md) | Improving the signalling of the supported API version in implementations |
| [0021](./adr/0021-storage-label-format.md) | Guidance for use of `get_urls` labels on Flow Segments |
| [0023](./adr/0023-filter-segment-get-urls.md) | Add query option to filter Flow Segment `get_urls` by `label` |

\* Note: ADR 0004a was the unintended result of a number clash in the early development of TAMS which wasn't caught before publication

Expand Down
78 changes: 78 additions & 0 deletions docs/adr/0023-filter-segment-get-urls.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
---
status: "proposed"
---
# Query Option to Filter get_urls

## Context and Problem Statement

Each Flow Segment returned by the `/flows/<flow id>/segment` endpoint includes a `get_urls` propery that may include (labelled) URLs for fetching media objects.
The calculation of pre-signed URLs for example can result in an not insignificant increased request time.
Clients do not need the `get_urls` if they don't intend to fetch the media objects.
This ADR proposes adding a query option to filter the URLs that need to be included by a TAMS.

It was found that calculating a pre-signed URL for fetching a media object takes around 0.35 milliseconds when using the Python [boto3](https://github.com/boto/boto3) library.
That means a request to get 5000 Flow Segments will include around 1.75 seconds overhead to calculate the pre-signed URLs.
This was observed in a request to a TAMS which took around 4 seconds to return Flow Segments with pre-signed URLs and only around 2 seconds to return the same Flow Segments without pre-signed URLs.

The [boto3](https://github.com/boto/boto3) method to generate pre-signed URLs could also be optimised to reduce the time taken.
It was found that the older deprecated [boto](https://github.com/boto/boto) library took around 0.14 milliseconds to calculate a pre-signed URL, a 60% speed reduction when compared to [boto3](https://github.com/boto/boto3).
There are likely even more optimisations that could be made.

The pre-signed URL calculation may use a highly optimised implementation that avoids the large time overhead.
This would still result in larger response sizes because the pre-signed URLs tend to be quite large as they include signatures, keys and tokens.

It is difficult to predict what the expected number of flow segments will be in practice.
If it is low in all cases then including or not including pre-signed URLs doesn't make much difference.
If it is unknown then having an option to avoid the pre-signed calculation overhead may be useful in some cases.

## Considered Options

* Option 1: Always return all `get_urls`
* Option 2: Add a query option to return none or all `get_urls`
* Option 3: Add a simple query option to filter `get_urls` based on the `label`
* Option 4: Add more complex query option(s) to filter `get_urls`

## Decision Outcome

Chosen option: "Option 3: Add a simple query option to filter `get_urls` based on the `label`", because this covers the requirement to disable calculation of pre-signed URLs and at the same time allows other URLs to be retained.

### Implementation

Implemented in [PR #88](https://github.com/bbc/tams/pull/88).

## Pros and Cons of the Options

### Option 1: Always return all `get_urls`

This is the current state.

* Bad, because TAMS is doing work to calculate pre-signed URLs even though clients may not need them

### Option 2: Add a query option to return none or all `get_urls`

This adds a boolean option named `include_get_urls` that if set to `false` results in no `get_urls` in the response and therefore no pre-signed URLs are calculated by TAMS.

* Good, because it allows clients to indicate that `get_urls` are not required
* Bad, because some TAMS implementations may provide different types of URLs and a client does not have the option to filter out the ones that cause increased request time whilst retaining ones that don't

### Option 3: Add a simple query option to filter `get_urls` based on the `label`

This extends [Option 2](#option-2-add-a-query-option-to-return-none-or-all-get_urls) to add a (comma-separated) list option named `accept_get_urls` that specifies the `labels` associated with URLS to include in the response.
Omitting `accept_get_urls` will result in all URLs in the response.
Setting `accept_get_urls` to an empty string will result in no URLs in the response.
If `accept_get_urls` includes a `default` label then URLs with label `default` or no label are included in the response.

* Good, because it allows clients to indicate that `get_urls` are not required
* Good, because clients can retain URLs that don't cause increased request times
* Neutral, because clients may want more complex filters based on what is available

### Option 4: Add more complex query option(s) to filter `get_urls`

This changes [Option 3](#option-3-add-a-simple-query-option-to-filter-get_urls-based-on-the-label) to allow more complex filtering based on what URLs are available.
E.g. include a URL for direct access to the object if available and otherwise a pre-signed URL.

* Good, because it allows clients to indicate that `get_urls` are not required
* Good, because clients can retain URLs that don't cause increased request times
* Neutral, because clients can use more complex filters based on what is available
* Neutral, because it isn't clear whether there is a requirement for more complex filters
* Neutral, because [0021](./0021-storage-label-format.md) thus far only proposes an App note for guidance on the use of `get_urls` labels on flow segments

0 comments on commit 4f477c4

Please sign in to comment.