Commit
Add "Problem: paginating requests from two sets of results"
Showing 1 changed file with 106 additions and 0 deletions.
_posts/2023-11-02-problem-paginating-requests-from-two-sets-of-results.md
---
layout: post
title: 'Problem: paginating requests from two sets of results'
date: 2023-11-02 18:23 +0000
---

_This is an issue I bumped into at work, and I'm not entirely sure I got to the
right conclusion._

Let's say you're a reading list curation company, and you run two services.

The first you call BookSite, which is an internal service. That project's job
is to collect ISBNs and book titles, match them together, and offer that data
through an API for you to use on the second website.

BookListSite lets users pull together interesting lists of books, using filters
and sort orders and whatnot. However, it doesn't keep any of the books'
metadata in its own data stores. It makes a RecommendedBook, but whenever the
user looks up that recommendation, we make a call to BookSite to get the extra
information.

BookSite offers a [Graphiti API](https://www.graphiti.dev/guides/) to get this
data. That's nice because it handles the vast majority of this complexity for
us: the user can search by author names or book titles and order by release
date or title, and even pagination is handled by Graphiti. BookListSite can
largely just make a request, pass it through a Presenter, and job done!

Now, BookListSite has a brand new feature: recommended books.

Since all user data is kept on BookListSite, that's where RecommendedBooks are
generated and stored. Each user has multiple recommendations, each with a
ranking, 1 being the book we think they'll most like. (They might even be
generated by a data analytics team who are happier working in Python, off to
the side somewhere, and who hand over a CSV with BookId,UserId,Ranking each
day.)

So the problem: **when a user searches for an author, and then asks for the
books to be sorted by Recommended first, how the heck do we do that?** When
your PM helpfully tells you that the rest of the books should be ordered by
release date, the problem gets a little more complicated.

Graphiti doesn't support "order by this arbitrary sequence I'm providing," and
who can blame it.

My first attempt wasn't a complete solution: order by release date via
Graphiti, and then sort them by ranking "locally". This worked just fine on my
development machine. But on production my hubris quickly became evident: what
if the recommended books appear on the *second page* of the Graphiti API? Those
won't get pulled to the top.

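To see why the page-local sort falls short, here is a minimal Python sketch. The book IDs, page size, and rankings are invented for illustration, and `fetch_page` is a hypothetical stand-in for one paginated BookSite request, not a real Graphiti call.

```python
# Hypothetical data: book IDs already in release-date order, as the API returns them.
BOOKS_BY_RELEASE_DATE = [101, 102, 103, 104, 105, 106]
PAGE_SIZE = 3

# Hypothetical rankings held by BookListSite: book_id -> rank (1 = best).
RANKINGS = {105: 1, 102: 2}


def fetch_page(number):
    """Stand-in for one paginated API request (pages are 1-indexed)."""
    start = (number - 1) * PAGE_SIZE
    return BOOKS_BY_RELEASE_DATE[start:start + PAGE_SIZE]


def sort_locally(page):
    """Pull recommended books to the front of a single page, by rank.

    sorted() is stable, so unranked books keep their release-date order.
    """
    return sorted(page, key=lambda book_id: RANKINGS.get(book_id, float("inf")))


first_page = sort_locally(fetch_page(1))
print(first_page)  # [102, 101, 103]: book 105 (rank 1) is stuck on page 2
```
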
The solution we've gone live with is fairly simple, but still quite delicate.

1. Make an API request by BookID, asking for all the recommended books first.
2. Sort these by ranking locally.
3. Make a second API request for page 1 of the books in the typical order.
4. Filter out any from the Recommended list, to avoid them showing up again.
5. Get page 2 of the books when they're (lazily) requested.

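The five steps above can be sketched roughly as follows. This is a Python sketch under stated assumptions, not the production code: `fetch_by_ids` and `fetch_page` are hypothetical stand-ins for the two BookSite requests, the IDs and rankings are invented, and applying the step 4 filter to later pages as well is an assumption.

```python
PAGE_SIZE = 3

# Hypothetical catalogue for one author, already in release-date order.
CATALOGUE = [101, 102, 103, 104, 105, 106, 107]

# Hypothetical rankings stored on BookListSite: book_id -> rank (1 = best).
RANKINGS = {105: 1, 102: 2}


def fetch_by_ids(book_ids):
    """Stand-in for the request filtered by book ID (step 1)."""
    wanted = set(book_ids)
    return [book_id for book_id in CATALOGUE if book_id in wanted]


def fetch_page(number):
    """Stand-in for the request in the typical order (steps 3 and 5)."""
    start = (number - 1) * PAGE_SIZE
    return CATALOGUE[start:start + PAGE_SIZE]


def first_payload():
    """Build what the user sees first: recommendations, then page 1."""
    recommended = fetch_by_ids(RANKINGS)              # step 1
    recommended.sort(key=lambda b: RANKINGS[b])       # step 2
    page = fetch_page(1)                              # step 3
    rest = [b for b in page if b not in RANKINGS]     # step 4
    return recommended + rest


def later_page(number):
    """Step 5: later pages are fetched lazily and filtered the same way."""
    return [b for b in fetch_page(number) if b not in RANKINGS]


print(first_payload())  # [105, 102, 101, 103]
print(later_page(2))    # [104, 106]
```
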
One thing I'm nervous about here is that **this only works because I know there
are typically a small number of Recommendations**. Fewer than 16 usually, which
can be a large page, but not awful.

The other disadvantage is that the first payload of books the user sees is a
large one: potentially `(RecommendedBooks.count + NormalBookSearch.count)`. And
then after that, it's possible that the second page is entirely empty! Our lazy
loading will handle that, but the user will notice a delay whilst we load and
throw away an entire page of results.

## So, other ideas?

**Would it be awful to let BookSite know about the Recommendations?** That way,
we could continue living in bliss. Graphiti can be made aware of the custom
behaviour around this sort order, and nothing special has to happen on
BookListSite to get pagination or filtering working.

The negatives here are largely around feature creep, I think, as well as the
complexities of keeping the RecommendedBooks lists in sync between the two
services. But ultimately, it just isn't BookSite's responsibility. (OR IS
IT???)

**Smarter pagination state.** I think the real fix here might be having
BookListSite be cleverer with pagination. Instead of continuing to think in
BookSite API result pages, we should be thinking about BookSite Books and using
an **enumerator** on those, hiding away how we got them in the first place.

Instead of the conversation being the 5 steps above, we should instead be
asking a BookEnumerator (or something) for "the next book", or "the next 12
books". That Enumerator can keep track of where it's getting them from. It
could have an array of endpoints to exhaust:

``` Enumerator
bookLocations: ['books?filter[id]=12,87,33,21,...&filter[author]=Orwell', 'books?filter[author]=Orwell&sort=release_date'],
currentLocationIndex: 0,
nextPage: 'books?filter[id]=12,87,33,21,...&filter[author]=Orwell&page[number]=2'
```

Once the first `bookLocation` has run out of books to return, the Enumerator
can increment `currentLocationIndex`, and start on results from the next
location.

When asked for the next 10 items, they may come from two different pages or
endpoints. Completely transparently! That means that BookListSite can keep
asking for results until it has a page-full to send to the user.

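One possible shape for that enumerator, as a hedged Python sketch rather than a finished design. Everything here is an assumption for illustration: `fetch_page` is a hypothetical stand-in for a paginated BookSite request, each "location" is faked as a plain list of book IDs instead of an endpoint URL, the recommended location is assumed to arrive already sorted by rank, and skipping duplicates across locations is an addition that plays the role of the filtering step.

```python
from itertools import islice

PAGE_SIZE = 2


def fetch_page(location, number):
    """Hypothetical stand-in for one paginated request (pages are 1-indexed).

    A real version would request the endpoint with page[number]=number.
    """
    start = (number - 1) * PAGE_SIZE
    return location[start:start + PAGE_SIZE]


class BookEnumerator:
    """Hands out books on demand, hiding which location and page they came from."""

    def __init__(self, locations):
        self._seen = set()
        self._books = self._walk(locations)  # lazy: nothing is fetched yet

    def _walk(self, locations):
        for location in locations:  # moving on is the currentLocationIndex step
            number = 1
            while True:
                page = fetch_page(location, number)
                if not page:
                    break  # this location is exhausted, try the next one
                for book_id in page:
                    if book_id not in self._seen:  # skip books already handed out
                        self._seen.add(book_id)
                        yield book_id
                number += 1

    def take(self, count):
        """Return up to `count` books, fetching more pages only as needed."""
        return list(islice(self._books, count))


recommended = [105, 102, 110]                      # assumed already in rank order
by_release_date = [101, 102, 103, 104, 105, 106]
books = BookEnumerator([recommended, by_release_date])
print(books.take(4))  # [105, 102, 110, 101]: spans both locations
print(books.take(4))  # [103, 104, 106]: only three books were left
```
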
*I'll try implementing something like this tomorrow. Please do shout out if
there are other, BETTER patterns. I have to run to the gym now in the terrible
rain, so I must apologise for the even-less-than-usual editing in this post.*