---
layout: post
title: 'Problem: paginating requests from two sets of results'
author: shamess
date: 2023-11-02 18:23 +0000
---

_This is an issue I bumped into at work, and I'm not entirely sure I got to the
right conclusion._

Let's say you're a reading list curation company, and you run two services.

The first, which you call BookSite, is an internal service. Its job is to
collect ISBNs and book titles, match them together, and offer that data through
an API for you to use on the second website.

BookListSite lets users pull together interesting lists of books, using filters,
sort orders and whatnot. However, it doesn't keep any of the books' metadata in
its own data stores. It makes a RecommendedBook, but whenever the user looks up
that recommendation, we make a call to BookSite to get the extra information.

BookSite offers a [Graphiti API](https://www.graphiti.dev/guides/) to get this
data. That's nice because it handles the vast majority of the complexity for
us: the user can search by author names, book titles, order by release date or
title, even pagination is handled by Graphiti. BookListSite can largely just
make a request and pass it through a Presenter and job done!
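
Graphiti follows JSON:API query conventions (`filter[...]`, `sort`, `page[number]`, `page[size]`), so a BookListSite request is mostly a matter of building the right query string. Below is a minimal sketch. It assumes a hypothetical BookSite URL and assumes the resource exposes `author` as a filter and `release_date` as a sort; both names are guesses, not BookSite's real configuration:

```python
from urllib.parse import urlencode

# Hypothetical BookSite endpoint; the real host and resource path will differ.
BOOKSITE_URL = "https://booksite.internal/api/v1/books"


def book_search_url(author, sort="release_date", page_number=1, page_size=12):
    """Build a JSON:API-style query string of the kind Graphiti accepts.

    The "author" filter and "release_date" sort are assumed attribute names.
    """
    params = {
        "filter[author]": author,
        "sort": sort,  # JSON:API uses a leading "-" for descending order
        "page[number]": page_number,
        "page[size]": page_size,
    }
    # safe="[]" leaves the brackets unescaped so the URL stays readable.
    return f"{BOOKSITE_URL}?{urlencode(params, safe='[]')}"


print(book_search_url("Orwell"))
```

Prefixing the sort with `-` (for example `sort=-release_date`) asks for descending order instead.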

Now, BookListSite has a brand new feature: recommended books.

Since all user data is kept on BookListSite, that's where RecommendedBooks are
generated and stored. Users have multiple recommendations, each with a ranking,
1 being the book we think they'll like most. (They might even be generated by a
data analytics team who are happier working in Python, off to the side
somewhere, and who hand over a CSV of BookId,UserId,Ranking each day.)
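
The standard library can turn that daily export into per-user lists, ordered by ranking. This minimal sketch assumes the CSV has a header row of `BookId,UserId,Ranking`; the real export may differ. IDs are kept as strings:

```python
import csv
import io
from collections import defaultdict


def load_rankings(csv_text):
    """Return {user_id: [book_id, ...]}, each list ordered by ranking (1 first).

    Assumes a header row of BookId,UserId,Ranking (hypothetical export format).
    """
    ranked = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        ranked[row["UserId"]].append((int(row["Ranking"]), row["BookId"]))
    # Sorting the (ranking, book_id) pairs puts ranking 1 first for each user.
    return {user: [book for _, book in sorted(pairs)] for user, pairs in ranked.items()}


sample = """BookId,UserId,Ranking
87,u1,2
12,u1,1
33,u2,1
"""
print(load_rankings(sample))  # {'u1': ['12', '87'], 'u2': ['33']}
```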

So the problem: **when a user searches for an author, and then asks for the
books to be sorted by Recommended first, how the heck do we do that?** When
your PM helpfully tells you that the rest of the books should be ordered by
release date, the problem gets a little more complicated.

Graphiti doesn't support "order by this arbitrary sequence I'm providing," and
who can blame it.

My first attempt wasn't a complete solution: order by release date via
Graphiti, and then sort by ranking "locally". This worked just fine on my
development machine. But in production my hubris quickly became evident: what
if the recommended books appear on the *second page* of the Graphiti API
results? Those won't get pulled to the top.
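
A small simulation shows why the first attempt breaks. Everything here is hypothetical. BookSite is replaced by a fake in-memory stand-in that returns three books per page, already in release-date order. The user's top recommendation happens to sit on page 2:

```python
PAGE_SIZE = 3

# Hypothetical stand-in for BookSite: books already in release-date order.
ALL_BOOKS = ["b1", "b2", "b3", "b4", "b5", "b6"]


def fetch_page(page_number):
    """Fake Graphiti pagination: 1-based pages of PAGE_SIZE books."""
    start = (page_number - 1) * PAGE_SIZE
    return ALL_BOOKS[start:start + PAGE_SIZE]


def first_attempt(rankings, page_number):
    """Fetch one page, then sort it locally: ranked books first, by ranking."""
    page = fetch_page(page_number)
    # sorted() is stable, so unranked books keep their release-date order.
    return sorted(page, key=lambda book: rankings.get(book, float("inf")))


rankings = {"b5": 1, "b2": 2}  # b5 is the top pick, but it lives on page 2
print(first_attempt(rankings, 1))  # ['b2', 'b1', 'b3']
```

`b5` never appears on page 1, however that page is re-sorted, because a local sort can only reorder books it has actually fetched.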

The solution we've gone live with is fairly simple, but still quite delicate.

1. Make an API request filtered by book ID, asking for all the recommended books first.
2. Sort these by ranking locally.
3. Make a second API request for page 1 of the books in the usual order.
4. Filter out any that are already in the Recommended list, so they don't show up twice.
5. Get page 2 of the books when it's (lazily) requested.
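
The five steps can be sketched against the same kind of fake, in-memory BookSite. The helper names, page size and book IDs are hypothetical stand-ins for the real Graphiti requests:

```python
PAGE_SIZE = 3
ALL_BOOKS = ["b1", "b2", "b3", "b4", "b5", "b6"]  # hypothetical, in release-date order


def fetch_by_ids(book_ids):
    """Stand-in for one request filtered by ID (e.g. filter[id]=...)."""
    wanted = set(book_ids)
    return [book for book in ALL_BOOKS if book in wanted]


def fetch_page(page_number):
    """Stand-in for a normal paginated request in the usual order."""
    start = (page_number - 1) * PAGE_SIZE
    return ALL_BOOKS[start:start + PAGE_SIZE]


def first_payload(rankings):
    # Steps 1-2: fetch every recommended book, then order by ranking locally.
    recommended = sorted(fetch_by_ids(rankings), key=rankings.__getitem__)
    # Steps 3-4: page 1 in the usual order, minus anything already recommended.
    rest = [book for book in fetch_page(1) if book not in rankings]
    return recommended + rest


def later_page(rankings, page_number):
    # Step 5: fetched lazily, with the same filter applied.
    return [book for book in fetch_page(page_number) if book not in rankings]


rankings = {"b5": 1, "b2": 2}
print(first_payload(rankings))  # ['b5', 'b2', 'b1', 'b3']
print(later_page(rankings, 2))  # ['b4', 'b6']
print(later_page({"b4": 1, "b5": 2, "b6": 3}, 2))  # []
```

The last call reproduces the empty second page. When every book on page 2 is a recommendation, filtering leaves nothing to show.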

One thing I'm nervous about here is that **this only works because I know there
are typically a small number of Recommendations**. Usually fewer than 16, which
makes for a large page, but not an awful one.

The other disadvantage is that the first payload of books the user sees is a
large one: potentially `(RecommendedBooks.count + NormalBookSearch.count)`. And
after that, it's possible that the second page is entirely empty, because every
book on it was already shown as a recommendation! Our lazy loading will handle
that, but the user will notice a delay whilst we load and throw away an entire
page of results.

## So, other ideas?

**Would it be awful to let BookSite know about the Recommendations?** That way,
we could continue living in bliss. Graphiti can be made aware of the custom
behaviour around this sort order, and nothing special has to happen on
BookListSite to get pagination or filtering working.

The negatives here are largely around feature creep, I think, as well as the
complexity of keeping the RecommendedBooks lists in sync between the two
services. But ultimately, it just isn't BookSite's responsibility. (OR IS
IT???)
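
If BookSite did know about the Recommendations, the whole ordering would collapse into a single sort key that it could apply before paginating. This minimal sketch uses hypothetical book records and one user's rankings. It also assumes "ordered by release date" means oldest first:

```python
from datetime import date

# Hypothetical book records as BookSite might hold them.
BOOKS = [
    {"id": "b1", "release_date": date(2021, 3, 1)},
    {"id": "b2", "release_date": date(2019, 5, 1)},
    {"id": "b3", "release_date": date(2018, 1, 1)},
    {"id": "b4", "release_date": date(2022, 7, 1)},
]


def recommended_first_page(books, rankings, page_number, page_size):
    """Ranked books first (1 = best), then the rest by release date, then paginate.

    `rankings` maps book ID to ranking for the requesting user. BookSite would
    need to receive or store this, which is the sync problem described above.
    """
    def sort_key(book):
        if book["id"] in rankings:
            return (0, rankings[book["id"]])
        # The second item is only compared when the first items match,
        # so rankings (ints) and release dates are never compared.
        return (1, book["release_date"])

    ordered = sorted(books, key=sort_key)
    start = (page_number - 1) * page_size
    return [book["id"] for book in ordered[start:start + page_size]]


rankings = {"b4": 1, "b2": 2}
print(recommended_first_page(BOOKS, rankings, 1, 2))  # ['b4', 'b2']
print(recommended_first_page(BOOKS, rankings, 2, 2))  # ['b3', 'b1']
```

Sorting happens before slicing, so a recommended book can never be stranded on a later page, and BookListSite would not need to do any filtering of its own.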

**Smarter pagination state.** I think the real fix here might be having
BookListSite be cleverer with pagination. Instead of continuing to think in
BookSite API result pages, we should be thinking about BookSite Books and using
an **enumerator** on those, hiding away how we got them in the first place.

Instead of the conversation being the 5 steps above, we should instead be
asking a BookEnumerator (or something) for "the next book", or "the next 12
books". That Enumerator can keep track of where it's getting them from. It
could have an array of endpoints to exhaust:

```yaml
bookLocations:
  - 'books?filter[id]=12,87,33,21,...&filter[author]=Orwell'
  - 'books?filter[author]=Orwell&sort=release_date'
currentLocationIndex: 0
# Page 2 of the current location (index 0), keeping the same filters
nextPage: 'books?filter[id]=12,87,33,21,...&filter[author]=Orwell&page[number]=2'
```

Once the current location in `bookLocations` has run out of books to return,
the Enumerator can increment `currentLocationIndex` and start on results from
the next location.

When the Enumerator is asked for the next 10 items, they may come from two
different pages or endpoints. Completely transparently! That means BookListSite
can keep asking for results until it has a pageful to send to the user.
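
Python generators behave much like the enumerator described here, so a sketch of the idea fits in a few lines. This swaps in a generator rather than any particular Ruby or Graphiti feature. Everything is hypothetical. `fetch_page(location, page_number)` stands in for a Graphiti request and must return an empty list once a location is exhausted. Books are plain dicts with an `id`. Duplicates are skipped, which plays the part of step 4 in the current solution:

```python
from itertools import islice


class BookEnumerator:
    """Yield books one at a time from an ordered list of locations.

    `fetch_page(location, page_number)` is a hypothetical callable returning a
    list of book dicts, or an empty list once that location is exhausted.
    """

    def __init__(self, book_locations, fetch_page):
        self.book_locations = book_locations
        self.fetch_page = fetch_page
        self._books = self._generate()

    def _generate(self):
        seen = set()
        for location in self.book_locations:  # plays the role of currentLocationIndex
            page_number = 1  # tracks the next page to fetch, like nextPage
            while True:
                page = self.fetch_page(location, page_number)
                if not page:
                    break  # this location has run out; move to the next one
                for book in page:
                    if book["id"] not in seen:  # skip books already yielded
                        seen.add(book["id"])
                        yield book
                page_number += 1

    def take(self, n):
        """Return up to n more books, fetching pages only when needed."""
        return list(islice(self._books, n))


# Hypothetical pages for two locations; 87 appears in both.
PAGES = {
    "recommended": [[{"id": 12}, {"id": 87}], [{"id": 33}]],
    "by_release_date": [[{"id": 5}, {"id": 87}, {"id": 9}], [{"id": 40}]],
}


def fake_fetch(location, page_number):
    pages = PAGES[location]
    return pages[page_number - 1] if page_number <= len(pages) else []


books = BookEnumerator(["recommended", "by_release_date"], fake_fetch)
print([b["id"] for b in books.take(4)])  # [12, 87, 33, 5]
print([b["id"] for b in books.take(4)])  # [9, 40]
```

The first `take(4)` crosses from the recommended location into the release-date one without the caller noticing. Pages are only fetched when `take` needs more books.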

*I'll try implementing something like this tomorrow. Please do shout out if
there are other, BETTER patterns. I have to run to the gym now in the terrible
rain, so I must apologise for the even-less-than-usual editing in this post.*
