One of the most important techniques for making caching efficient is batching & pipelining. Similar to the N+1 problem, doing batched gets is much more efficient than doing gets one by one.
There are several points in a caching solution where batching can improve effectiveness dramatically:
- Batching gets for keys of the same type of item: for example, a multi-get for multiple product infos.
- Batching gets for keys of different types of items, when there is no data dependency between them: for example, getting product information and getting product prices are often independent and can be fetched at the same time.
- Batching gets to the database (otherwise it becomes an N+1 problem) and batching sets to write the data back to the cache server when many keys are missed at the same time, especially at startup when the cache is empty.
- Batching gets and "batching" sleeps in the algorithm for preventing thundering herds: instead of sleeping for each key one by one, all missed keys sleep for the same duration and are then retried together with a single multi-get (see the sketch below).
This library will help solve all these problems.
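As an illustration of the last point, here is a simplified sketch of a batched sleep-and-retry loop. The helper names and the 20ms/3-attempt numbers are arbitrary assumptions for this sketch; the library's real thundering-herd algorithm, built on leases, is more involved:

package example

import "time"

// multiGetFunc fetches many keys in one round trip;
// keys absent from the result are considered missed.
type multiGetFunc func(keys []string) map[string][]byte

// getWithBatchedSleep retries all missed keys together: one shared sleep,
// then a single multi-get, instead of a sleep-and-retry loop per key.
func getWithBatchedSleep(keys []string, multiGet multiGetFunc) map[string][]byte {
    results := make(map[string][]byte)
    for attempt := 0; attempt < 3 && len(keys) > 0; attempt++ {
        if attempt > 0 {
            time.Sleep(20 * time.Millisecond) // ONE sleep for ALL missed keys
        }
        found := multiGet(keys)
        var missing []string
        for _, key := range keys {
            if data, ok := found[key]; ok {
                results[key] = data
            } else {
                missing = append(missing, key)
            }
        }
        keys = missing
    }
    return results
}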
The idea started with the Pipeline interface:
package memproxy

type Pipeline interface {
    LeaseGet(key string, options LeaseGetOptions) func() (LeaseGetResponse, error)
    LeaseSet(key string, data []byte, cas uint64, options LeaseSetOptions) func() (LeaseSetResponse, error)
    Delete(key string, options DeleteOptions) func() (DeleteResponse, error)

    // Execute flushes the collected commands to the network
    Execute()

    // Finish must be called after creating a Pipeline, often with defer
    Finish()

    // LowerSession returns a lower-priority session
    LowerSession() Session
}
Let's focus on these three functions:
LeaseGet(key string, options LeaseGetOptions) func() (LeaseGetResponse, error)
LeaseSet(key string, data []byte, cas uint64, options LeaseSetOptions) func() (LeaseSetResponse, error)
Delete(key string, options DeleteOptions) func() (DeleteResponse, error)
Instead of returning the result immediately, these three functions, when called, only collect the operations. Only when the returned anonymous functions are called are the collected operations actually executed.
For example, in the following code:
fn1 := pipeline.LeaseGet("key01", LeaseGetOptions{})
fn2 := pipeline.LeaseGet("key02", LeaseGetOptions{})
fn3 := pipeline.LeaseGet("key03", LeaseGetOptions{})
resp1, err := fn1()
resp2, err := fn2()
resp3, err := fn3()
The first three calls do nothing except collect the operations in an internal buffer.
ONLY at the fourth line, resp1, err := fn1(), are the operations actually flushed to the network, with the caller waiting for the results to come back from the memcached servers.
The fifth and sixth lines:
resp2, err := fn2()
resp3, err := fn3()
do NOT do much besides mapping the results, already fetched during resp1, err := fn1(), to their corresponding operations.
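To make this flush-on-first-call behavior concrete, below is a minimal sketch of how such a pipeline could be structured. It is an illustration only, with simplified types, not the library's actual implementation:

package example

// getOp holds one collected operation and, after the flush, its result.
type getOp struct {
    key  string
    data []byte
    err  error
}

// sketchPipeline collects operations in an internal buffer and flushes
// them all on the first result call.
type sketchPipeline struct {
    ops     []*getOp
    flushed bool
}

func (p *sketchPipeline) Get(key string) func() ([]byte, error) {
    op := &getOp{key: key}
    p.ops = append(p.ops, op) // only collect; nothing is sent yet
    return func() ([]byte, error) {
        if !p.flushed {
            p.flush() // the FIRST returned function to be called flushes everything
        }
        return op.data, op.err // later calls only map already-fetched results
    }
}

func (p *sketchPipeline) flush() {
    // a real implementation would send all p.ops to the memcached servers
    // in one round trip here, filling in data/err for each op
    p.flushed = true
}

This mirrors what the example above demonstrates: one flush, then cheap result mapping.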
Readers familiar with Promise in JavaScript will find this function signature similar to a Promise. Instead of returning the result right away, the function returns a promise from which the result can be obtained in the future. In the case of Pipeline, the promise is a simple and efficient anonymous function that facilitates batching.
Using this idea of returning anonymous functions, we can make batching across different data types easier to implement. Consider the following example:
package main

import (
    "context"
    "fmt"
)

type ProductInfo struct {
    // detail fields
}

type ProductPrice struct {
    // detail fields
}

type Repository interface {
    GetProductInfo(ctx context.Context, sku string) func() (ProductInfo, error)
    GetProductPrice(ctx context.Context, sku string) func() (ProductPrice, error)
}

func main() {
    var repo Repository // assumed to be initialized with a correct implementation
    ctx := context.Background()

    // collect all 4 operations first; nothing is executed yet
    infoFunc1 := repo.GetProductInfo(ctx, "SKU01")
    infoFunc2 := repo.GetProductInfo(ctx, "SKU02")
    priceFunc1 := repo.GetProductPrice(ctx, "SKU01")
    priceFunc2 := repo.GetProductPrice(ctx, "SKU02")

    // the first call flushes all collected operations in a single batch
    info1, err := infoFunc1()
    info2, err := infoFunc2()
    price1, err := priceFunc1()
    price2, err := priceFunc2()

    fmt.Println(info1, info2, price1, price2, err)
}
We can see that the 4 operations, spanning 2 kinds of data, can be batched efficiently using this approach.
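As an illustration, here is a hedged sketch of how such a Repository method might be implemented on top of a Pipeline-style lease get, reusing the ProductInfo type from the example above. The key format, the JSON encoding, and the simplified LeaseGetOptions/LeaseGetResponse types (including the Data field) are assumptions for this sketch, not the library's actual definitions:

package example

import (
    "context"
    "encoding/json"
)

// leaseGetter mirrors just the part of the Pipeline interface used here.
type leaseGetter interface {
    LeaseGet(key string, options LeaseGetOptions) func() (LeaseGetResponse, error)
}

type LeaseGetOptions struct{}

// LeaseGetResponse is simplified for this sketch; the Data field is an assumption.
type LeaseGetResponse struct {
    Data []byte
}

type repoImpl struct {
    pipeline leaseGetter
}

func (r *repoImpl) GetProductInfo(ctx context.Context, sku string) func() (ProductInfo, error) {
    // collect the lease-get now; it will be flushed together with any
    // other collected operations, whatever their data type
    fn := r.pipeline.LeaseGet("product:info:"+sku, LeaseGetOptions{})

    return func() (ProductInfo, error) {
        resp, err := fn()
        if err != nil {
            return ProductInfo{}, err
        }
        var info ProductInfo
        err = json.Unmarshal(resp.Data, &info) // assumed JSON encoding
        return info, err
    }
}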
Simply returning anonymous functions helps with batching between unrelated kinds of data. But how can we batch when the actions depend on each other?
For example, how do we batch this chain of actions:
get from cache => if not found, get from DB => then set the data back to the cache
The idea is to use the help of the Session interface:
package memproxy

type Session interface {
    AddNextCall(fn func())
    Execute()
    // other functions
}
The function AddNextCall simply appends to a list of deferred calls. The function Execute loops through this list and executes each deferred function one by one.
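A minimal sketch of a Session implementation matching that description might look like this (an assumption of this sketch: Execute keeps draining the list so that calls added during execution also run, which the nested AddNextCall calls further below depend on):

package example

// sessionImpl is a minimal sketch; the real implementation has more features.
type sessionImpl struct {
    nextCalls []func()
}

// AddNextCall appends a deferred call to the list.
func (s *sessionImpl) AddNextCall(fn func()) {
    s.nextCalls = append(s.nextCalls, fn)
}

// Execute keeps draining the list until it is empty, so that calls
// added DURING execution also get run in later rounds.
func (s *sessionImpl) Execute() {
    for len(s.nextCalls) > 0 {
        calls := s.nextCalls
        s.nextCalls = nil
        for _, fn := range calls {
            fn()
        }
    }
}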
Assuming the 3 operations above belong to the same interface, we can implement the chain as below:
package example

import (
    "context"
    "errors"
)

var ErrNotFound = errors.New("not found")

type UserData struct {
    // other fields
}

type Session interface {
    AddNextCall(fn func())
    Execute()
}

type Repository interface {
    GetFromCache(ctx context.Context) func() (UserData, error)
    GetFromDB(ctx context.Context) func() (UserData, error)
    SetToCache(ctx context.Context, data UserData) func() error
}

func GetCache(
    ctx context.Context,
    repo Repository,
    sess Session,
) func() (UserData, error) {
    // collect the cache-get operation immediately, so that multiple
    // GetCache calls can be flushed together
    cacheFunc := repo.GetFromCache(ctx)

    var result UserData
    var err error

    sess.AddNextCall(func() {
        result, err = cacheFunc()
        if errors.Is(err, ErrNotFound) {
            // cache miss => collect the DB get, deferred to the next round
            getDBFunc := repo.GetFromDB(ctx)
            sess.AddNextCall(func() {
                result, err = getDBFunc()
                if err != nil {
                    return
                }
                // collect the cache set, again deferred to the next round
                setCacheFunc := repo.SetToCache(ctx, result)
                sess.AddNextCall(func() {
                    _ = setCacheFunc()
                })
            })
        }
    })

    return func() (UserData, error) {
        sess.Execute()
        return result, err
    }
}
The way the GetCache function is implemented makes the 3 operations:
- GetFromCache
- GetFromDB
- SetToCache
behave in a batching manner.
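For example, in a hypothetical usage where two GetCache calls share the same Session (repo1, repo2, and sess are assumed to be initialized):

fn1 := GetCache(ctx, repo1, sess) // collects the first cache get
fn2 := GetCache(ctx, repo2, sess) // collects the second cache get

// sess.Execute runs inside fn1(): both cache gets flush in one batch,
// both misses produce one batched DB get, then one batched cache set
data1, err1 := fn1()
data2, err2 := fn2() // only maps the results already fetched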
The actual implementation is more complicated, because of the many options and because it has to deal with sleeping for thundering herd protection. But the main idea remains the same.