
Introduce Magic Migrate #246

Closed
wants to merge 2 commits into from

Conversation

schneems
Contributor

The goal of magic migrate is to simplify versioned metadata storage.

## The problem

Cloud Native Buildpacks use TOML to store metadata about a layer between builds. We commonly use this to store information like which Ruby version was downloaded or a SHA of some kind. On the next build we can look at this information to determine whether some expensive process (like downloading a binary) can be skipped. Essentially we treat it like a cache key.

If we cannot load (deserialize) the old metadata into the currently requested structure, the default behavior is to clear the cache. This means the programmer must either be careful to never make backwards-incompatible changes to the metadata, or risk triggering a cache invalidation.
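
As a rough illustration of that default behavior, here is a minimal sketch using serde and the `toml` crate; the `Metadata` struct and its fields are hypothetical, not taken from any particular buildpack:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical layer metadata; real buildpacks store fields like these in TOML.
#[derive(Serialize, Deserialize, PartialEq, Eq)]
struct Metadata {
    ruby_version: String,
    gemfile_lock_sha: String,
}

/// Returns true if the expensive work (e.g. downloading a binary) can be skipped.
fn can_reuse_layer(cached_toml: &str, current: &Metadata) -> bool {
    match toml::from_str::<Metadata>(cached_toml) {
        // Old metadata deserializes and matches the current key: reuse the layer.
        Ok(old) => old == *current,
        // Schema changed in a backwards-incompatible way: the cache gets cleared.
        Err(_) => false,
    }
}
```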

Now consider that we cannot guarantee that the cache was generated from the last version of the buildpack. Someone might deploy, then wait several years before deploying again.

The classic Ruby buildpack has this problem. Its cache is unversioned, so if a mistake is made in the cache structure or contents in one version, then the fix must be hardcoded and checked on every subsequent deploy of every future version: https://github.com/heroku/heroku-buildpack-ruby/blob/453b13983b638d68d9d65ab89d36a2fc18128e4a/lib/language_pack/ruby.rb#L1270-L1332.

## Introducing magic migrate

Magic migrate doesn't make these problems go away; instead, it makes them easier to reason about.

When the schema of the metadata changes, the programmer can introduce a new struct and tell Rust how to migrate from one version to the next using either `From` or `TryFrom` (if the conversion is fallible). Then they use the corresponding magic migrate trait to tell Rust how to walk this chain backwards. Now, when we try to load data from disk, it will try to load the latest struct; if it can't, it will go to the one before, and so on. Once it finds the struct that the data was originally serialized from, it converts it forwards, one step at a time, until we arrive at the currently desired struct.
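
As a rough sketch of that chain, assuming two hypothetical metadata versions and the `toml` crate, here is the fallback-and-convert loop hand-rolled; this is what magic migrate is meant to automate, not the crate's actual API:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct MetadataV1 {
    ruby_version: String,
}

#[derive(Serialize, Deserialize)]
struct MetadataV2 {
    ruby_version: String,
    // New field added in a later buildpack release.
    stack: String,
}

// Teach Rust how to go forwards one step in the chain.
impl From<MetadataV1> for MetadataV2 {
    fn from(old: MetadataV1) -> Self {
        MetadataV2 {
            ruby_version: old.ruby_version,
            stack: String::from("unknown"),
        }
    }
}

// Walk the chain backwards: try the newest struct first, then older ones,
// converting forwards once something deserializes.
fn load_metadata(toml_str: &str) -> Option<MetadataV2> {
    if let Ok(latest) = toml::from_str::<MetadataV2>(toml_str) {
        return Some(latest);
    }
    if let Ok(old) = toml::from_str::<MetadataV1>(toml_str) {
        return Some(MetadataV2::from(old));
    }
    None // Nothing matched: fall back to clearing the cache.
}
```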

Now, instead of trying to hold in mind every possible cache state from every possible version of the code, the programmer only needs to know how to make each conversion, one step at a time.

@schneems schneems force-pushed the schneems/magic-migrate branch from 85415a7 to 6856d33 Compare January 8, 2024 18:50
@schneems schneems marked this pull request as ready for review January 8, 2024 18:58
@schneems schneems requested a review from a team as a code owner January 8, 2024 18:58
@schneems schneems marked this pull request as draft January 12, 2024 14:05
@edmorley edmorley removed the request for review from a team January 26, 2024 22:47
@schneems schneems closed this May 17, 2024