wait some time before clean-up old files #8

Open
helperFunction opened this issue Jul 19, 2023 · 14 comments

@helperFunction

Hi :)

First off: I love this package! It's great how easy it is to deploy a SPA to S3/CloudFront and leverage tiered TTLs along the way.

I just stumbled upon an edge case with the --delete option:

  1. A user downloads index.html (v1)
  2. The SPA gets deployed (v2) and the (v1) files get deleted
  3. The user's browser then tries to download the resources linked from (v1)

So this only affects users with a slow connection, and only if the (v1) files are not already cached in CloudFront.

Possible solution: add a CLI parameter to configure how long to wait before deleting old files. A default of 30s/60s should probably be enough to give slower clients time to download everything.

Thanks for your work!

And again: I love this one :)

@ottokruse
Owner

Thanks for the feedback and great idea! Will look into it when I have some bandwidth

@krcourville

Interesting topic. Glad I read this before using the delete feature.

An initial thought was to set a TTL on old objects in S3 instead of deleting them. So far, I don't see that you can directly apply a TTL to objects. But it does look like you can apply a tag such as "delete-me=true", and then define a lifecycle policy in the bucket that removes those tagged objects after some time.

From the perspective of this utility, maybe the delete option could be extended to support multiple strategies:

  1. Just delete (the current strategy)
  2. Apply a given tag (see the sketch below)
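
To make strategy 2 concrete, here's a rough sketch using the AWS SDK for JavaScript v3; the tag key/value and the helper name are just illustrative, not part of this package:

import { S3Client, PutObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Hypothetical helper: instead of deleting an old object, mark it with a tag
// that a bucket lifecycle rule can later match and expire.
async function tagOldObject(bucket: string, key: string) {
  await s3.send(
    new PutObjectTaggingCommand({
      Bucket: bucket,
      Key: key,
      Tagging: { TagSet: [{ Key: "delete-me", Value: "true" }] },
    })
  );
}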

@ottokruse
Owner

TTL is a great idea for sure

@krcourville

I may be tempted to contribute a PR if you want to hammer out the desired arguments.

Maybe:

s3-spa-upload dist-dir my-s3-bucket-name --prefix mobile --delete-with-tag 'archive=true'

or...

--apply-life-cycle-tag 'archive=true'

@ottokruse
Owner

ottokruse commented Feb 8, 2024

That would be awesome 👏

How about:

--tag-old-files 'key=value'

Let's make providing 'key=value' optional and default to 's3-spa-upload:archive=true'.

And add a line in the docs that you're supposed to create an accompanying lifecycle rule to delete the files?
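
For illustration, the accompanying lifecycle rule could be created roughly like this (AWS SDK for JavaScript v3; the rule ID and the 1-day expiration are just examples, matching the proposed default tag 's3-spa-upload:archive=true'):

import { S3Client, PutBucketLifecycleConfigurationCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Example only: expire objects that the upload tool has tagged as archived.
await s3.send(
  new PutBucketLifecycleConfigurationCommand({
    Bucket: "my-s3-bucket-name",
    LifecycleConfiguration: {
      Rules: [
        {
          ID: "expire-archived-spa-files",
          Status: "Enabled",
          Filter: { Tag: { Key: "s3-spa-upload:archive", Value: "true" } },
          Expiration: { Days: 1 },
        },
      ],
    },
  })
);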

@krcourville

I like it.

@helperFunction
Author

Hey guys :)

I like the idea with the tag.

But for the ease of use of this tool, I think it would be better to find a solution that doesn't require setting up a bucket lifecycle rule.

What about tagging files with a TTL=timestamp, so that on the next run/deployment all files with an expired TTL get removed?
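
Just to make that idea concrete, here is a rough sketch of such a cleanup pass (the TTL tag and function name are only illustrative, not an existing feature; pagination and batched deletes are omitted):

import {
  S3Client,
  ListObjectsV2Command,
  GetObjectTaggingCommand,
  DeleteObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Hypothetical cleanup pass, run at the start of each deployment: delete every
// object whose TTL tag (an epoch timestamp in seconds) lies in the past.
async function deleteExpired(bucket: string) {
  const now = Math.floor(Date.now() / 1000);
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: bucket }));
  for (const obj of Contents) {
    const { TagSet = [] } = await s3.send(
      new GetObjectTaggingCommand({ Bucket: bucket, Key: obj.Key! })
    );
    const ttl = TagSet.find((t) => t.Key === "TTL")?.Value;
    if (ttl && Number(ttl) < now) {
      await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: obj.Key! }));
    }
  }
}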

@ottokruse
Owner

Both solutions make sense. Your option requires another deploy to clean up the previous one (which works and is pragmatic); the lifecycle rule doesn't need that, which is nice as well.

it would be better to find a solution that doesn't require setting up a bucket lifecycle rule

What's your take there, why do you want to skip creating the bucket lifecycle rule? Just simpler if you didn't have to do that?

@krcourville

I suppose, if you don't control your own AWS infrastructure, I could see how it might be a pain to deal with a lifecycle policy, depending on your relationship with the S3 bucket administrator.

@ottokruse
Owner

To prevent the problem described by @helperFunction we need to tag each uploaded file with a version number or timestamp, and then, when deleting old files, only delete files whose version is at least two generations behind the current one. This way you will always have two generations of files on S3, but not more:

  • The last upload: V3
  • The upload before that: V2

The upload before V2, i.e. V1, would be deleted.

That way, users with slow connections who already started downloading V2 during the upload of V3 can proceed without error. (We are then assuming there aren't any users still downloading V1, which is likely but not guaranteed.)

Maybe we flag this as such:

s3-spa-upload dist bucketname --delete --keep-old-generations 1

And maybe we should make the default of --keep-old-generations 1, so that just typing s3-spa-upload dist bucketname --delete is enough to tap into this functionality.

The current functionality can still be achieved then by doing:

s3-spa-upload dist bucketname --delete --keep-old-generations 0

Would that work for both your use cases @krcourville @helperFunction ?

@krcourville

krcourville commented Feb 12, 2024

As far as I can tell, a lifecycle policy filter can only match constant tag values. The filtering only supports "equals", not "greater than", "starts with", or otherwise (more info).

Would we need separate arguments for each strategy?

That said, I'm not attached to the s3 lifecycle-managed option if the versioned option works fine.

With the versioned option, would we need to bootstrap an existing deployment somehow? Otherwise, to start, would there be objects with no generation/version tag?

Maybe I'm overthinking. If the tag does not exist, could we assume it's the previous version?

Also, how is "current version" determined?

@ottokruse
Owner

Regard objects without the version tag as old and eligible for deletion? Easiest, I guess.

We can use a timestamp with second precision as the version. We'd have to list the bucket to find out which previous versions exist, but we have to list the bucket anyway to do the deletes.
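
For example, something like this would produce a second-precision, chronologically sortable value (illustrative only):

// e.g. "2024-02-12T10:15:30Z"
const version = new Date().toISOString().replace(/\.\d{3}Z$/, "Z");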

And agreed, if we also want to support the lifecycle method, that would need a static tag value, and a separate CLI parameter to trigger it.

@krcourville

krcourville commented Feb 12, 2024

A timestamp does add more meaning and potential usefulness to the tag.

OK. Since the bucket has to be listed anyway, and assuming that listing is reused, no extra call is needed. Makes sense.

To keep the most recent "n" generations, do we first iterate the bucket, accumulate a list of distinct versions, determine which will be kept based on sort order, and purge the rest?
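
For what it's worth, that approach could look roughly like this sketch (assuming each object carries a sortable version tag written at upload time; the tag key and function name are illustrative, and pagination is omitted):

import {
  S3Client,
  ListObjectsV2Command,
  GetObjectTaggingCommand,
  DeleteObjectCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Hypothetical pruning pass: keep the newest (keepOldGenerations + 1) distinct
// versions and delete everything else, including objects with no version tag.
async function pruneOldGenerations(bucket: string, keepOldGenerations: number) {
  const { Contents = [] } = await s3.send(new ListObjectsV2Command({ Bucket: bucket }));

  // Look up the version tag of every object.
  const versionByKey = new Map<string, string | undefined>();
  for (const obj of Contents) {
    const { TagSet = [] } = await s3.send(
      new GetObjectTaggingCommand({ Bucket: bucket, Key: obj.Key! })
    );
    versionByKey.set(obj.Key!, TagSet.find((t) => t.Key === "s3-spa-upload:version")?.Value);
  }

  // Distinct versions, newest first (ISO timestamps sort chronologically as strings).
  const keep = new Set(
    [...new Set([...versionByKey.values()].filter((v): v is string => !!v))]
      .sort()
      .reverse()
      .slice(0, keepOldGenerations + 1)
  );

  for (const [key, version] of versionByKey) {
    if (!version || !keep.has(version)) {
      await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));
    }
  }
}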

I suspect the next request will be: "can we use the version number generated by my code pipeline?" In which case, we could default to timestamp and allow override. If maintaining more than one old version is required, it's up to the lib consumer to ensure their version is chronologically sortable.

Maybe that doesn't matter, if the scope here is just syncing the built web app to the deployment bucket while minimizing the chance of someone getting a 404 response. Support for rollback/revert to previous versions is the only reason I can think of to get that complicated.

Personally, using a timestamp as the current version, pushing new files to the bucket with that value, and deleting anything else that doesn't match that value would be fine for my use case. I wouldn't be looking to keep multiple versions.

At this point, I can commit to adding the little bit of code that would be required for "add this specified tag instead of delete". Based on your specification above:

--tag-old-files 'key=value'
Let's make providing 'key=value' optional and default to 's3-spa-upload:archive=true'

And I'll also provide an example lifecycle policy.

@ottokruse
Owner

Adding that flag to enable lifecycle policies would be awesome; please go for it.
