File Uploads
In the near future, GoGovSG will release a feature that allows users to upload files and share them via shortlinks. This page documents the design decisions and thought process behind the implementation of this feature.
When designing our implementation of this feature, these were the constraints we took into account.
- One-to-one mapping between a shortlink and an S3 bucket's object key - This lets us tell at a glance which link an object belongs to. It also provides another guarantee: if a particular shortlink has not been taken yet, the corresponding S3 object key is also available.
- Deletion of shortlinks is not allowed.
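The two constraints above can be illustrated with a minimal in-memory sketch. The `LinkRegistry` class and its methods are hypothetical names, and the sketch assumes the object key is simply the shortlink itself; the actual key format may differ.

```typescript
// Hypothetical in-memory model of the two constraints; not GoGovSG's actual code.
class LinkRegistry {
  private readonly taken = new Set<string>();

  // Assumption: the S3 object key is the shortlink itself (identity mapping).
  static objectKeyFor(shortUrl: string): string {
    return shortUrl;
  }

  // A free shortlink implies a free object key, because the mapping is one-to-one.
  isAvailable(shortUrl: string): boolean {
    return !this.taken.has(shortUrl);
  }

  // Reserves the shortlink and, with it, the corresponding object key.
  // There is deliberately no release method: shortlinks are never deleted.
  reserve(shortUrl: string): string {
    if (!this.isAvailable(shortUrl)) {
      throw new Error(`shortlink "${shortUrl}" is already taken`);
    }
    this.taken.add(shortUrl);
    return LinkRegistry.objectKeyFor(shortUrl);
  }
}
```

Because nothing is ever removed from the registry, a successful `reserve` call permanently claims both the shortlink and its object key.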
The S3 bucket was originally configured with a bucket-wide public-read policy, in line with GoGovSG's philosophy of being a public link shortener. However, to allow file URLs to be disabled, we need to be able to make individual S3 objects private, which can only be done by setting an object's access control list (ACL). Because of how S3 bucket policies and object-specific ACLs interact, we had to switch our configuration: we now have a bucket policy under which all objects are private by default, and each object needs the 'public-read' ACL set in order to be publicly visible.
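The effect of this configuration can be modelled with a small in-memory sketch. `InMemoryBucket` and its methods are hypothetical stand-ins rather than the AWS SDK; only the ACL names `private` and `public-read` come from S3 itself.

```typescript
// The two canned ACL values relevant to this design.
type Acl = 'private' | 'public-read';

// Hypothetical in-memory stand-in for the bucket; not the AWS SDK.
class InMemoryBucket {
  private readonly objects = new Map<string, { body: string; acl: Acl }>();

  // New objects start out private, mirroring the private-by-default configuration.
  putObject(key: string, body: string): void {
    this.objects.set(key, { body, acl: 'private' });
  }

  // Changes the visibility of a single object without touching any other.
  putObjectAcl(key: string, acl: Acl): void {
    const obj = this.objects.get(key);
    if (!obj) {
      throw new Error(`no such key: ${key}`);
    }
    obj.acl = acl;
  }

  // Models an anonymous read: only 'public-read' objects are visible.
  publicGet(key: string): string | undefined {
    const obj = this.objects.get(key);
    return obj && obj.acl === 'public-read' ? obj.body : undefined;
  }
}
```

In this model, disabling a file URL amounts to calling `putObjectAcl(key, 'private')`, and re-enabling it to calling `putObjectAcl(key, 'public-read')`.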
In light of the constraints and S3 configuration, the file upload process is to be split into three operations.
- Creation of the shortUrl - This serves as a way for us to 'reserve' both the shortlink and S3 bucket key. If this operation fails, we know that there might be a collision in bucket key, and therefore should not perform the upload operation.
- Upload file to S3 - In this upload step, the client could either obtain a pre-signed URL to upload the file directly to S3, or send the file to the server and have it forwarded to the bucket.
- Set object's ACL to be 'public-read'
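The three operations above can be sketched as a single function that runs them in order and stops at the first failure. The `UploadOps` interface and `performUpload` are hypothetical names used only for illustration, the object key is assumed to equal the shortlink, and the calls are synchronous for brevity where real database and S3 calls would be asynchronous.

```typescript
type UploadStep = 'create-short-url' | 'upload-file' | 'set-public-read';

// Hypothetical operations; real implementations would call the database and S3.
interface UploadOps {
  createShortUrl(shortUrl: string): void; // expected to throw on a collision
  uploadFile(key: string, body: string): void;
  setPublicRead(key: string): void;
}

type UploadResult = { ok: true } | { ok: false; failedAt: UploadStep };

function performUpload(ops: UploadOps, shortUrl: string, body: string): UploadResult {
  // Assumption: the object key is the shortlink itself.
  const steps: Array<[UploadStep, () => void]> = [
    ['create-short-url', () => ops.createShortUrl(shortUrl)],
    ['upload-file', () => ops.uploadFile(shortUrl, body)],
    ['set-public-read', () => ops.setPublicRead(shortUrl)],
  ];
  for (const [step, run] of steps) {
    try {
      run();
    } catch (err) {
      // Stop at the first failure so that later steps never run.
      return { ok: false, failedAt: step };
    }
  }
  return { ok: true };
}
```

If reserving the shortlink fails, the function returns before any upload is attempted, which matches the intent of the first operation. The sketch only reports where a failure happened; it does not attempt any cleanup.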
The fact that this upload operation spans multiple services necessitates a guarantee on atomicity. We would not want shortlinks pointing to nonexistent S3 objects, and neither should there be any orphaned S3 objects that do not belong to a shortlink.
One option we considered was to let the upload task be done directly from the client. This would entail the following steps:
- Client makes a regular request to create a link. This would count as a reservation of the shortlink.
- Client requests a pre-signed URL from the server, which will allow the client to send an authorised upload request to S3.
- Client makes another API call to the server to trigger an S3 update of the ACL.
Benefits: Having the client perform the upload would save server bandwidth, since the binary data goes directly to the S3 bucket.
Problems encountered: It is difficult, if not impossible, to guarantee atomicity across the entire upload operation because of constraint 2, which states that deletion of shortlinks is not allowed. If something goes wrong in step 2 or 3, the creation of the shortlink cannot be rolled back.
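The failure mode described above can be reproduced with a small simulation. All names here (`ClientFlowState`, `runClientFlow`, `isDangling`) are hypothetical, the state is held in memory rather than in a real database or bucket, and the `failAt` parameter simply simulates an error at a chosen step.

```typescript
type ClientStep = 'create-link' | 'upload-to-s3' | 'set-acl';

// Hypothetical in-memory state standing in for the server database and the bucket.
interface ClientFlowState {
  links: Set<string>; // shortlinks recorded by the server
  objects: Map<string, 'private' | 'public-read'>; // object key -> ACL
}

// Runs the three client-driven steps; returns the step that failed, if any.
function runClientFlow(
  state: ClientFlowState,
  shortUrl: string,
  failAt?: ClientStep,
): ClientStep | undefined {
  // Step 1: create the link, which reserves the shortlink.
  if (failAt === 'create-link') return 'create-link';
  state.links.add(shortUrl);

  // Step 2: upload via the pre-signed URL. On failure the link stays reserved,
  // because shortlinks cannot be deleted.
  if (failAt === 'upload-to-s3') return 'upload-to-s3';
  state.objects.set(shortUrl, 'private');

  // Step 3: ask the server to set the object's ACL to 'public-read'.
  if (failAt === 'set-acl') return 'set-acl';
  state.objects.set(shortUrl, 'public-read');
  return undefined;
}

// A shortlink is dangling if it exists but has no publicly readable object behind it.
function isDangling(state: ClientFlowState, shortUrl: string): boolean {
  return state.links.has(shortUrl) && state.objects.get(shortUrl) !== 'public-read';
}
```

A failure at step 2 or 3 leaves `isDangling` returning `true` for the reserved shortlink, and the model offers no operation that could undo the reservation.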