File Uploads
In the near future, GoGovSG will release a feature that allows users to upload files and share them via shortlinks. This page documents the design decisions and thought process behind the implementation of this feature.
These were the constraints we took into account when designing the implementation:
- One-to-one mapping between a shortlink and an S3 bucket's object key - This lets us tell at a glance which link an object belongs to. It also gives us another guarantee: if a particular shortlink has not been taken yet, the corresponding S3 object key is available as well.
- Deletion of shortlinks is not allowed.
The S3 bucket was originally configured with a bucket-wide public-read policy. This was in alignment with the philosophy of GoGovSG being a public link shortener. However, if we wished to allow file URLs to be disabled, we would need to be able to set certain S3 objects to be private. This could only be done by setting an object's access control list (ACL). A bucket-wide public-read policy would keep an object readable regardless of its ACL, so the way bucket policies and object-specific ACLs interact necessitated a switch in our configuration. We now have a bucket policy that sets all objects to be private by default; each object needs to have the 'public-read' ACL set in order to be visible.
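As a rough illustration of the per-object ACL approach, the sketch below builds the parameters for an ACL update from a link's state. The `Bucket`, `Key` and `ACL` fields mirror the parameters of S3's `putObjectAcl` operation, and 'public-read' and 'private' are canned ACL values. The state names, the helper name, and the assumption that the object key is identical to the shortlink are hypothetical and only serve this example.

```typescript
// Hypothetical link states; the names are assumptions for this sketch.
type LinkState = 'ACTIVE' | 'INACTIVE';

// The two canned ACL values discussed above.
type ObjectAcl = 'public-read' | 'private';

// Same field names as the parameters of S3's putObjectAcl operation.
interface PutObjectAclParams {
  Bucket: string;
  Key: string;
  ACL: ObjectAcl;
}

// Builds the ACL update for a file link. An active link makes the object
// publicly readable; a disabled link makes it private again.
function aclParamsFor(
  bucket: string,
  shortUrl: string,
  state: LinkState,
): PutObjectAclParams {
  return {
    Bucket: bucket,
    // Assumption: the one-to-one mapping means the key equals the shortlink.
    Key: shortUrl,
    ACL: state === 'ACTIVE' ? 'public-read' : 'private',
  };
}
```

The resulting object could then be passed to an S3 client's `putObjectAcl` call; the client itself is left out so that the sketch stays self-contained.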
In light of the constraints and S3 configuration, the file upload process is to be split into three operations.
- Creation of the shortUrl - This serves as a way for us to 'reserve' both the shortlink and S3 bucket key. If this operation fails, we know that there might be a collision in bucket key, and therefore should not perform the upload operation.
- Upload file to S3 - In this upload step, the client could either obtain a pre-signed URL to upload the file directly to S3, or send the file to the server and have it forwarded to the bucket.
- Set object's ACL to be 'public-read'
The fact that this upload operation spans multiple services necessitates a guarantee on atomicity. We would not want shortlinks pointing to nonexistent S3 objects, and neither should there be any orphaned S3 objects that do not belong to a shortlink.
One option we considered was to let the upload task be done directly from the client. This would entail the following steps:
- Client makes a regular request to create a link. This would count as a reservation of the shortlink.
- Client requests a pre-signed URL from the server, which will allow the client to send an authorised upload request to S3.
- Client makes another API call to the server to trigger an S3 update of the ACL.
Benefits: Having the upload operation done from the client would save bandwidth usage since the binary data goes directly to the S3 bucket.
Problems encountered: It is difficult, if not impossible, to guarantee atomicity across the entire upload operation because of constraint 2, which states that deletion of shortlinks is not allowed. If something goes wrong in step 2 or 3, the shortlink created in step 1 cannot be rolled back.
If the upload is done server-side, we can use a DB transaction at the application level to ensure atomicity across the entire upload flow.
- Client sends shortlink and file to server.
- Server opens a DB transaction and creates the shortlink within it. This reserves the shortlink because ACID guarantees rule out dirty reads.
- Server uploads file to S3.
- Depending on the outcome of the upload operation, the server either commits the transaction or rolls it back (which will 'undo' the link creation).
Benefits: Atomicity can be guaranteed on the server via a database transaction.
Drawbacks: More bandwidth and RAM used to send files to the server to be relayed to S3.
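The server-side flow above can be sketched roughly as follows. This is a minimal illustration rather than the actual implementation: the `Transaction` and `FileStore` interfaces, their method names, and `createFileLink` are all hypothetical stand-ins for the server's database client and S3 client, and the object key is assumed to equal the shortlink.

```typescript
// Hypothetical stand-in for a database transaction handle.
interface Transaction {
  // Inserts the shortlink row; expected to reject if the shortlink is taken.
  createLink(shortUrl: string, objectKey: string): Promise<void>;
  commit(): Promise<void>;
  rollback(): Promise<void>;
}

// Hypothetical stand-in for the S3 client.
interface FileStore {
  // Uploads the file and sets its ACL to 'public-read'.
  uploadPublic(objectKey: string, data: Uint8Array): Promise<void>;
}

async function createFileLink(
  beginTransaction: () => Promise<Transaction>,
  store: FileStore,
  shortUrl: string,
  data: Uint8Array,
): Promise<void> {
  const tx = await beginTransaction();
  try {
    // Reserve the shortlink (and with it the object key) inside the transaction.
    await tx.createLink(shortUrl, shortUrl);
    // Upload only after the reservation has succeeded.
    await store.uploadPublic(shortUrl, data);
    // The upload succeeded, so the link creation becomes permanent.
    await tx.commit();
  } catch (err) {
    // Any failure undoes the link creation, so no link points to a missing file.
    await tx.rollback();
    throw err;
  }
}
```

One limitation of this sketch is that it does not remove an already uploaded object if the commit itself fails; a real implementation would need to decide how to handle that case.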
The team decided on option 2 (server-side uploads) on the following grounds:
- Ensuring atomicity in the application is of utmost importance: at no point should the state of the database and the files be out of sync.
- File uploads are limited to 10 MB, which makes the extra bandwidth and RAM usage much less of a problem.
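Since the size limit is what keeps the server-side approach affordable, the server would presumably reject oversized files before relaying them. A minimal sketch of such a check is shown below; the constant and function names are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption.

```typescript
// Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
const MAX_FILE_UPLOAD_SIZE_BYTES = 10 * 1024 * 1024;

// Returns true only for sizes between 0 and the limit (inclusive).
// NaN fails both comparisons and is therefore rejected.
function isWithinUploadLimit(sizeInBytes: number): boolean {
  return sizeInBytes >= 0 && sizeInBytes <= MAX_FILE_UPLOAD_SIZE_BYTES;
}
```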