- While working on my graphql-dynamodb project, I started looking into alternatives to DynamoDB for the underlying datastore (while keeping essentially a key-value structure)
- I came across DiskCache, which got me wondering if I could use SQLite as the underlying key-value store
- But I still wanted to use AWS Lambda if possible
- Since AWS Lambda Functions can mount an Amazon Elastic File System (EFS) file system, I thought I could use one to store the SQLite databases
- EFS file system mounted to the AWS Lambda Function
- "Sharded" by fieldname, with a separate SQLite database for each fieldname
- I had to use a different implementation for node IDs than in my graphql-dynamodb project, since I wanted the performance benefits of an integer primary key (aka my own alias for SQLite's rowid)
- Python UUIDs are 128-bit values, which is too large to store as SQLite integers (those are at most signed 64-bit)
- I ended up porting Mediagone's Small UID library for PHP, with some slight modifications to avoid duplicate UIDs when generating thousands per second (see the implementation in this project or in its own repo)
- I couldn't use a local script to load the test data like I could with DynamoDB, so I created a Lambda Function for handling the initial load
- I used Python Threads and Queues to parallelize by fieldname (aka the "shard ID")
- It worked better than expected! (I was honestly surprised it worked at all 😅)
- I did run into issues when I tried to use SQLite's Write-Ahead Logging ("WAL") mode, so I stuck with the default rollback journal mode
- SQLite's WAL documentation says it does not work over a network filesystem (see #2 under "disadvantages"), and my guess was that EFS uses something like NFS under the hood (it does: EFS is accessed over the NFSv4 protocol), so I wasn't surprised when I ran into issues with it
- It isn't as scalable as DynamoDB (EFS can only handle 35,000 read or 7,000 write IOs per second), but it's much cheaper for equivalent amounts of data and traffic
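The "sharded by fieldname" layout above might look something like the following. This is a minimal Python sketch, not this project's code: the `kv` table layout and the helper names `shard_path`, `open_shard`, `put`, and `get` are hypothetical, and the demo uses a temporary directory where a Lambda Function would use its EFS mount path.

```python
import os
import re
import sqlite3
import tempfile


def shard_path(root: str, fieldname: str) -> str:
    """Map a fieldname (the "shard ID") to its own SQLite database file."""
    # Restrict names so a fieldname can't escape the root directory.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", fieldname):
        raise ValueError(f"invalid fieldname: {fieldname!r}")
    return os.path.join(root, f"{fieldname}.sqlite3")


def open_shard(root: str, fieldname: str) -> sqlite3.Connection:
    # Opening creates the database file and table if they don't exist yet.
    conn = sqlite3.connect(shard_path(root, fieldname))
    # Hypothetical schema: one key-value table per shard, keyed by integer node ID.
    conn.execute("CREATE TABLE IF NOT EXISTS kv (node_id INTEGER PRIMARY KEY, value TEXT)")
    return conn


def put(root: str, fieldname: str, node_id: int, value: str) -> None:
    conn = open_shard(root, fieldname)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO kv (node_id, value) VALUES (?, ?)",
                (node_id, value),
            )
    finally:
        conn.close()


def get(root: str, fieldname: str, node_id: int):
    conn = open_shard(root, fieldname)
    try:
        row = conn.execute("SELECT value FROM kv WHERE node_id = ?", (node_id,)).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


if __name__ == "__main__":
    root = tempfile.mkdtemp()  # stands in for the EFS mount path
    put(root, "title", 1, "Hello")
    put(root, "author", 1, "Alice")
    print(get(root, "title", 1))
    print(sorted(os.listdir(root)))  # one database file per fieldname
```

Because each fieldname lives in its own file, writes to different fields never compete for the same SQLite database lock.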
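The node ID constraint described above can be checked directly with Python's built-in `sqlite3` module. This short demo uses made-up table and column names. It shows that a column declared `INTEGER PRIMARY KEY` holds the same value as the table's rowid, and that a UUID's 128-bit integer form is rejected because SQLite integers are signed 64-bit.

```python
import sqlite3
import uuid

SQLITE_MAX_INT = 2**63 - 1  # largest value a SQLite INTEGER can hold

conn = sqlite3.connect(":memory:")
# A column declared with type INTEGER PRIMARY KEY becomes an alias for the rowid,
# so lookups by node_id use the table's own B-tree key with no separate index.
conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO nodes (node_id, data) VALUES (?, ?)", (42, "hello"))
rowid, node_id = conn.execute("SELECT rowid, node_id FROM nodes").fetchone()
print(rowid == node_id)  # node_id and rowid are the same value

# A version-4 UUID is 128 bits, and its version bits alone sit above bit 64,
# so its integer form never fits in a SQLite INTEGER.
big = uuid.uuid4().int
print(big > SQLITE_MAX_INT)
try:
    conn.execute("INSERT INTO nodes (node_id, data) VALUES (?, ?)", (big, "too big"))
except OverflowError as exc:
    print("rejected:", exc)
```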
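For comparison with the Small UID port mentioned above, here is a Small-UID-style 64-bit ID generator written from scratch rather than taken from this project. It assumes a layout of a millisecond Unix timestamp in the high bits with 20 random bits below it (check Mediagone's library for the authoritative format). The duplicate-avoidance step shown here, bumping the previous ID whenever a new one would not be larger, is one common approach and not necessarily the modification this project made. `SmallUidGenerator` is a hypothetical name.

```python
import os
import threading
import time

RANDOM_BITS = 20  # assumed width of the random part
RANDOM_MASK = (1 << RANDOM_BITS) - 1
SQLITE_MAX_INT = 2**63 - 1


class SmallUidGenerator:
    """Time-ordered 64-bit IDs: millisecond timestamp in the high bits, random low bits."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def new_id(self) -> int:
        with self._lock:
            ms = time.time_ns() // 1_000_000
            rand = int.from_bytes(os.urandom(3), "big") & RANDOM_MASK
            candidate = (ms << RANDOM_BITS) | rand
            # Within the same millisecond the random low bits can collide or go
            # backwards; bumping the previous ID keeps IDs unique and increasing.
            if candidate <= self._last:
                candidate = self._last + 1
            self._last = candidate
            return candidate


if __name__ == "__main__":
    gen = SmallUidGenerator()
    ids = [gen.new_id() for _ in range(10_000)]
    print(len(set(ids)) == len(ids))  # every ID is distinct
    print(ids == sorted(ids))  # IDs come out in increasing order
```

The lock and the `_last` check only guarantee uniqueness within one process, so separate Lambda instances still rely on the random bits to avoid collisions. With today's timestamps the values stay well below `SQLITE_MAX_INT`, so they fit an integer primary key.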
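The parallel initial load described above can be sketched with one `queue.Queue` and one worker thread per fieldname. Each thread is then the only writer to its shard's database, so no two threads contend for the same SQLite write lock. This is a minimal sketch under assumptions: the record shape `(fieldname, node_id, value)`, the `kv` table, the batch size, and the names `load_records` and `_shard_writer` are hypothetical, and a temporary directory stands in for the EFS mount.

```python
import os
import queue
import sqlite3
import tempfile
import threading

BATCH_SIZE = 500  # rows per transaction (arbitrary choice for the sketch)
_DONE = object()  # sentinel telling a worker its queue is finished


def _shard_writer(db_path: str, q: queue.Queue) -> None:
    # Each worker opens and uses its own connection, only from this thread.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS kv (node_id INTEGER PRIMARY KEY, value TEXT)")
        batch = []
        while True:
            item = q.get()
            if item is not _DONE:
                batch.append(item)
            # Flush when the batch is full or the queue is finished.
            if batch and (item is _DONE or len(batch) >= BATCH_SIZE):
                with conn:  # one transaction per batch
                    conn.executemany("INSERT OR REPLACE INTO kv VALUES (?, ?)", batch)
                batch.clear()
            if item is _DONE:
                break
    finally:
        conn.close()


def load_records(root: str, records) -> list:
    """Fan records out to one writer thread per fieldname; return the shard names."""
    queues, threads = {}, []
    for fieldname, node_id, value in records:
        if fieldname not in queues:
            # Unbounded queue, so a crashed worker can't block the producer.
            q = queue.Queue()
            path = os.path.join(root, f"{fieldname}.sqlite3")
            t = threading.Thread(target=_shard_writer, args=(path, q))
            t.start()
            queues[fieldname] = q
            threads.append(t)
        queues[fieldname].put((node_id, value))
    for q in queues.values():
        q.put(_DONE)
    for t in threads:
        t.join()
    return sorted(queues)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    recs = [("title", i, f"t{i}") for i in range(1000)]
    recs += [("author", i, f"a{i}") for i in range(1000)]
    print(load_records(root, recs))
```

Threads are probably a reasonable fit despite the GIL, since the work is mostly waiting on file I/O over EFS and the `sqlite3` module releases the GIL while SQLite runs.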
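Since WAL caused problems on EFS, one option is to pin each shard to the rollback journal explicitly when it's opened, rather than relying on the default. The following is a small sketch. `open_shard_rollback` is a hypothetical helper, and `DELETE` is SQLite's default rollback-journal mode.

```python
import os
import sqlite3
import tempfile


def open_shard_rollback(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    # journal_mode=DELETE is SQLite's default rollback journal: the -journal file
    # is deleted at the end of each transaction. WAL is avoided because SQLite
    # documents that WAL does not work over a network filesystem.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"could not set rollback journal mode, got {mode!r}")
    return conn


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "title.sqlite3")
    conn = open_shard_rollback(path)
    print(conn.execute("PRAGMA journal_mode").fetchone()[0])
    conn.close()
```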