At its simplest, Marble's software can be described as follows.
The Marble software is composed of two Docker images and three parts:
- [Docker 1] a back-end API server (go + gin), that serves the internal and public REST APIs
- [Docker 1] a set of cron job scripts (go, run from the same docker image with a different entrypoint than the back-end API server), meant to run periodically
- [Docker 2] a front-end API server (typescript + Remix) that serves HTML and exposes actions to the browser
It relies on the existence of a basic set of infrastructure to work:
- A PostgreSQL database
- A scheduler for the cron jobs
- A set of object storage buckets, to store documents uploaded in the case manager and csv files for batch data ingestion (currently, only Google Cloud Storage: compatibility with AWS S3 and Azure Blob Storage is planned soon)
- A load balancer with TLS termination to expose the APIs
- A configured Firebase app for end user authentication
The docker images for Marble are stored in the registry at europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend
and europe-west1-docker.pkg.dev/marble-infra/marble/marble-frontend,
for the back-end and front-end respectively.
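For example, the images can be pulled directly from the registry (the `latest` tag is an assumption here; pin a specific version tag in production):

```shell
# Pull the two Marble images from the public Artifact Registry.
docker pull europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend:latest
docker pull europe-west1-docker.pkg.dev/marble-infra/marble/marble-frontend:latest
```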
For reference, below is a schematic of Marble's cloud offering architecture. It is essentially a variation of the architecture described above, with some infrastructure choices that work well for our cloud offering specifically.
Here at Marble, we chose to deploy our code on serverless infrastructure, to leverage the ease of use, flexibility and scalability of GCP's Cloud Run offering. As a result, we run the back-end API server and the cron jobs (which are run from the same docker image) separately:
- the API is a Cloud Run service
- the scripts are run as distinct Cloud Run Jobs, scheduled by a Cloud Scheduler
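A minimal sketch of this setup with the gcloud CLI could look like the following. All service, job and account names, the region, the schedule, and the `--cron` argument are illustrative assumptions, not Marble's actual configuration:

```shell
# Deploy the back-end API as a Cloud Run service (names/region are examples).
gcloud run deploy marble-backend \
  --image=europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend \
  --region=europe-west1

# Create a Cloud Run Job from the same image, passing a different argument
# so the container runs the batch scripts instead of the API server.
gcloud run jobs create marble-cron-job \
  --image=europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend \
  --args="--cron" \
  --region=europe-west1

# Trigger the job periodically via Cloud Scheduler (hourly, as an example),
# using the Cloud Run Admin API's jobs:run endpoint.
gcloud scheduler jobs create http marble-cron-trigger \
  --schedule="0 * * * *" \
  --uri="https://europe-west1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/PROJECT_ID/jobs/marble-cron-job:run" \
  --http-method=POST \
  --oauth-service-account-email=scheduler-sa@PROJECT_ID.iam.gserviceaccount.com \
  --location=europe-west1
```

Replace `PROJECT_ID` and the service account with your own values; the scheduler's service account needs permission to invoke the Cloud Run job.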
Technically, the cron jobs could be scheduled by the go code itself and run from within the API service, but we avoid this for the following reasons:
- we don’t want a batch job to be stopped/to fail because an api instance is torn down
- we don’t want the api to be impacted by a batch job’s cpu/memory usage
- Cloud Run services enforce a maximum request timeout, which is shorter than the typical cron job execution time
However, running it all together could make sense if Marble is run in a VM, more on this below.
Moreover, in our cloud deployment, we use Google Secret Manager (integrated with Cloud Run) to inject secrets as environment variables into the containers, and a GCP application load balancer for load balancing, TLS termination, DDoS protection and other security rules.
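As an illustration of the Secret Manager integration, a secret can be mounted as an environment variable on a Cloud Run service. The secret and variable names below are hypothetical:

```shell
# Expose a Secret Manager secret to the container as an environment variable.
# "PG_PASSWORD" and "marble-pg-password" are example names, not Marble's actual config.
gcloud run services update marble-backend \
  --region=europe-west1 \
  --set-secrets=PG_PASSWORD=marble-pg-password:latest
```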
In this repository, we provide an example set of terraform files that you can adapt to deploy Marble serverless in your own GCP environment. It should also be reasonably simple to run an equivalent deployment on AWS using Fargate and AWS Scheduler, or on Azure using Azure Container Instances and a scheduler.
While Marble on a VM is not ideal for our cloud offering, it may make sense for an open-source or on-premise use case. If you do so, it could make sense to run the back-end API and the built-in go scheduler together by passing the --server --cron
arguments to the docker container.
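For instance, a single container running both could be started like this (the port mapping is an assumption; adjust to your environment, and supply the required environment variables):

```shell
# Run the API server and the built-in go cron scheduler in one container.
docker run -d \
  -p 8080:8080 \
  europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend \
  --server --cron
```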
While we do not provide support for deploying Marble on Kubernetes, it should work very similarly to a serverless deployment. You can schedule the cron jobs by using Kubernetes' built-in CronJob resource.
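For example, a CronJob could be created from the back-end image like this (the name and schedule are illustrative assumptions):

```shell
# Create a CronJob that runs the back-end image on a schedule (hourly here).
# Note: kubectl create cronjob's trailing "-- ..." arguments replace the image's
# command; to keep the image entrypoint and only pass flags such as --cron,
# set the container's "args" field in a full CronJob manifest instead.
kubectl create cronjob marble-cron \
  --image=europe-west1-docker.pkg.dev/marble-infra/marble/marble-backend \
  --schedule="0 * * * *"
```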
Currently (March 2024), Marble still has some requirements on GCP infrastructure:
- Cloud Storage for file storing
- Firebase authentication for end-user authentication (nb: in practice, any usage of Marble should fall under the Firebase Auth free plan)

We plan to propose alternatives soon, starting with S3/Azure Blob Storage options for file storing.