Principles for the ocaml.org infrastructure
The goal of this document is to sketch out principles for how the OCaml.org infrastructure should operate. This is less about any particular tool or technology, and more about how to make decisions about them.
The objectives themselves are phrased as statements that we want to be true, rather than things that have already been achieved.
The OCaml infrastructure is now depended on by thousands of downstream projects. As such, the package management archives are part of a critical software supply chain, and we should strive for best practices to secure the core infrastructure, prioritising this even above availability. This covers not only active security measures, but also backup practices, key management and decision-making procedures. Individuals who participate in the OCaml infrastructure take on this shared responsibility for security.
We should also strive to keep up proactively with the wider industry (e.g. integrating with software bills of materials and other emerging trust mechanisms), while continuing to support innovations from our own OCaml community (e.g. Conex).
Our hosted services should be as globally available as possible, from the perspective of users of the website and of the opam package manager. This infrastructure is consumed by continuous integration and release platforms all over the world.
A trade-off against availability is update latency: package uploads and documentation rebuilds should propagate within a reasonable timeframe, but not so quickly that it compromises the simplicity of the steady-state deployed infrastructure. Where possible, we should prefer static hosting over dynamically processed business logic, since static content is easier to replicate.
As an open-source community, it is important that elements of our infrastructure can be replicated outside of the main deployments. This enables contributors to make improvements (which requires a local development environment), and also allows industrial users to recreate a setup without depending on the core infrastructure. This is, for example, already true of opam package hosting, where everything can be replicated locally. It is less true for some specific services such as the email archivers and messaging bridges.
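For example, the following is a rough sketch of replicating the opam package hosting on a local machine (the paths are illustrative, and the commands assume git and opam 2.x are installed):

```
# Clone the package metadata that backs opam.ocaml.org.
git clone https://github.com/ocaml/opam-repository.git /srv/mirrors/opam-repository

# Pre-fetch the source archives referenced by that metadata into a local
# cache, so that package installs no longer depend on upstream hosting.
cd /srv/mirrors/opam-repository
opam admin cache

# Point an opam client at the local mirror instead of the hosted repository.
opam repository add local-mirror file:///srv/mirrors/opam-repository --all-switches
```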
An important benefit of local reproducibility is that it enables us to encourage participation in OCaml infrastructure management without having to hand newer contributors the security keys to sensitive machines.
Keeping infrastructure operational is a balancing act between the time individuals spend manually maintaining infrastructure and simply spending money on cloud services. Given the climate crisis, we also need to ensure we are not wasteful in our use of resources and that we minimise our carbon emissions. This implies evolving our tooling to reflect this ethos: for example, instead of paying for a single, extremely available website, we could ensure our tools are able to use multiple mirrors instead.
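As a hedged illustration, opam can already be pointed at more than one repository, so a secondary mirror can sit alongside the primary hosted one (the mirror name and URL below are hypothetical):

```
# Register a hypothetical community mirror alongside the default
# opam.ocaml.org repository set up by `opam init`; --rank controls which
# repository takes precedence when both define the same package version.
opam repository add community-mirror https://mirror.example.org/opam --rank=2

# Confirm that both the default repository and the mirror are registered.
opam repository list
```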
Where possible, it's best not to pay for services that require ongoing expenditure, and instead to host them directly ourselves. The reason is longevity: OCaml has been around for decades and will continue to be around for many more. Almost every external service we've relied on has changed or been retired (e.g. Travis CI), and it's often more efficient to build custom versions of our core services. This is not always true, especially for things we cannot replicate ourselves, such as global CDNs. We need to track those carefully and ensure that monetary expenses do not get out of hand, or become overly dependent on a single commercial sponsor.
Our infrastructure costs are often sponsored by generous organisations that want to support open source. This support ranges from financial contributions to direct cloud credits. Sponsors often make small requests in return, mainly for reports and status updates on how their resources are being used. Transparent, regular communication (ideally in public) about how we use this sponsorship helps build trust and justify ongoing funding.
OCaml is one of the best languages available for building robust, networked services. Many of the core infrastructure needs (web and file hosting, scripting and even DNS serving) could be met by OCaml-based services. The benefits of doing this are that we gain operational experience with deploying OCaml code, and can contribute back improvements to these areas that might not be obvious without the pressing demands of live infrastructure. Finding bugs in our community packages also helps build up the quality of code across a variety of contributing individuals and organisations who might not otherwise work together.
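As a minimal sketch of what this can look like (assuming the Dream web library from opam; the port, routes and directory below are illustrative, not an existing ocaml.org deployment), a small OCaml-native service can serve both a status endpoint and pre-built static content:

```ocaml
(* Hypothetical status-plus-static-files service, sketched with Dream. *)
let () =
  Dream.run ~port:8080
  @@ Dream.logger
  @@ Dream.router [
       (* Trivial health-check endpoint for monitoring. *)
       Dream.get "/status" (fun _ -> Dream.html "ocaml.org infrastructure: ok");
       (* Serve pre-built static content, in line with the preference for
          static hosting described earlier. *)
       Dream.get "/**" (Dream.static "./site");
     ]
```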
This isn't to say that we have to use OCaml for absolutely everything. It's perfectly fine to have deployments in areas where OCaml does not have a good solution. But it would be ideal to communicate these "missing pieces" on the infrastructure pages to give the community an opportunity to contribute back in the medium term.
OCaml supports a rich variety of operating systems and CPU architectures. While we want to stay on the mainline path of Tier-1 systems for the core infrastructure, the only thing stopping us from using more exotic architectures for secondary services is the willingness of volunteers to find reliable hosting and handle the system administration. We should encourage this curiosity and make room for less popular systems in our deployments where practical, since heterogeneous deployments are generally more resilient over time.