First of all, thanks for taking the time to contribute! 🎉👍
We follow the GitHub Flow: all code contributions are submitted via a pull request towards the main
branch.
Opening a pull request means you want that code to be merged. If you want to only discuss it, send a link to your branch along with your questions through whichever communication channel you prefer.
All pull requests must be reviewed by at least one person who is not their original author.
To help reviewers, make sure to describe your pull request with a clear text explanation of your changes.
GitHub Actions is used to release the package on every merge to the main branch.
Branch protection is active, meaning that a merge to the main branch can only take place once all tests pass in CI, and that the peer review policy has been fulfilled.
We strive to follow this recommendation to write our commit messages, which contains the following rules:
- Separate subject from body with a blank line.
- Limit the subject line to 50 characters.
- Capitalize the subject line.
- Do not end the subject line with a period.
- Use the imperative mood in the subject line.
- Wrap the body at 72 characters.
- Use the body to explain what and why vs. how.
We add this additional rule:
- Do not rely on GitHub issue reference numbers in commit messages, as we have no guarantee the host system and its autolinking will be stable in time. Make sure the context is self-explanatory. If an external reference is given, use its full URL.
All changes to the codebase that impact users must be documented in the CHANGELOG.md
file.
The format to use is Common Changelog, with the following additional specifications:
- The
unreleased
section of Keep a Changelog is preserved with the addition of a tag to specify which type of release should be published and to foster discussions about it inside pull requests. This tag should be one of the names mandated by SemVer, within brackets:[patch]
,[minor]
or[major]
. For example:## Unreleased [minor]
. - Each listed change must provide an actionable way to adapt the user’s codebase, either directly in the changelog or through instructions or links.
- Changes are a single sentence with no punctuation (like all examples given in Common Changelog).
- Since each release is produced automatically from a single pull request, the notice is always used to link to the source pull request rather than references, which would always reference the same pull request. References can instead link to relevant specific parts of an RFC, decision record or diff.
- The notice is also used to present sponsor information. Since the development of this project is funded by different actors, and following discussions with sponsors, financial contributions are acknowledged in the changelog itself. The format of the notice thus diverges from the Common Changelog specification in that it is not “a single-sentence paragraph”. Sponsor information is in quote format, starts with “Development of this release was supported by <funding_from>”, and provides the name and link to the sponsor, as well as information on the specific funding instrument, as specified by the sponsor itself or as required by law. A short message from the sponsor might also be added, as long as it abides by the community’s Code of Conduct and aligns with the project’s goals.
Avoid “you” and “we” (who is “we” in a common anyway?). Use neutral or passive wordings.
- You can find federated public instances on GitHub.
+ Federated public instances can be found on GitHub.
Use snake case for multiple word options:
- ota validate --schemaOnly
+ ota validate --schema-only
For command-line examples and documentation, we follow the docopt usage patterns syntax. Quick recap of the main points:
- mandatory arguments are given between
<
and>
; - optional elements are given between
[
and]
; - mutually exclusive elements are given between
(
and)
and separated by|
.
- ota track --services [ $service_id ] [, $service_id, ...]
+ ota track [--services <service_id>...]
In order to improve the understandability of commands, we document all CLI options and examples with the long version of the options.
- ota track -s $service_id -r
+ ota track --services <service_id> --extract-only
An “instance” of Open Terms Archive is comprised of a server running Open Terms Archive and up to three repositories. An instance has a name describing the scope of services it aims at tracking. This scope is defined by one or several dimensions: jurisdiction, language, industry…
For example, the
france
instance tracks documents in the French jurisdiction and French language, while thedating
instance tracks services from the dating industry.
The instance name is written in lowercase and is made of one word for each dimension it focuses on, separated by dashes.
For example, the
france-elections
instance tracks services in the French jurisdiction and French language that could impact the French electoral processes.
This name is used consistently in all communication, written references, and in the inventory of instances that are managed automatically. It is also used as the base for naming the database repositories, by suffixing it with each type:
- The repository containing the declarations of services to be tracked is named
<instance_name>-declarations
. You can create it from a template. - The repository containing the snapshots of the tracked documents (unless the instance is storing them in an alternative database) is named
<instance_name>-snapshots
. You can create it from a template. - The repository containing the versions of the tracked documents (unless the instance is storing them in an alternative database) is named
<instance_name>-versions
. You can create it from a template.
We deploy identifiers for packages and namespaces across different universes: package managers, social networks, URLs… In order to unify these names across constraints, we reserve everywhere the name opentermsarchive
, with no space, no dash, no underscore, no capital.
For example, NPM does not allow uppercase and spaces; Ansible does not allow dashes and spaces; Twitter does not allow spaces. The name
ota
is too unlikely to be available everywhere.
First of all it's important to distinguish two fundamentally different kinds of errors: operational and programmer errors.
-
Operational errors represent run-time problems experienced by correctly-written programs. These are not bugs in the program. These are usually problems with something else: the system itself (e.g. out of memory), the system’s configuration (e.g. no route to a remote host), the network (e.g. socket hang-up), or a remote service (e.g. a 500 error).
-
Programmer errors are bugs in the program. These are things that can always be avoided by changing the code. They can never be handled properly, since by definition the code in question is broken (e.g. tried to read property of
undefined
, or forget toawait
an asynchronous function).
So the very important distinction is that operational errors are part of the normal operation of a program whereas programmer errors are bugs.
Also noteworthy, failure to handle an operational error is itself a programmer error.
There are five ways to handle operational errors:
- Deal with the failure directly. For example, create directory if it's missing.
- Propagate the failure. If you don’t know how to deal with the error, the simplest thing to do is to abort whatever operation you’re trying to do, clean up whatever you’ve started, and propagate the error.
- Retry the operation. For example, try to reconnect if the connection is lost.
- Log the error — and do nothing else. If it's a minor error and there’s nothing you can do about, and there is no reason to stop the whole process.
- Crash immediately. If the error cannot be handled and can affect data integrity.
In our case, we consider all fetch
-related errors as expected, so as operational errors and we handle them by logging but we do not stop the whole process. We handle errors related to the notifier
in the same way.
In contrast, we consider errors from the recorder
module as fatal, and we crash immediately.
The best way to handle programmer errors is to crash immediately. Indeed, it is not recommended to attempt to recover from programmer errors — that is, allow the current operation to fail, but keep handling requests. Consider that a programmer error is a case that you didn’t think about when you wrote the original code. How can you be sure that the problem won’t affect the program itself and the data integrity?
This section is highly inspired, and in part extracted, from this error handling guide.