
MOM6 repository policies

Ian Grooms edited this page Jan 31, 2019 · 11 revisions

MOM6 repository policies (for GitHub and GitLab).

Rationale

MOM6 source code management (SCM) is conducted using git. git is a distributed SCM, meaning that it does not require a single centralized server. For organizational and management purposes we consider one particular repository and branch to be the canonical master, but the distributed approach provides great flexibility in how we operate and collaborate.

There are two official repositories: one on GitHub.com, the other at https://gitlab.gfdl.noaa.gov. Bear in mind that your working directory is also a repository in its own right. There will also be forks of repositories (snapshots of a repository) that become self-contained repositories.

GitHub provides robust hosting with a modern web interface for collaboration, both with external developers and within a private project. GitLab is a free, open source, self-hosted alternative to GitHub which lives within the GFDL firewall.

Context

MOM6 relies on FMS software, and MOM6 configurations rely on other FMS components. The repository configuration is unique to MOM6, but our intent is to follow the FMS policies as much as possible. Since the MOM6 repositories are unique, there are some MOM6-specific policies, and because git is new to FMS, these policies are a work in progress.

Terminology

A repository primarily contains files and their history. It may also contain images of other remote repositories. Your working directory is a repository and users could add your working directory as a remote in their own repository.

Cloning creates a local copy of a remote repository into a working directory (better thought of as a local working repository). It typically will be a complete copy including all the remote history up to the point when you cloned. Once cloned, your local repository is disconnected and needs to pull down updates from the parent to stay in sync.

Committing is the process of adding changes to the repository history. It is not file based but repository based. This distinction is powerful because it allows API changes (argument lists and calls) to be associated and consistent. Commits are always made to your local working repository.

Pushing is the process of sending your new history up to a parent (or other) remote repository. Until you push, none of your commits are shared with others using the same remote. Until you push, you can rearrange your commits, retroactively edit them and do other mischievous things. Once you have pushed your commits, there is no going back.

Fetching is the process of updating your local copy of the remote repositories. This does not change your own branches or the files in your working directory.

Merging combines histories from two branches or repositories and is implemented in one of two ways: i) by stacking changes consecutively (known as a fast-forward), or ii) by creating a new merged state (a conventional merge).
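The two cases can be told apart from the histories alone. The following is a minimal sketch, assuming strictly linear histories represented as lists of commit labels (oldest first); the function name and the labels are hypothetical, and real git works on a commit graph rather than lists.

```python
def merge_kind(current, other):
    """Classify what merging `other` into `current` would require.

    Both arguments are linear histories: lists of commit labels, oldest first.
    """
    # `other` adds nothing new if it is a prefix of (or equal to) `current`.
    if current[:len(other)] == other:
        return "already up to date"
    # If `current` is a prefix of `other`, the new commits simply stack on top.
    if other[:len(current)] == current:
        return "fast-forward"
    # Otherwise the histories have diverged and a new merged state is needed.
    return "conventional merge"


local = ["A", "B"]
print(merge_kind(local, ["A", "B", "C"]))       # remote only added commits
print(merge_kind(local + ["L"], ["A", "B", "C"]))  # both sides added commits
```

A fast-forward is only possible when the current branch has no commits that the other history lacks.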

Pulling is essentially a "fetch" followed by a "merge" for the current branch you are on.

Forking is a web-based analog of cloning. A fork is a new repository that looks like a snapshot of the parent repository. There is a virtual connection between the two repositories, but commits (pushes) are independent, so a fork remains static unless explicitly updated. Forks allow independent development by a user without interference with the parent repository. To work with a fork, the user must clone with the URL of the fork, or add the fork as a second remote.

MOM6 repositories

MOM6 GitHub repository

The GitHub repository requires you to have a GitHub account. If you are a federal employee then you need to have applied for permission to have a GitHub account to use for government work. This repository is where the core developers will push their commits to share with each other. The front end is at: https://github.com/NOAA-GFDL/MOM6.

MOM6 versus MOM6-examples repository

Everything that applies to the MOM6 repository also applies to the MOM6-examples repository. As of October 2014, the MOM6 repository exclusively contains source code while everything else, namely the configuration data and tools, resides in the MOM6-examples repository.

Branches on MOM6 repositories

We promote a branch naming policy for the purposes of organizing work within the repositories. However, we no longer allow user branches on the primary repositories. Rather, all users (developers included) should use forks for any work, whether their own or for work to be submitted to the primary repositories. The branch naming conventions still can be useful but are somewhat redundant since the branches reside on forks.

The following branch naming conventions mostly reflect both i) the FMS policies (adopted on other repositories) and ii) a recommendation of best practices for personal repositories forked by users.

Branch name | Purpose | Comments
master | Released code. | City and public releases.
dev/master | The effective master branch for coordinating updates between labs and for preparing releases to master. | Requires complete compliance to commit rules (see section on dev/master).
dev/gfdl | The effective main collaborative branch for developers at GFDL. | Requires complete compliance to commit rules (see section on dev/gfdl).
dev/<project> | A collaborative side-project branch. For large-scale changes that cannot continuously comply to the dev/master rules for commits. | Project dependent restrictions on commits.
user/<abc>/<project> | A personal branch belonging to user "abc". Used for experimental commits which might be dead-end. | No restrictions.
public1 | Publicly released code. | This may lag the city releases if there is non-published work at risk of exposure.
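As an illustration of how the naming convention lets branches be categorized mechanically, here is a minimal sketch. The function name and the category labels are hypothetical and not part of any MOM6 tooling; only the master, dev/ and user/ patterns from the table are covered.

```python
import re

# Order matters: the specific dev/master and dev/gfdl names must be tested
# before the generic dev/<project> pattern.
BRANCH_PATTERNS = [
    (re.compile(r"master"), "release"),
    (re.compile(r"dev/master"), "inter-lab integration"),
    (re.compile(r"dev/gfdl"), "GFDL collaboration"),
    (re.compile(r"dev/[^/]+"), "side project"),
    (re.compile(r"user/[^/]+/[^/]+"), "personal"),
]


def classify_branch(name):
    """Return an illustrative category label for a branch name."""
    for pattern, category in BRANCH_PATTERNS:
        if pattern.fullmatch(name):
            return category
    return "unrecognized"


for branch in ["dev/gfdl", "dev/ale_refactor", "user/aja/refactor_grid_module"]:
    print(branch, "->", classify_branch(branch))
```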

A schematic shows how branches can be related.

Comment on the style of branch names

The a/b/c notation was adopted because it allows word separation for each of a, b or c. For example, user/aja/refactor_grid_module is easier to categorize than user_aja_refactor_grid_module. One side effect has been discovered: if a branch user/aja is created then user/aja/xyz has trouble being pushed. It appears that the branch name is being used for a hidden internal filename. In this example the file .git/refs/heads/user/aja would exist, and so the file .git/refs/heads/user/aja/xyz cannot be created, since .git/refs/heads/user/aja needs to be a directory but is already a file. While this side effect makes the notation potentially fragile, the readability and categorization of branches is sufficiently useful that we are keeping the notation. Cleanup simply requires deleting the conflicting branch.
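The file-versus-directory conflict can be reproduced without git at all. The sketch below assumes each branch is stored as one file under refs/heads, as described above, and mimics that layout in a temporary directory; the function name is hypothetical and the written content is a placeholder, not a real commit id.

```python
import tempfile
from pathlib import Path


def loose_ref_conflict(existing_branch, new_branch):
    """Return True if `new_branch` cannot be stored as a file next to
    `existing_branch` in a refs/heads-style directory tree."""
    with tempfile.TemporaryDirectory() as tmp:
        heads = Path(tmp) / "refs" / "heads"
        first = heads / existing_branch
        first.parent.mkdir(parents=True, exist_ok=True)
        first.write_text("0" * 40 + "\n")  # placeholder for a commit id
        second = heads / new_branch
        try:
            # Fails if a path component already exists as a regular file.
            second.parent.mkdir(parents=True, exist_ok=True)
            second.write_text("0" * 40 + "\n")
        except OSError:
            return True
        return False


print(loose_ref_conflict("user/aja", "user/aja/xyz"))
print(loose_ref_conflict("user/aja/abc", "user/aja/xyz"))
```

The first call reports a conflict because user/aja already exists as a file; the second does not, because user/aja is only ever a directory.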

Commit policies

All branches

There are some guidelines that apply to all branches.

  1. A commit should include all changes to all files associated with a particular update. This is important for keeping APIs consistent.

  2. Commits should not involve multiple unrelated issues at once. If you are making two unrelated changes, make two commits.

  3. Commits should be properly logged (see "Commit logging guidelines").

  4. If the commit refers to an issue listed in the issue tracker (Redmine or GitHub), use the # notation, such as "#101", “Fixes #43”, “resolves #2” or “closes #21”. See this stackoverflow question for help.

  5. Never rebase a branch that has been pushed. Once an upstream repository has your history, any changes to that history will be rejected.

Policies for dev/gfdl

dev/gfdl is the collaborative branch for developers at GFDL. dev/gfdl has strict guidelines for commits since it forms the basis of future developments and most production work.

  1. At all times, the code compiles without errors using all sanctioned compilers (currently GNU, Intel, PGI).

  2. At all times, all the regression tests pass with all the sanctioned compilers. Pass means that the checksums generated by running the code are the same as those committed to the repository.

  3. All commits are logged following the commit logging guidelines.

  4. If a commit [intentionally] changes answers in any test then put an asterisk (*) in the first character of the one-line summary. This helps identify answer changing commits when in forensic mode.

  5. No partial commits.

  6. Changes to checksums (ocean.stats) must be scientifically justified.

Policies for dev/master

dev/master is the branch used for preparing releases and syncing between institutional forks (other labs).

  1. All the rules for dev/gfdl apply to dev/master.

  2. All changes to dev/master are submitted via pull requests to the parent repository, i.e. NOAA-GFDL/MOM6.

    • The strict limitation to pull requests helps orchestrate updates between the forks.
    • Pull requests should ideally come from a labelled feature branch (e.g. dev-master-candidate-2017-12-01) made from the primary development branch of the respective fork, e.g. dev/gfdl, dev/center. The point of this is that the commit at the HEAD of the pull request should be on both the primary development branch and the feature branch; the development branch may continue to evolve, but the feature branch will not unless fixes for the pull request are needed.
    • Minor contributions should be submitted to dev/gfdl or the primary development branch of the respective fork.
  3. Pull requests will only be merged once reviewers from the major forks have accepted the pull request.

    • This allows development on forks to avoid needing to regression test each other's configurations but gives everyone a chance to check that their own configurations are not affected by the pull request.
    • When a pull request needs adjustment because it fails a test on a fork, the authors need to expedite resolving the problems with the affected forks and the forks need to provide rapid feedback.

Other branches

dev/<project>

dev/<project> is for collaborative development on a feature/project that cannot meet the strict rules of dev/gfdl or dev/master, such as being in a working state at all times. We typically might use such a branch for a large-scale refactor.

Branches on user forks

Since a user's fork is under their sole control, they are free to name branches as they see fit. The previous requirement to use user/<abc>/<project> is no longer meaningful since user branches are not allowed on the main repository. Well-named branches for feature updates, bug fixes or long-lived development are strongly encouraged. The use of underscores or hyphens as word separators is advised and either can be used, e.g. bugfix_symmetric_coriolis_term and bugfix-symmetric-coriolis-term are equally good branch names.

Even on user forks, we strongly encourage the use of feature branches for development. It allows for easy submission of pull requests.

It is generally a bad idea to develop on branches master, dev/master or dev/gfdl - the history of your copy of that branch on your fork will be out of sync and forever different from the corresponding branch on the parent repository. Again, we strongly encourage the use of feature branches for organizing your work.

Chained branches / stacking pull requests

If you have a development that depends on a previous development that is not yet merged, it is still advisable to work with feature branches but to branch on top of a branch. Say feature2 depends on the code modifications of feature1. Develop feature1 on branch-feature1 so that branch-feature1 can be submitted as a pull request. Then create branch-feature2 off of branch-feature1 so that branch-feature2 can be submitted as a second pull request. The second branch and request will contain the commits from the first. However, once the first pull request is merged, the commits in the second pull request will be reduced to just those relevant to that request. This helps simplify the review and merge process.

When stacking pull requests, please indicate that this is the case in the pull request message.
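Why the second pull request shrinks can be seen by treating a pull request as "commits on the branch that are not yet on the target". The following is a minimal sketch with hypothetical commit labels, assuming the first pull request is merged in a way that keeps its commits intact (a fast-forward or merge commit rather than a squash or rebase).

```python
def pull_request_commits(branch_history, base_history):
    """Commits a pull request would list: on the branch but not on the base."""
    base = set(base_history)
    return [commit for commit in branch_history if commit not in base]


# Hypothetical histories, oldest commit first.
base = ["A", "B"]
branch_feature1 = base + ["F1a", "F1b"]
branch_feature2 = branch_feature1 + ["F2a"]

# While the first request is still open, the second one lists both sets.
before = pull_request_commits(branch_feature2, base)

# Once branch-feature1 is merged, the base contains its commits.
base_after_merge = branch_feature1
after = pull_request_commits(branch_feature2, base_after_merge)

print(before)
print(after)
```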

Commit logging guidelines

All commits require a text log. git allows short logs to be provided via a command line option but will otherwise invoke an editor (preferred) at which point you can create the log entry. The format of a log entry should follow:

One-line summarizing the commit in <=50 characters

Detailed explanation of the commit, rationale, issues addressed, etc.
goes after a blank line (THE BLANK LINE IS VERY IMPORTANT). Also:
 - It is helpful to use an ascii pseudo-formatting like
   this "-" notation.
 - There is no need to add user information or dates since that
   information is recorded by git for you.
Yes, that one-liner is exactly 50 characters long! Your editor will
probably indicate when you exceed the 50 characters. Subsequent text
is best word-wrapped at 72 or 80 characters.
Here is a ruler:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890

A discussion/recommendation about commit messages can be found here.

The blank line after the summary is mandatory. If you skip the blank line, the entire message is concatenated into the title.
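The format can be checked mechanically before committing. Below is a minimal sketch of such a check; the function name, the wording of the reported problems and the `guideline` parameter are hypothetical, and this is not an existing MOM6 or git tool.

```python
def check_commit_message(message, guideline=50):
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing summary line"]
    problems = []
    # The summary length is reported as advice, not enforced.
    if len(lines[0]) > guideline:
        problems.append(
            f"summary is {len(lines[0])} characters (guideline: {guideline})"
        )
    # Any text on the second line means the blank separator was skipped.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line after summary")
    return problems


print(check_commit_message("Fix grid bug\n\nDetails here."))
print(check_commit_message("Fix grid bug\nDetails here."))
```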

Useful summary line

It can be challenging to get enough information into the short summary line, and so the 50 character limit is not a hard limit. Too little information is worse than a summary that runs slightly over the limit. A summary line such as "Bug fix" is almost useless. "Bug fix in MOM_ALE.F90" is better and "Bug fix: uninitialized variable (dXdYdZ) in MOM_ALE.F90" is even better despite being 55 characters long.

Short-hand to indicate answer changes

To indicate when a commit changes answers (i.e. when an ocean.stats.* file was updated) please insert an asterisk (*) as the first character of the summary line, e.g. "*Bug fix: uninitialized variable (dXdYdZ) in MOM_ALE", which is 52 characters long. This helps us track experiment evolution. When answers change, the commit message should summarize the experiments for which the answers changed.
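Because the marker is always the first character, answer-changing commits can be picked out of a list of summary lines with a simple filter. This is a minimal sketch; the function name and the sample summaries are hypothetical.

```python
def answer_changing(summaries):
    """Return the summary lines that carry the leading asterisk marker."""
    return [summary for summary in summaries if summary.startswith("*")]


# Hypothetical one-line summaries, e.g. as collected from a commit log.
log = [
    "Refactor grid initialization",
    "*Bug fix: uninitialized variable (dXdYdZ) in MOM_ALE",
    "Add comments to MOM_ALE.F90",
]
print(answer_changing(log))
```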