Merge pull request #985 from datalad-handbook/main-transition
Main transition
adswa authored Oct 10, 2023
2 parents 57491b5 + 641a51f commit aed5ced
Showing 60 changed files with 122 additions and 115 deletions.
4 changes: 2 additions & 2 deletions docs/basics/101-109-rerun.rst
@@ -171,7 +171,7 @@ and another state from the dataset's history (a commit specified with
``-t``/``--to``). Let's do a :dlcmd:`diff` between the current state
of the dataset and the previous commit (called "``HEAD~1``" in Git terminology [#f1]_):

.. windows-wit:: please use datalad diff --from master --to HEAD~1
.. windows-wit:: please use datalad diff --from main --to HEAD~1

While this example works on Unix file systems, it will not provide the same output on Windows.
This is due to different file handling on Windows.
@@ -256,7 +256,7 @@ other tools than from the machine-readable ``run record``.
For example, to find out who (or what) created or modified a file,
give the file path to :gitcmd:`log` (prefixed by ``--``):

.. windows-wit:: use "git log master -- recordings/podcasts.tsv"
.. windows-wit:: use "git log main -- recordings/podcasts.tsv"

A previous Windows Wit already advised to append ``main`` or ``master``, the common "default :term:`branch`", to any command that starts with ``git log``.
Here, the last part of the command specifies a file (``-- recordings/podcasts.tsv``).
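The effect of the trailing ``-- <path>`` can be tried in a throw-away repository. The following is a minimal sketch, not part of the handbook: the file contents and commit messages are invented for illustration, and only the file name ``recordings/podcasts.tsv`` is taken from the text above.

```shell
#!/bin/sh
# Sketch: "git log <branch> -- <path>" lists only commits that touched <path>.
# Everything except the file name is a made-up example.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the branch is called "main"
git config user.name "Example User"
git config user.email "user@example.com"

mkdir recordings
echo "podcast one" > recordings/podcasts.tsv
git add recordings/podcasts.tsv
git commit -q -m "add podcast list"

echo "a note" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "podcast two" >> recordings/podcasts.tsv
git commit -q -am "extend podcast list"

# Count commits on main, overall and restricted to the one file.
all=$(git log --oneline main | wc -l | tr -d ' ')
file_only=$(git log --oneline main -- recordings/podcasts.tsv | wc -l | tr -d ' ')
echo "all commits: $all, commits touching the file: $file_only"
```

Naming the branch explicitly (``main``) before the ``--`` mirrors the Windows advice above; on other systems the branch name can usually be left out.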
24 changes: 12 additions & 12 deletions docs/basics/101-121-siblings.rst
@@ -218,41 +218,41 @@ the changed or added files from the sibling inside, and put it on your
desk. You can now take a look into that drawer to see whether you want
to have the changes right in front of you.

The drawer is a branch, and it is usually called ``remotes/origin/master``.
The drawer is a branch, and it is usually called ``remotes/origin/main``.
To look inside of it you can :gitcmd:`checkout BRANCHNAME`, or you can
do a ``diff`` between the branch (your drawer) and the dataset as it
is currently in front of you (your desk). We will do the latter, and leave
the former for a different lecture:

.. windows-wit:: Please use datalad diff --from main --to remotes/roommate/master
.. windows-wit:: Please use datalad diff --from main --to remotes/roommate/main

Please use the following command instead:

.. code-block:: bash
datalad diff --from main --to remotes/roommate/master
datalad diff --from main --to remotes/roommate/main
This syntax specifies the :term:`main` :term:`branch` as a starting point for the comparison instead of the current ``adjusted/master(unlocked)`` branch.
This syntax specifies the :term:`main` :term:`branch` as a starting point for the comparison instead of the current ``adjusted/main(unlocked)`` branch.

.. runrecord:: _examples/DL-101-121-108
:language: console
:workdir: dl-101/DataLad-101
:notes: on a different branch: remotes/roommate/master. Do a git remote -v here
:notes: on a different branch: remotes/roommate/main. Do a git remote -v here
:cast: 04_collaboration

$ datalad diff --to remotes/roommate/master
$ datalad diff --to remotes/roommate/main

This shows us that there is an additional file, and it also shows us
that there is a difference in ``notes.txt``! Let's ask
:gitcmd:`diff` to show us the differences in detail (note that it is a shortened excerpt, cut in the middle to reduce its length):

.. windows-wit:: Please use git diff master..remotes/roommate/master
.. windows-wit:: Please use git diff main..remotes/roommate/main

Please use the following command instead:

.. code-block:: bash
git diff master..remotes/roommate/master
git diff main..remotes/roommate/main
This is :term:`Git`\s syntax for specifying a comparison between two :term:`branch`\es.

@@ -263,7 +263,7 @@ that there is a difference in ``notes.txt``! Let's ask
:lines: 1-18, 67-78
:cast: 04_collaboration

$ git diff remotes/roommate/master
$ git diff remotes/roommate/main

Let's digress into what is shown here.
We are comparing the current state of your dataset against
@@ -278,7 +278,7 @@ you made in your own dataset in the previous section.
Cool! So now that you know what the changes are that your room mate
made, you can safely :dlcmd:`update --how merge` them to integrate
them into your dataset. In technical terms you will
"*merge the branch remotes/roommate/master into master*".
"*merge the branch remotes/roommate/main into main*".
But the details of this will be stated in a standalone section later.
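The fetch, inspect, merge sequence described in this hunk can be reproduced with plain Git in two throw-away repositories. This is only a rough analogue under stated assumptions: it uses ``git fetch`` and ``git merge`` in place of the ``datalad update`` calls, and the repository locations, user names, and file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch a sibling's changes, look at them, then merge them.
# Plain-Git stand-in for the DataLad workflow; all names are illustrative.
set -eu
work=$(mktemp -d)

# The "room mate" repository with one commit on main.
git init -q "$work/roommate"
cd "$work/roommate"
git symbolic-ref HEAD refs/heads/main
git config user.name "Room Mate"
git config user.email "roommate@example.com"
echo "first note" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Your own clone; the remote is renamed to "roommate" to match the text.
git clone -q "$work/roommate" "$work/mine"
cd "$work/mine"
git config user.name "Me"
git config user.email "me@example.com"
git remote rename origin roommate

# The room mate makes a change after you cloned.
cd "$work/roommate"
echo "second note" >> notes.txt
git commit -q -am "extend notes"

# Fetch only: the change lands in the "drawer" remotes/roommate/main.
cd "$work/mine"
git fetch -q roommate
changed=$(git diff --name-only main..remotes/roommate/main)
echo "files that differ: $changed"

# Merge the branch remotes/roommate/main into main.
git merge -q remotes/roommate/main
```

After the merge, ``main`` and ``remotes/roommate/main`` point to the same commit, so a second ``git diff`` between them prints nothing.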

Note that the fact that your room mate does not have the note
@@ -324,7 +324,7 @@ room mate's dataset changes into your own dataset. The commit message of this
latter commit for now might contain many words yet unknown to you if you
do not use Git, but a later section will get into the details of what
the meaning of ":term:`merge`", ":term:`branch`", "refs"
or ":term:`master`" is.
or ":term:`main`" is.

For now, you're happy to have the changes your room mate made available.
This is how it should be! You helped him, and he helps you. Awesome!
@@ -348,7 +348,7 @@ Create a note about this, and save it.
Afterwards, a "datalad update --how merge -s name" will integrate the
changes made to the sibling into the dataset. A safe step in between
is to do a "datalad update -s name" and checkout the changes with
"git/datalad diff" to remotes/origin/master
"git/datalad diff" to remotes/origin/main

EOT
$ datalad save -m "Add note on adding siblings"
4 changes: 2 additions & 2 deletions docs/basics/101-137-history.rst
@@ -448,15 +448,15 @@ Neat, isn't it? By checking out a commit shasum you can explore a previous
state of a dataset's history. And this does not only apply to simple text
files, but every type of file in your dataset, regardless of size.
The checkout command however led to something that Git calls a "detached HEAD state".
While this sounds scary, a :gitcmd:`checkout master` will bring you
While this sounds scary, a :gitcmd:`checkout main` will bring you
back into the most recent version of your dataset and get you out of the
"detached HEAD state":

.. runrecord:: _examples/DL-101-137-124
:language: console
:workdir: dl-101/DataLad-101

$ git checkout master
$ git checkout main


Note one very important thing: The previously untracked files are still
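The "detached HEAD state" mentioned in this hunk, and the way ``git checkout main`` leaves it, can be observed in a scratch repository. A minimal sketch with made-up commit messages; it assumes the default branch is named ``main``, as in the updated text.

```shell
#!/bin/sh
# Sketch: enter and leave a detached HEAD state in a throw-away repository.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "first state"
git commit -q --allow-empty -m "second state"

# Checking out a commit (rather than a branch) detaches HEAD.
git checkout -q HEAD~1
before=$(git rev-parse --abbrev-ref HEAD)   # prints "HEAD" while detached

# Checking out the branch again re-attaches HEAD to it.
git checkout -q main
after=$(git rev-parse --abbrev-ref HEAD)
echo "while detached: $before, afterwards: $after"
```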
2 changes: 1 addition & 1 deletion docs/basics/101-139-s3.rst
@@ -371,7 +371,7 @@ It needs to be invoked with three positional arguments, the path to the :term:`D
public=yes \
versioning=yes
fi
git annex export --to "$srname" --jobs 6 master
git annex export --to "$srname" --jobs 6 main
)
done
2 changes: 1 addition & 1 deletion docs/basics/101-141-push.rst
@@ -143,7 +143,7 @@ targets are configured throughout the dataset hierarchy.
The published version of the handbook is known to the local handbook dataset
as a :term:`remote` called ``public``, and each section of the book is identified
with a custom branch name that corresponds to the section name. Whenever an
update to the public dataset is pushed, apart from pushing only the ``master``
update to the public dataset is pushed, apart from pushing only the ``main``
branch, all branches starting with the section identifier ``sct`` are pushed
automatically as well. This configuration was achieved by specifying these branches
(using :term:`globbing` with ``*``) in the ``push`` specification of this :term:`remote`:
6 changes: 3 additions & 3 deletions docs/basics/101-180-FAQ.rst
@@ -245,7 +245,7 @@ If you do not want to invent a description yourself, you can run
DataLad datasets can be updated. The command ``datalad update`` will *fetch*
updates and store them on a different branch (by default
``remotes/origin/master``). Running:
``remotes/origin/main``). Running
.. code-block:: bash
@@ -330,7 +330,7 @@ If you do not want to invent a description yourself, you can run
DataLad datasets can be updated. The command `datalad update` will
*fetch* updates and store them on a different branch (by default
`remotes/origin/master`). Running
`remotes/origin/main`). Running
```
datalad update --merge
@@ -399,7 +399,7 @@ If you do not want to invent a description yourself, you can run
DataLad datasets can be updated. The command 'datalad update' will
"fetch" updates and store them on a different branch (by default
'remotes/origin/master'). Running 'datalad update --merge' will "pull"
'remotes/origin/main'). Running 'datalad update --merge' will "pull"
available updates and integrate them in one go.
Find out what has been done
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-119-102
@@ -3,7 +3,7 @@ $ datalad update --how merge
[INFO] Start enumerating objects
[INFO] Start counting objects
[INFO] Start compressing objects
merge(ok): . (dataset) [Merged origin/master]
merge(ok): . (dataset) [Merged origin/main]
update.annex_merge(ok): . (dataset) [Merged annex branch]
update(ok): . (dataset)
action summary:
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-121-108
@@ -1,3 +1,3 @@
$ datalad diff --to remotes/roommate/master
$ datalad diff --to remotes/roommate/main
added: code/nested_repos.sh (file)
modified: notes.txt (file)
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-121-109
@@ -1,4 +1,4 @@
$ git diff remotes/roommate/master
$ git diff remotes/roommate/main
diff --git a/code/nested_repos.sh b/code/nested_repos.sh
deleted file mode 100644
index f84c817..0000000
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-121-110
@@ -1,6 +1,6 @@
$ datalad update --how merge -s roommate
[INFO] Fetching updates for Dataset(/home/me/dl-101/DataLad-101)
merge(ok): . (dataset) [Merged roommate/master]
merge(ok): . (dataset) [Merged roommate/main]
update.annex_merge(ok): . (dataset) [Merged annex branch]
update(ok): . (dataset)
action summary:
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-121-112
@@ -1,5 +1,5 @@
$ git log --oneline
0fbbe8c Merge remote-tracking branch 'roommate/master'
0fbbe8c Merge remote-tracking branch 'roommate/main'
81bf4d3 add note about datalad update
28a99fb Include nesting demo from datalad website
26595f2 add note on git annex whereis
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-121-113
@@ -6,7 +6,7 @@ name for it.
Afterwards, a "datalad update --how merge -s name" will integrate the
changes made to the sibling into the dataset. A safe step in between
is to do a "datalad update -s name" and checkout the changes with
"git/datalad diff" to remotes/origin/master
"git/datalad diff" to remotes/origin/main

EOT
$ datalad save -m "Add note on adding siblings"
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-130-118
@@ -10,7 +10,7 @@ copy(ok): prediction_report.csv (file) [to github...]
[INFO] Start compressing objects
[INFO] Start writing objects
publish(ok): . (dataset) [refs/heads/git-annex->github:refs/heads/git-annex ✂FROM✂..✂TO✂]
publish(ok): . (dataset) [refs/heads/master->github:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/main->github:refs/heads/main [new branch]]
[INFO] Finished push of Dataset(/home/me/dl-101/DataLad-101/midterm_project)
action summary:
copy (ok: 2)
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-136-108
@@ -1,5 +1,5 @@
$ git status
On branch master
On branch main
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
renamed: TLCL.pdf -> The_Linux_Command_Line.pdf
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-136-109
@@ -1,4 +1,4 @@
$ git commit -m "rename book"
[master ee16aca] rename book
[main ee16aca] rename book
1 file changed, 0 insertions(+), 0 deletions(-)
rename books/{TLCL.pdf => The_Linux_Command_Line.pdf} (100%)
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-137-101
@@ -10,7 +10,7 @@ bfcdc77 finished my midterm project
4be94ca add note on DataLad's procedures
b21ea30 add note on configurations and git config
b232c62 Add note on adding siblings
0fbbe8c Merge remote-tracking branch 'roommate/master'
0fbbe8c Merge remote-tracking branch 'roommate/main'
81bf4d3 add note about datalad update
28a99fb Include nesting demo from datalad website
26595f2 add note on git annex whereis
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-137-121
@@ -11,7 +11,7 @@ bfcdc77 finished my midterm project
4be94ca add note on DataLad's procedures
b21ea30 add note on configurations and git config
b232c62 Add note on adding siblings
0fbbe8c Merge remote-tracking branch 'roommate/master'
0fbbe8c Merge remote-tracking branch 'roommate/main'
81bf4d3 add note about datalad update
28a99fb Include nesting demo from datalad website
26595f2 add note on git annex whereis
4 changes: 2 additions & 2 deletions docs/basics/_examples/DL-101-137-124
@@ -1,3 +1,3 @@
$ git checkout master
$ git checkout main
Previous HEAD position was afa2884 add note about cloning from paths and recursive datalad get
Switched to branch 'master'
Switched to branch 'main'
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-137-144
@@ -1,5 +1,5 @@

$ git revert 73b49af9✂SHA1
[master 33ad93e] Revert "did a bad modification"
[main 33ad93e] Revert "did a bad modification"
Date: Tue Jun 18 16:13:00 2019 +0000
1 file changed, 1 deletion(-)
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-139-102
@@ -13,7 +13,7 @@ copy(ok): books/progit.pdf (file) [to gin...]
[INFO] Start writing objects
[INFO] Start resolving deltas
publish(ok): . (dataset) [refs/heads/git-annex->gin:refs/heads/git-annex ✂FROM✂..✂TO✂]
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/main->gin:refs/heads/main [new branch]]
[INFO] Finished push of Dataset(/home/me/dl-101/DataLad-101)
action summary:
copy (ok: 4)
2 changes: 1 addition & 1 deletion docs/basics/_examples/DL-101-139-106
@@ -7,7 +7,7 @@ $ datalad push --to gin
[INFO] Start counting objects
[INFO] Start compressing objects
[INFO] Start writing objects
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master 33ad93e..4191b5f]
publish(ok): . (dataset) [refs/heads/main->gin:refs/heads/main 33ad93e..4191b5f]
[INFO] Finished push of Dataset(/home/me/dl-101/DataLad-101)
action summary:
publish (notneeded: 1, ok: 1)
4 changes: 2 additions & 2 deletions docs/basics/_examples/DL-101-141-101
@@ -12,8 +12,8 @@ copy(ok): books/byte-of-python.pdf (file) [to roommate...]
[INFO] Start writing objects
[INFO] Start resolving deltas
[INFO] Finished
publish(ok): . (dataset) [refs/heads/git-annex->roommate:refs/heads/git-annex ✂FROM✂..✂TO✂]
publish(error): . (dataset) [refs/heads/master->roommate:refs/heads/master [remote rejected] (branch is currently checked out)]
publish(ok): . (dataset) [refs/heads/git-annex->roommate:refs/heads/git-annex FROM..TO ✂FROM✂..✂TO✂]
publish(error): . (dataset) [refs/heads/main->roommate:refs/heads/main [remote rejected] (branch is currently checked out)]
[INFO] Finished push of Dataset(/home/me/dl-101/DataLad-101)
action summary:
copy (ok: 3)
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-147-riastores.rst
@@ -88,7 +88,7 @@ highlighted:
│ ├── refs
│ │ ├── heads
│ │ │ ├── git-annex
│ │ │ └── master
│ │ │ └── main
│ │ └── tags
│ ├── ria-layout-version
│ └── ria-remote-ebce196a-b057-4c96-81dc-7656ea876234
6 changes: 3 additions & 3 deletions docs/beyond_basics/101-168-dvc.rst
@@ -296,8 +296,8 @@ At this point, the data is already version controlled [#f6]_, and we have the fo
deleted: /home/me/DVCvsDL/DVC-DataLad/data/raw/val/n01440764/n01440764_12021.JPEG (symlink)

$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
(use "git push" to publish your local commits)

Changes not staged for commit:
@@ -439,7 +439,7 @@ To demonstrate this, let's look at a repository with an empty cache by cloning t
.. runrecord:: _examples/DL-101-168-129
:workdir: DVCvsDL/DVC
:language: console
:realcommand: cd ../ && git clone /home/me/pushes/data-version-control DVC-2
:realcommand: cd ../ && git clone -b master /home/me/pushes/data-version-control DVC-2

### DVC
# clone the repo into a new location for demonstration purposes:
12 changes: 6 additions & 6 deletions docs/beyond_basics/101-170-dataladrun.rst
@@ -88,7 +88,7 @@ The "creative" bits involved in this parallelized processing workflow boil down
- The jobs constitute a complete DataLad-centric workflow in the form of a simple **bash script**, including dataset build-up and tear-down routines in a throw-away location, result computation, and result publication back to the target dataset.
Thus, instead of submitting a ``datalad run`` command to the job scheduler, **the job submission is a single script**, and this submission is easily adapted to various job scheduling call formats.
- Right after successful completion of all jobs, the target dataset contains as many :term:`branch`\es as jobs, with each branch containing the results of one job.
A manual :term:`merge` aggregates all results into the :term:`master` branch of the dataset.
A manual :term:`merge` aggregates all results into the :term:`main` branch of the dataset.

The keys to the success of this workflow lie in

@@ -210,9 +210,9 @@ By running ``git annex dead here``, :term:`git-annex` disregards the clone, prev
The ``datalad push`` to the original clone location of a dataset needs to be prepared carefully.
The job computes *one* result (out of many results) and saves it, thus creating new data and a new entry with the run-record in the dataset history.
But each job is unaware of the results and :term:`commit`\s produced by other branches.
Should all jobs push back the results to the original place (the :term:`master` :term:`branch` of the original dataset), the individual jobs would conflict with each other or, worse, overwrite each other (if you don't have the default push configuration of Git).
Should all jobs push back the results to the original place (the :term:`main` :term:`branch` of the original dataset), the individual jobs would conflict with each other or, worse, overwrite each other (if you don't have the default push configuration of Git).

The general procedure and standard :term:`Git` workflow for collaboration, therefore, is to create a change on a different, unique :term:`branch`, push this different branch, and integrate the changes into the original master branch via a :term:`merge` in the original dataset [#f4]_.
The general procedure and standard :term:`Git` workflow for collaboration, therefore, is to create a change on a different, unique :term:`branch`, push this different branch, and integrate the changes into the original main branch via a :term:`merge` in the original dataset [#f4]_.

In order to do this, prior to executing the analysis, the script will *checkout* a unique new branch in the analysis dataset.
The most convenient name for the branch is the Job-ID, an identifier under which the job scheduler runs an individual job.
@@ -370,14 +370,14 @@ All it takes to submit is a single ``condor_submit <submit_file>``.

**Merging results**:
Once all jobs are finished, the results lie in individual branches of the original dataset.
The only thing left to do now is merging all of these branches into :term:`master` -- and potentially solve any merge conflicts that arise.
The only thing left to do now is merging all of these branches into :term:`main` -- and potentially solve any merge conflicts that arise.
Usually, merging branches is done using the ``git merge`` command with a branch specification.
For example, in order to merge one job branch into the :term:`master` :term:`branch`, one would need to be on ``master`` and run ``git merge <job branch name>``.
For example, in order to merge one job branch into the :term:`main` :term:`branch`, one would need to be on ``main`` and run ``git merge <job branch name>``.
Given that the parallel job execution could have created thousands of branches, and that each ``merge`` would lead to a commit, in order to not inflate the history of the dataset with thousands of :term:`merge` commits, one can do a single `Octopus merge <https://git-scm.com/docs/git-merge#Documentation/git-merge.txt-octopus>`_ of all branches at once.

.. find-out-more:: What is an octopus merge?

Usually a commit that arises from a merge has two *parent* commits: The *first parent* is the branch the merge is being performed from, in the example above, ``master``. The *second parent* is the branch that was merged into the first.
Usually a commit that arises from a merge has two *parent* commits: The *first parent* is the branch the merge is being performed from, in the example above, ``main``. The *second parent* is the branch that was merged into the first.

However, ``git merge`` is capable of merging more than two branches simultaneously if more than a single branch name is given to the command.
The resulting merge commit has as many parents as were involved in the merge.
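The parent count of an octopus merge can be checked in a scratch repository. The sketch below is an illustration under stated assumptions, not the handbook's workflow: the branch names ``job-1`` to ``job-3`` stand in for job branches, the result files are invented, and ``main`` receives one extra commit so that it has diverged from the job branches before the merge.

```shell
#!/bin/sh
# Sketch: merge several job branches into main with one octopus merge
# and count the parents of the resulting commit. All names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"

# One branch per job, each adding its own result file.
for job in 1 2 3; do
    git checkout -q -b "job-$job" main
    echo "result of job $job" > "result-$job.txt"
    git add "result-$job.txt"
    git commit -q -m "result of job $job"
done

# Let main move on as well, so that it is a real parent of the merge.
git checkout -q main
echo "analysis dataset" > README.txt
git add README.txt
git commit -q -m "add README"

# Giving more than one branch name to "git merge" triggers an octopus merge.
git merge -q -m "Merge results from all jobs" job-1 job-2 job-3 >/dev/null

# "git rev-list --parents" prints the commit followed by all of its parents.
parents=$(git rev-list --parents -n 1 HEAD | wc -w)
parents=$((parents - 1))
echo "parents of the merge commit: $parents"
```

With three job branches plus ``main`` this yields a single merge commit with four parents, instead of three separate merge commits.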