Merge pull request #1223 from datalad-handbook/mslw-its
Fix "it's" vs "its" usage
mih authored May 22, 2024
2 parents 8a5a72f + 523fa30 commit a23359d
Showing 21 changed files with 26 additions and 26 deletions.
2 changes: 1 addition & 1 deletion docs/basics/101-106-nesting.rst
@@ -94,7 +94,7 @@ we can set subdatasets to previous states, or *update* them.

.. index::
pair: temporary working directory change; with Git
-.. find-out-more:: Do I have to navigate into the subdataset to see it's history?
+.. find-out-more:: Do I have to navigate into the subdataset to see its history?

Previously, we used :shcmd:`cd` to navigate into the subdataset, and
subsequently opened the Git log. This is necessary, because a :gitcmd:`log`
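
As an aside to this hunk: the pattern it documents can be sketched as follows (the subdataset path ``midterm_project`` is a hypothetical example):

    # show a subdataset's history without changing the working directory;
    # -C makes Git act as if it had been started in that path
    git -C midterm_project log --oneline
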
2 changes: 1 addition & 1 deletion docs/basics/101-107-summary.rst
@@ -91,7 +91,7 @@ Currently, this can be considered "best-practice building": Frequent :dlcmd:`sta
commands, :dlcmd:`save` commands to save dataset modifications,
and concise :term:`commit message`\s are the main take always from this. You can already explore
the history of a dataset and you know about many types of provenance information
-captured by DataLad, but for now, its been only informative, and has not been used
+captured by DataLad, but for now, it has been only informative, and has not been used
for anything more fancy. Later on, we will look into utilizing the history
in order to undo mistakes, how the origin of files or datasets becomes helpful
when sharing datasets or removing file contents, and how to make changes to large
2 changes: 1 addition & 1 deletion docs/basics/101-110-run2.rst
@@ -396,7 +396,7 @@ Make a note of this behavior in your ``notes.txt`` file.
Save yourself the preparation time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Its generally good practice to specify ``--input`` and ``--output`` even if your input files are already retrieved and your output files unlocked -- it makes sure that a recomputation can succeed, even if inputs are not yet retrieved, or if output needs to be unlocked.
+It's generally good practice to specify ``--input`` and ``--output`` even if your input files are already retrieved and your output files unlocked -- it makes sure that a recomputation can succeed, even if inputs are not yet retrieved, or if output needs to be unlocked.
However, the internal preparation steps of checking that inputs exist or that outputs are unlocked can take a bit of time, especially if it involves checking a large number of files.

If you want to avoid the expense of unnecessary preparation steps you can make use of the ``--assume-ready`` argument of :dlcmd:`run`.
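
For illustration, a :dlcmd:`run` call following the advice in this hunk could look like the sketch below (script and file names are hypothetical, and ``--assume-ready`` is assumed to take ``inputs``, ``outputs``, or ``both``):

    # declare inputs and outputs so a recomputation can retrieve/unlock them
    datalad run -m "compute result" \
        --input data/raw.csv \
        --output results/out.csv \
        "python code/analysis.py data/raw.csv results/out.csv"

    # if inputs are retrieved and outputs unlocked, skip the preparation checks
    datalad run --assume-ready both -m "recompute" \
        "python code/analysis.py data/raw.csv results/out.csv"
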
2 changes: 1 addition & 1 deletion docs/basics/101-121-siblings.rst
@@ -17,7 +17,7 @@ But why does this need to be a one-way street? "I want to
provide helpful information for you as well!", says your
room mate. "How could you get any insightful notes that
I make in my dataset, or maybe the results of our upcoming
-mid-term project? Its a bit unfair that I can get your work,
+mid-term project? It's a bit unfair that I can get your work,
but you cannot get mine."

.. index::
2 changes: 1 addition & 1 deletion docs/basics/101-134-summary.rst
@@ -47,7 +47,7 @@ Now what can I do with it?

For one, you will not be surprised if you ever see a subdataset being shown as
``modified`` by :dlcmd:`status`: You now know that if a subdataset
-evolves, it's most recent state needs to be explicitly saved to the superdatasets
+evolves, its most recent state needs to be explicitly saved to the superdataset's
history.

On a different matter, you are now able to capture and share analysis provenance that
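
A minimal sketch of that explicit save, assuming a subdataset at the hypothetical path ``midterm_project``:

    # record the subdataset's new state in the superdataset's history
    datalad save -d . -m "record new subdataset state" midterm_project
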
2 changes: 1 addition & 1 deletion docs/basics/101-136-filesystem.rst
@@ -1133,7 +1133,7 @@ both ``--recursive`` and ``--reckless [availability|undead|kill]`` flags are ne
to traverse into subdatasets and to remove content that does not have verified remotes.

Be aware, though, that deleting a dataset in which ever way will
-irretrievably delete the dataset, it's contents, and it's history.
+irretrievably delete the dataset, its contents, and its history.

Summary
^^^^^^^
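
Sketched out, the removal call this hunk refers to could look like the following (the dataset path is hypothetical, and this irretrievably deletes data):

    # traverse into subdatasets and remove content without verified remotes
    datalad remove -d path/to/dataset --recursive --reckless availability
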
2 changes: 1 addition & 1 deletion docs/basics/101-139-hostingservices.rst
@@ -41,7 +41,7 @@ How to add a sibling on a Git repository hosting site: The manual way

#. If you pick the :term:`SSH` URL, make sure to have an :term:`SSH key` set up. This usually requires generating an SSH key pair if you do not have one yet, and uploading the public key to the repository hosting service. The :find-out-more:`on SSH keys <fom-sshkey>` points to a useful tutorial for this.

-#. Use the URL to add the repository as a sibling. There are two commands that allow you to do that; both require you give the sibling a name of your choice (common name choices are ``upstream``, or a short-cut for your user name or the hosting platform, but its completely up to you to decide):
+#. Use the URL to add the repository as a sibling. There are two commands that allow you to do that; both require that you give the sibling a name of your choice (common name choices are ``upstream``, or a short-cut for your user name or the hosting platform, but it's completely up to you to decide):

#. ``git remote add <name> <url>``
#. ``datalad siblings add --dataset . --name <name> --url <url>``
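
For illustration, with a hypothetical SSH URL and the sibling name ``upstream``, the two equivalent calls from this hunk would be:

    git remote add upstream git@github.com:me/mydataset.git
    datalad siblings add --dataset . --name upstream --url git@github.com:me/mydataset.git
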
2 changes: 1 addition & 1 deletion docs/basics/101-146-gists.rst
@@ -10,7 +10,7 @@ This section is a selection of code snippets tuned to perform specific,
non-trivial tasks in datasets. Often, they are not limited to single commands of
the version control tools you know, but combine helpful other command line
tools and general Unix command line magic. Just like
-`GitHub gists <https://gist.github.com>`_, its a collection of lightweight
+`GitHub gists <https://gist.github.com>`_, it's a collection of lightweight
and easily accessible tips and tricks. For a more basic command overview,
take a look at the :ref:`cheat`. The
`tips collection of git-annex <https://git-annex.branchable.com/tips>`_ is also
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-145-hooks.rst
@@ -141,7 +141,7 @@ And here is how to set the values for these variables:
command the hook operates on, and any key from the result evaluation can be
expanded to the respective value in the result dictionary. Curly braces need to
be escaped by doubling them.
-This is not the easiest specification there is, but its also not as hard as it
+This is not the easiest specification there is, but it's also not as hard as it
may sound. Here is how this could look like for a :dlcmd:`unlock`::
$ unlock {{"dataset": "{dsarg}", "path": "{path}"}}
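
As a hedged sketch of how such a hook could be registered (assuming the ``datalad.result-hook.<name>.call-json`` configuration item this chapter describes; the hook name ``myunlock`` is hypothetical):

    # store the hook's call specification in the dataset's configuration
    git config --file .datalad/config \
        datalad.result-hook.myunlock.call-json \
        'unlock {{"dataset": "{dsarg}", "path": "{path}"}}'
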
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-160-gobig.rst
@@ -50,7 +50,7 @@ begin to see performance issues in datasets.
Bench marking in DataLad datasets with varying, but large amounts of tiny files
on different file systems and different git-annex repository versions show that
a mere :dlcmd:`save` or :dlcmd:`status` command
-can take from 15 minutes up to several hours. Its neither fun nor feasible to
+can take from 15 minutes up to several hours. It's neither fun nor feasible to
work with performance drops like this -- so how can this be avoided?

General advice: Use several subdatasets
4 changes: 2 additions & 2 deletions docs/beyond_basics/101-170-dataladrun.rst
@@ -227,7 +227,7 @@ Importantly, the ``$JOBID`` isn't hardcoded into the script but it can be given
The code snippet above uses a bash :term:`environment variable` (``$JOBID``, as indicated by the all-upper-case variable name with a leading ``$``).
It will be defined in the job submission -- this is shown and explained in detail in the respective paragraph below.

-Next, its time for the :dlcmd:`containers-run` command.
+Next, it's time for the :dlcmd:`containers-run` command.
The invocation will depend on the container and dataset configuration (both of which are demonstrated in the real-life example in the next section), and below, we pretend that the container invocation only needs an input file and an output file.
These input file is specified via a bash variables (``$inputfile``) that will be defined in the script and provided at the time of job submission via command line argument from the job scheduler, and the output file name is based on the input file name.

@@ -311,7 +311,7 @@ Here's how the full general script looks like.
# Done - job handler should clean up workspace
-Its a short script that encapsulates a complete workflow.
+It's a short script that encapsulates a complete workflow.
Think of it as the sequence of necessary DataLad commands you would need to do in order to compute a job.
You can save this script into your analysis dataset, e.g., as ``code/analysis_job.sh``, and make it executable (such that it is executed automatically by the program specified in the :term:`shebang`)using ``chmod +x code/analysis_job.sh``.

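
A heavily condensed sketch of such a job script, under the assumptions this hunk describes (all paths, the container name, and the scheduler variables are hypothetical placeholders):

    #!/bin/bash
    # clone the dataset into a job-specific, temporary workspace
    datalad clone /data/project/analysis "$TMPDIR/ds-$JOBID"
    cd "$TMPDIR/ds-$JOBID"
    # work on a job-specific branch so that concurrent jobs do not conflict
    git checkout -b "job-$JOBID"
    # run the containerized computation with provenance capture
    datalad containers-run -n mycontainer \
        --input "$inputfile" \
        --output "outputs/$(basename "$inputfile")" \
        "{inputs} {outputs}"
    # push the result branch back to the original dataset
    datalad push --to origin
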
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-171-enki.rst
@@ -14,7 +14,7 @@ Walkthrough: Parallel ENKI preprocessing with fMRIprep

The previous section has been an overview on parallel, provenance-tracked computations in DataLad datasets.
While the general workflow entails a complete setup, it is usually easier to understand it by seeing it applied to a concrete usecase.
-Its even more informative if that use case includes some complexities that do not exist in the "picture-perfect" example but are likely to arise in real life.
+It is even more informative if that use case includes some complexities that do not exist in the "picture-perfect" example but are likely to arise in real life.
Therefore, the following walk-through in this section is a write-up of an existing and successfully executed analysis.

The analysis
2 changes: 1 addition & 1 deletion docs/beyond_basics/101-179-gitignore.rst
@@ -96,7 +96,7 @@ a ``tmp/`` directory in the ``DataLad-101`` dataset:

$ datalad save -m "add something to ignore" .gitignore

-This ``.gitignore`` file is very minimalistic, but its sufficient to show
+This ``.gitignore`` file is very minimalistic, but it's sufficient to show
how it works. If you now create a ``tmp/`` directory, all of its contents would be
ignored by your datasets version control. Let's do so, and add a file into it
that we do not (yet?) want to save to the dataset's history.
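
Concretely, the steps described in this hunk boil down to the following sketch:

    # ignore everything inside tmp/
    echo "tmp/" > .gitignore
    datalad save -m "add something to ignore" .gitignore
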
6 changes: 3 additions & 3 deletions docs/beyond_basics/101-181-metalad.rst
@@ -173,8 +173,8 @@ The following call would add the metadata entry to the current dataset, ``cozy-s
single: configuration item; datalad.dataset.id
.. find-out-more:: meta-add validity checks

-When adding metadata for the first time, its not uncommon to run into errors.
-Its quite easy, for example, to miss a comma or quotation mark when creating a JSON object by hand.
+When adding metadata for the first time, it is not uncommon to run into errors.
+It is quite easy, for example, to miss a comma or quotation mark when creating a JSON object by hand.
But there are also some internal checks that might be surprising.
If you want to add the metadata above to your own dataset, you should make sure to adjust the ``dataset_id`` to the ID of your own dataset, found via the command ``datalad configuration get datalad.dataset.id`` - otherwise you'll see an error [#f4]_, and likewise the ``dataset_version``.
And in case you'd supply the ``extraction_time`` as "this morning at 8AM" instead of a time stamp, the command will be unhappy as well.
@@ -407,7 +407,7 @@ As with DataLad and other Python packages, you might want to do the installation

.. [#f1] It may seem like an unnecessary duplicated effort to record the names of contained files or certain file properties as metadata in a dataset already containing these files. However, metadata can be very useful whenever the primary data can't be shared, for example due to its large size or sensitive nature, allowing consumers to, for example, derive anonymized information, aggregate data with search queries, or develop code and submit it to the data holders to be ran on their behalf.
-.. [#f2] `JSON <https://en.wikipedia.org/wiki/JSON>`_ is a language-independent, open and lightweight data interchange format. Data is represented as human readable text, organized in key-value pairs (e.g., 'name': 'Bob') or arrays, and thus easily readable by both humans and machines. A *JSON object* is a collection of key-value pairs. Its enclosed in curly brackets, and individual pairs in the object are separated by commas.
+.. [#f2] `JSON <https://en.wikipedia.org/wiki/JSON>`_ is a language-independent, open and lightweight data interchange format. Data is represented as human readable text, organized in key-value pairs (e.g., 'name': 'Bob') or arrays, and thus easily readable by both humans and machines. A *JSON object* is a collection of key-value pairs. It's enclosed in curly brackets, and individual pairs in the object are separated by commas.
.. [#f3] A Unix timestamp is widely used in computing and measures time as the number of seconds passed since January 1st, 1970. The timestamp in the example metadata entry (``1675113291.1464975``) translates to January 30th, 2023, 22:14:51.146497 with the code snippet below. Lots of software tools have the ability to generate timestamps for you, for example Python's `time <https://docs.python.org/3/library/time.html>`_ module or the command ``date +%s`` in a command line on Unix systems.
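
The "code snippet below" that footnote [#f3] refers to is not part of this hunk; one way to do the conversion in a shell is (GNU ``date``; on macOS, ``date -r`` serves the same purpose):

    # convert a Unix timestamp into a human-readable date
    date -d @1675113291
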
2 changes: 1 addition & 1 deletion docs/code_from_chapters/ABCD.rst
@@ -383,7 +383,7 @@ This allows others to very easily rerun computations, but it also spares yoursel
Computational reproducibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Its fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
+It's fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
If you don't have the required Python packages available, or in a wrong version, running the script and computing the results will fail.
In order to be *computationally* reproducible the run record does not only need to link code, command, and data, but also encapsulate the *software* that is necessary for a computation::

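
The literal block elided from this hunk demonstrates that software encapsulation; a sketch with a hypothetical container name and script might look like:

    # run the computation inside a tracked software container, so the
    # run record captures the software environment as well
    datalad containers-run -n mycontainer \
        --input data/raw.csv \
        --output results/out.csv \
        "python code/analysis.py data/raw.csv results/out.csv"
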
4 changes: 2 additions & 2 deletions docs/code_from_chapters/DLBasicsMPI.rst
@@ -294,7 +294,7 @@ DataLad save can in addition also attach an identifier in the form of a :term:`t

The :dlcmd:`run` command can run this script in a way that links the script to the results it produces and the data it was computed from.
In principle, the command is simple: Execute any command, save the resulting changes in the dataset, and associate them as well as all other optional information provided.
-Because each :dlcmd:`run` ends with a :dlcmd:`save`, its recommended to start with a clean dataset (see :ref:`chapter_run` for details on how to use it in unclean datasets)::
+Because each :dlcmd:`run` ends with a :dlcmd:`save`, it's recommended to start with a clean dataset (see :ref:`chapter_run` for details on how to use it in unclean datasets)::

datalad status

@@ -348,7 +348,7 @@ This allows others to very easily rerun computations, but it also spares yoursel
Computational reproducibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Its fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
+It's fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
If you don't have the required Python packages available, or in a wrong version, running the script and computing the results will fail.
In order to be *computationally* reproducible the run record does not only need to link code, command, and data, but also encapsulate the *software* that is necessary for a computation::

2 changes: 1 addition & 1 deletion docs/code_from_chapters/dgpa.rst
@@ -284,7 +284,7 @@ Let's share this data with our friends and collaborators.
There are many ways to do this (section :ref:`chapter_thirdparty` has all the details), but
a convenient way is `Gin <https://gin.g-node.org>`_, a free hosting service for DataLad datasets.

-First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy it's SSH URL.
+First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy its SSH URL.
A step by step instruction with screenshots is in the section :ref:`gin`.

.. importantnote:: The 0.16 release will have a convenience command
2 changes: 1 addition & 1 deletion docs/code_from_chapters/osoh.rst
@@ -344,7 +344,7 @@ To get an overview on publishing datasets, however, you best go to :ref:`shareth

Another convenient way is `Gin <https://gin.g-node.org>`_, a free hosting service for DataLad datasets.

-First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy it's SSH URL.
+First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy its SSH URL.
A step by step instruction with screenshots is in the section :ref:`gin`::

datalad create-sibling-gin \
4 changes: 2 additions & 2 deletions docs/code_from_chapters/usecase_ml_code.rst
@@ -280,7 +280,7 @@ DataLad save can in addition also attach an identifier in the form of a :term:`t

The :dlcmd:`run` command can run this script in a way that links the script to the results it produces and the data it was computed from.
In principle, the command is simple: Execute any command, save the resulting changes in the dataset, and associate them as well as all other optional information provided.
-Because each :dlcmd:`run` ends with a :dlcmd:`save`, its recommended to start with a clean dataset (see :ref:`chapter_run` for details on how to use it in unclean datasets)::
+Because each :dlcmd:`run` ends with a :dlcmd:`save`, it's recommended to start with a clean dataset (see :ref:`chapter_run` for details on how to use it in unclean datasets)::

datalad status

@@ -334,7 +334,7 @@ This allows others to very easily rerun computations, but it also spares yoursel
Computational reproducibility
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Its fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
+It's fantastic to have means to recompute a command automatically, but the ability to re-execute a command is often not enough.
If you don't have the required Python packages available, or in a wrong version, running the script and computing the results will fail.
In order to be *computationally* reproducible the run record does not only need to link code, command, and data, but also encapsulate the *software* that is necessary for a computation::

2 changes: 1 addition & 1 deletion docs/code_from_chapters/yale.rst
@@ -278,7 +278,7 @@ Let's share this data with our friends and collaborators.
There are many ways to do this (section :ref:`chapter_thirdparty` has all the details), but
a convenient way is `Gin <https://gin.g-node.org>`_, a free hosting service for DataLad datasets.

-First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy it's SSH URL.
+First, you need to head over to `gin.g-node.org <https://gin.g-node.org>`__, log in, and upload an :term:`SSH key`. Then, under your user account, create a new repository, and copy its SSH URL.
A step by step instruction with screenshots is in the section :ref:`gin`.

You can register this URL as a sibling dataset to your own dataset using :dlcmd:`siblings add`::
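
A sketch of the :dlcmd:`siblings add` call mentioned in this hunk, with a hypothetical Gin SSH URL:

    datalad siblings add -d . \
        --name gin \
        --url git@gin.g-node.org:/me/our-data.git
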
2 changes: 1 addition & 1 deletion docs/usecases/ml-analysis.rst
@@ -376,7 +376,7 @@ We will use the following script for this:
It will load the trained and dumped model and use it to test its prediction performance on the yet unseen test data.
To evaluate the model performance, it calculates the accuracy of the prediction, i.e., the proportion of correctly labeled images, prints it to the terminal, and saves it into a json file in the superdataset.
As this script constitutes the last analysis step, let's save it with a :term:`tag`.
-Its entirely optional to do this, but just as commit messages are an easier way for humans to get an overview of a commits contents, a tag is an easier way for humans to identify a change than a commit hash.
+It is entirely optional to do this, but just as commit messages are an easier way for humans to get an overview of a commits contents, a tag is an easier way for humans to identify a change than a commit hash.
With this script set up, we're ready for analysis, and thus can tag this state ``ready4analysis`` to identify it more easily later.

.. runrecord:: _examples/ml-114
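
A sketch of that tagging step (the commit message is hypothetical; ``--version-tag`` is the :dlcmd:`save` option that attaches the tag):

    datalad save -m "finish analysis setup" --version-tag ready4analysis
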
