Add a KBI on fixing location info in git-type special remotes #22

Merged on Apr 18, 2023 (9 commits).

.. index::
   triple: git-annex; special remote; reconfigure

.. highlight:: console

KBI0006: How to fix up a git-type special remote with a new location
====================================================================

:authors: Adina Wagner
:discussion: https://github.com/psychoinformatics-de/knowledge-base/pull/22
:keywords: special remote, git-annex, gitannex, enableremote

In many datasets, special remotes ensure that annexed file content can
be retrieved from external sources.
Sometimes, however, external resources vanish.
This happened to data of the studyforrest project.
Initially, the data was hosted on the University of Magdeburg's
infrastructure (``psydata.ovgu.de``) and registered as a `Git-type special remote`_
named ``mddatasrc``:

.. code-block:: console

   $ git cat-file -p git-annex:remote.log
   9536f86d-eb34-42ed-8ffc-fafd63a2b87e autoenable=true location=http://psydata.ovgu.de/studyforrest/visualrois/.git name=mddatasrc type=git timestamp=1459405007.225384s
   [...]
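
Each line of ``remote.log`` starts with the UUID of a special remote, followed by its
configuration as space-separated ``key=value`` pairs and a ``timestamp``.
The following is a minimal sketch of a hypothetical shell helper, ``remote_log_field``
(not part of git-annex or DataLad), that prints one of these fields for a remote selected
by name. It assumes that values contain no spaces and, if several entries exist for the
same remote, treats the one with the newest timestamp as the current one:

.. code-block:: bash

   # remote_log_field NAME FIELD  (hypothetical helper, not a git-annex command)
   # Reads git-annex remote.log lines on stdin and prints FIELD for the remote
   # whose name= value is NAME. FIELD is "uuid" (the first column) or any key
   # of the key=value pairs, such as "location" or "type".
   # Assumption: if several entries exist for NAME, the newest timestamp wins.
   remote_log_field() {
       awk -v want_name="$1" -v want_field="$2" '
       {
           name = ""; ts = 0; val = ""
           if (want_field == "uuid") val = $1
           for (i = 2; i <= NF; i++) {
               # split each field at the first "=" only, URLs may contain more
               eq = index($i, "=")
               if (eq == 0) continue
               key = substr($i, 1, eq - 1)
               v = substr($i, eq + 1)
               if (key == "name") name = v
               if (key == "timestamp") { sub(/s$/, "", v); ts = v + 0 }
               if (key == want_field) val = v
           }
           if (name == want_name && (!found || ts >= best_ts)) {
               found = 1; best_ts = ts; best = val
           }
       }
       # exit status 1 if no entry matched NAME
       END { if (found) print best; else exit 1 }
       '
   }

For example, ``git cat-file -p git-annex:remote.log | remote_log_field mddatasrc uuid``
would print the UUID that ``git annex enableremote`` expects.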

This meant that the web server behind ``psydata.ovgu.de`` hosted a publicly accessible
git-annex repository with all file content.
When the group moved institutions, the data was migrated to the new institution's
hosting infrastructure.
The former URL was configured to redirect to this new data source.
This kept data retrieval functional for a few years.
However, ``psydata.ovgu.de`` was eventually taken down by the former institution.
This made data retrieval impossible and caused delays when working with the dataset,
since git-annex operations tried to contact the special remote until the connection timed out.

.. _Git-type special remote: https://git-annex.branchable.com/special_remotes/git

How to update location information of the special remote
--------------------------------------------------------

The problematic dataset was cloned from GitHub, its central entry point, to apply the fix.
The update itself then took two steps.
First, the corresponding Git remote, which still pointed to the old location, was removed:

.. code-block:: console

   $ git remote remove mddatasrc

Next, the special remote was updated by re-enabling it with the new location information.
This command required knowledge of the new hosting location (again a publicly accessible
git-annex repository) and the UUID of the special remote, taken from ``remote.log``::

   $ git annex enableremote 9536f86d-eb34-42ed-8ffc-fafd63a2b87e location=https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git
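
The two steps can be wrapped into a small shell function. This is a sketch of a
hypothetical helper, ``relocate_git_special_remote`` (not a git-annex or DataLad command),
assuming it runs inside the dataset and that the remote's name, its UUID, and the new
location are already known. Unlike the plain ``git remote remove`` above, it skips the
removal if no Git remote of that name is configured:

.. code-block:: bash

   # relocate_git_special_remote NAME UUID NEW_LOCATION  (hypothetical helper)
   # 1. drop the Git remote NAME, which still points to the old location, if present
   # 2. re-enable the special remote UUID with the new location, which records
   #    the updated location in the git-annex branch
   relocate_git_special_remote() {
       if [ "$#" -ne 3 ]; then
           echo "usage: relocate_git_special_remote NAME UUID NEW_LOCATION" >&2
           return 2
       fi
       # "git remote get-url" fails if no remote of that name exists
       if git remote get-url "$1" >/dev/null 2>&1; then
           git remote remove "$1" || return 1
       fi
       git annex enableremote "$2" "location=$3"
   }

In the case described here, the call would be
``relocate_git_special_remote mddatasrc 9536f86d-eb34-42ed-8ffc-fafd63a2b87e https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git``.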

Afterwards, a test retrieval of some files with ``datalad get`` confirmed that data retrieval worked again.
To propagate the location update to the central dataset on GitHub, the ``git-annex`` branch
needs to be pushed there.
A regular ``datalad push --to origin`` should suffice.
The output should indicate that the ``git-annex`` branch was updated::

   $ datalad push
   publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 304f2250..3a9c6331]
   action summary:
     publish (notneeded: 1, ok: 1)

To ensure that everything worked as intended, the updated dataset was cloned again
from GitHub to test whether data retrieval succeeded.
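
The published location can also be checked without a full clone, by reading ``remote.log``
from the ``git-annex`` branch on GitHub. The following is a minimal sketch, assuming the
``remote.log`` format shown above; ``latest_location_is`` is a hypothetical helper that
succeeds only if the newest entry for a given UUID records the expected location:

.. code-block:: bash

   # latest_location_is UUID EXPECTED_LOCATION  (hypothetical helper)
   # Reads git-annex remote.log lines on stdin and exits 0 only if the entry
   # with the newest timestamp for UUID has location=EXPECTED_LOCATION.
   latest_location_is() {
       awk -v uuid="$1" -v want="$2" '
       $1 == uuid {
           ts = 0; loc = ""
           for (i = 2; i <= NF; i++) {
               # split at the first "=" only, URLs may contain more
               eq = index($i, "=")
               if (eq == 0) continue
               key = substr($i, 1, eq - 1)
               val = substr($i, eq + 1)
               if (key == "location") loc = val
               if (key == "timestamp") { sub(/s$/, "", val); ts = val + 0 }
           }
           if (!found || ts >= best_ts) { found = 1; best_ts = ts; best = loc }
       }
       # exit 0 on a match, 1 otherwise (including "UUID not found")
       END { exit !(found && best == want) }
       '
   }

For instance, after ``git fetch origin git-annex``, running
``git cat-file -p origin/git-annex:remote.log | latest_location_is 9536f86d-eb34-42ed-8ffc-fafd63a2b87e https://datapub.fz-juelich.de/studyforrest/studyforrest/visualrois/.git``
exits with status 0 if the new location was published. This complements, but does not
replace, a test retrieval from a fresh clone.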


Alternatives
------------

In the future, git-annex may provide a dedicated management command for this purpose.
See https://git-annex.branchable.com/bugs/Disabling_remote_auto-enabling_not_possible for updates.