From a271e14657b62282da9c68f323b7c9906f8f27a2 Mon Sep 17 00:00:00 2001
From: Yaroslav Halchenko
Date: Wed, 15 May 2019 15:34:14 -0400
Subject: [PATCH] NF: SINGULARITY_CMD=shell to record (bash) history+result of interactive sessions

**Related**

This is a prototype for functionality which might be of interest outside of this project, e.g. related:

- regular `datalad run` to record activities in the shell:
  - [`run --interactive`](https://github.com/datalad/datalad/issues/2158#issuecomment-365323188)
  - [`run --shell`](https://github.com/datalad/datalad/issues/2275)

  so here I am "implementing" it, solely for containerized environments ATM, via an "over the head" communication to the shim through an environment variable
- `datalad run` for better record keeping, e.g.
  - [saving stdout/err](https://github.com/datalad/datalad/issues/3385)

  so here I did not bother to establish stdout/err capture, but possibly could and might
- `reproman login`, or even `execute` (with or without --trace) and maybe `run`, where we could benefit from having an environment with a unified interface for interactive sessions which would also establish the record of activities
- just a regular shell environment to make a clear record of the commands which were run
- might eventually absorb/meld with the "opinionated .bashrc" proposed for the training curriculum: https://github.com/ReproNim/module-reproducible-basics/pull/26 which provides assistance/docs for more efficient use of the cmdline and establishes 'infinite bash history'.

**reproshell???**

So it feels to me like a motivation for some kind of an independent "reproshell" project which would be

- usable independently and easily installable/bindable (e.g. into a container)
- parametrizable to be invoked from the shim here and/or by datalad or reproman, so it could just take care of capturing all sidecar files into specified locations

**Could benefit from**

- knowing more about the "datalad (containers-)run" invocation

  Implemented now within the `singularity_cmd` shim, which could have benefited from having additional information about how exactly it was run, and also from a way to instruct `datalad run` "upstairs" that there is now an additional file in [extra_outputs](https://github.com/datalad/datalad/issues/3094). Hence there is https://github.com/datalad/datalad/issues/3422
- [`datalad run` being able to 'cover' multiple commits](https://github.com/datalad/datalad/issues/3265)

  Interactivity creates ambiguity for `rerun` semantics:
  - the run record ATM would say "reinvoke the interactive session", which might be desirable on its own (e.g. to redo something manually in that original container)
  - but for "automated reproducibility" we do have all the information (the bash history file, which is a list of commands to run), possibly recorded in another commit, which ATM is not associated with the "run" record

  So maybe, by somehow [tagging run commits](https://github.com/datalad/datalad/issues/3371), it could be possible to disambiguate/select specific run commits/records?
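To make the "over the head" environment-variable channel concrete, here is a minimal sketch (not the shim itself) that mimics the argument rewriting the diff adds to `scripts/singularity_cmd`: `SINGULARITY_CMD`, when set, overrides the subcommand passed as `$1`, and `shell` is turned into `exec ... bash`. The function name `build_invocation` is hypothetical, and it only prints the `singularity` command line it would run, so it can be tried without Singularity installed.

```shell
#!/usr/bin/env bash
# Minimal sketch of the shim's argument rewriting (NOT the shim itself).
# SINGULARITY_CMD, when set, overrides the subcommand datalad passes as $1.
# build_invocation is a hypothetical name; it only prints the command line.

build_invocation() {
    local cmd="${SINGULARITY_CMD:-$1}"
    shift  # drop the original subcommand (e.g. "run") in either case
    local args=("$@")

    if [ "$cmd" = "shell" ]; then
        # In 'shell' mode only the image may be given; anything more is an error.
        if [ "$#" -gt 1 ]; then
            echo "E: for 'shell' mode - do not provide any custom command. Got options: $*" >&2
            return 1
        fi
        # Run bash via 'exec' instead of 'singularity shell', as the patch does.
        cmd="exec"
        args+=(bash)
    fi

    echo "singularity $cmd ${args[*]}"
}
```

For example, `SINGULARITY_CMD=shell build_invocation run images/repronim/repronim-reproin--0.5.4.sing` should print `singularity exec images/repronim/repronim-reproin--0.5.4.sing bash`; the real shim additionally passes `-e -c -W ... -H ... -B ... --pwd ...`.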
**Example**

    (dev) 1 13348.....................................:Wed 15 May 2019 06:12:24 PM EDT:.
    (git-annex)hopa:~/proj/repronim/containers[enh-shell]git-annex
    $> SINGULARITY_CMD=shell datalad containers-run -n repronim-reproin
    [INFO ] Making sure inputs are available (this may take some time)
    [INFO ] == Command start (output follows) =====
    yoh@hopa:/home/yoh/proj/repronim/containers$ touch my-results
    singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ cd images/
    singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers/images$ ls
    bids README.md repronim
    singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers/images$ cd ../
    singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ ls
    binds images LICENSE my-results README.md scripts
    yoh@hopa:/home/yoh/proj/repronim/containers$ exit
    add(ok): .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00 (file)
    save(ok): . (dataset)
    action summary:
      add (ok: 1)
      save (ok: 1)
    [INFO ] == Command exit (modification check follows) =====
    delete(ok): LICENSE (file)
    add(ok): my-results (file)
    save(ok): . (dataset)
    action summary:
      add (ok: 1)
      delete (ok: 1)
      get (notneeded: 1)
      save (ok: 1)
    SINGULARITY_CMD=shell datalad containers-run -n repronim-reproin  3.42s user 1.74s system 9% cpu 54.068 total
    $> git log --stat HEAD^^..
    commit 89fed08617418e5ddb88ae11ee2c14db699acf31 (HEAD -> enh-shell)
    Author: Yaroslav Halchenko
    Date:   Wed May 15 18:13:28 2019 -0400

        [DATALAD RUNCMD] ./scripts/singularity_cmd run images/rep...

        === Do not change lines below ===
        {
         "chain": [],
         "cmd": "./scripts/singularity_cmd run images/repronim/repronim-reproin--0.5.4.sing ",
         "dsid": "b02e63c2-62c1-11e9-82b0-52540040489c",
         "exit": 0,
         "extra_inputs": [],
         "inputs": [
          "images/repronim/repronim-reproin--0.5.4.sing"
         ],
         "outputs": [],
         "pwd": "."
        }
        ^^^ Do not change lines above ^^^

     LICENSE    | 201 ---------------------------------------------------------------------------------------------
     my-results |   1 +
     2 files changed, 1 insertion(+), 201 deletions(-)

    commit 5aa3b3383c2746f7c1d07ecdcc73852eb0a30f17
    Author: Yaroslav Halchenko
    Date:   Wed May 15 18:13:28 2019 -0400

        [REPRONIM/CONTAINERS]: bash history for the interactive session

        Actual changes might (or not, depending on the invocation) get committed in the next commit

     .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00 | 7 +++++++
     1 file changed, 7 insertions(+)
    $> cat .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00
    echo "I will do something useful today"
    touch my-results
    cd images/
    ls
    cd ../
    ls
    rm LICENSE ; echo 'nobody needs those'
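The recorded history above is just a list of commands, so one conceivable answer to the `rerun` ambiguity is to replay it non-interactively. Below is a hedged sketch, assuming every recorded line is a self-contained command meant to run from the directory where the session started; `replay_history` is a hypothetical helper, not part of datalad or reproman.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: replay a recorded bash history file non-interactively.
# Assumes each recorded line is a self-contained command and that the
# session started in "$2" (default: current directory).

replay_history() {
    local histfile workdir="${2:-.}"
    if [ ! -r "$1" ]; then
        echo "E: cannot read history file $1" >&2
        return 1
    fi
    # Resolve to an absolute path before changing directories.
    histfile=$(readlink -f "$1")
    # The subshell keeps the history's 'cd' calls out of the caller's shell;
    # 'bash -e' stops at the first failing command.
    ( cd "$workdir" && bash -e "$histfile" )
}
```

E.g. `replay_history .repronim/bash_histories/<stamp> .` from the top of the dataset. If `HISTTIMEFORMAT` was set during the session, bash writes `#<epoch>` lines into the history file; those are plain comments and are ignored on replay. Commands relying on tools from the image would of course need to be replayed inside the same container.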
**Additional possible features which might come into this prototype**

- color info/error messages from the shim
- improve PS1 (probably multiline: there is too much in a single line to still be able to edit commands)
- indicate being [reproman --trace](https://github.com/ReproNim/reproman/issues/416)'ed
- provide a 'reactive' PS1 to alert the user when they leave the initial directory (thus ending up outside of the original dataset), possibly producing outputs which would not be recorded
---
 scripts/singularity_cmd | 96 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/scripts/singularity_cmd b/scripts/singularity_cmd
index 999fc8ad..ca8fad9f 100755
--- a/scripts/singularity_cmd
+++ b/scripts/singularity_cmd
@@ -37,19 +37,113 @@ function info() {
     : # echo -e "I: $@" >&2
 }
 
+function error() {
+    echo -e "E: $@" >&2
+    exit 1
+}
+
+function has_changes() {
+    git status -s | grep -q .
+}
+
+function singularity_version() {
+    singularity --version | sed -e 's,^[^0-9]*,,g'
+}
+
+# https://stackoverflow.com/a/24067243
+function version_gt() {
+    test "$(printf '%s\n' "$@" | sort -V | head -n 1)" != "$1";
+}
+
 thisdir=$(dirname $0| xargs readlink -f)
 updir=$(dirname "$thisdir")
 
 cmd="${SINGULARITY_CMD:-$1}"; shift
 
+# We might need to expand list of arguments
+args=("$@")
+
+#
+# Pass other useful variables inside the container
+#
 if [ ! -z "${DATALAD_CONTAINER_NAME:-}" ]; then
     export SINGULARITYENV_DATALAD_CONTAINER_NAME="$DATALAD_CONTAINER_NAME"
 fi
 
+#
+# Prepare bind mounts
+#
+
 # singularity bind mounts system /tmp, which might result in side-effects
 # Create a dedicated temporary directory to be removed upon completion
 tmpdir=$(mktemp -d --suffix=singtmp)
 info "created temp dir $tmpdir"
 trap "rm -fr '$tmpdir' && info 'removed temp dir $tmpdir'" exit
 
-singularity "$cmd" -e -c -W "$tmpdir" -H "$updir/binds/HOME" -B $PWD --pwd "$PWD" "$@"
+#
+# Prepare for storing bash history in cmd='shell' mode
+#
+# Will be non-empty if some post-run handling is needed
+FINAL_BASH_HISTORY=
+TEMP_BASH_HISTORY_LOCAL=
+if [ "$cmd" = "shell" ]; then
+    # should be outside of $tmpdir so we could copy it there before
+    # trap cleans things up
+    histstamp=$(git describe --always)-$(date -Iseconds)
+    TEMP_BASH_HISTORY_LOCAL=$(mktemp -t bash_history.$histstamp.XXXXXXXXX)
+    TEMP_BASH_HISTORY_FILENAME=$(basename "$TEMP_BASH_HISTORY_LOCAL")
+    TEMP_BASH_HISTORY="$tmpdir/tmp/$TEMP_BASH_HISTORY_FILENAME"
+    # singularity 2.x seems to mess with HISTFILE - cannot pass through!
+    if version_gt 3 "$(singularity_version)"; then
+        error "Can manipulate bash history only with singularity >= 3"
+    fi
+    # Expose it to singularity environment
+    export SINGULARITYENV_HISTFILE="/tmp/$TEMP_BASH_HISTORY_FILENAME"
+    # We will copy it only if it was clean and new changes emerged
+    # Handle (save) protocol of interactive sessions
+    if ! has_changes ; then
+        # TODO: place at the top of the dataset!?
+        FINAL_BASH_HISTORY=".repronim/bash_histories/$histstamp"
+        # TODO: cleanup TEMP_BASH_HISTORY in case of crash?
+    else
+        echo "W: uncommitted changes present, 'shell' mode will NOT commit bash history."
+        echo "   You will find stored history at $TEMP_BASH_HISTORY_LOCAL"
+    fi
+    if [ "$#" -gt 1 ]; then
+        error "for 'shell' mode - do not provide any custom command. Got options: $@"
+    fi
+    cmd="exec"
+    args+=(bash)
+fi
+
+#
+# The actual invocation
+#
+singularity "$cmd" -e -c -W "$tmpdir" -H "$updir/binds/HOME" -B "$PWD" --pwd "$PWD" "${args[@]}"
+
+
+#
+# Handle possible digital objects to save/be added to be saved
+#
+if [ ! -z "$FINAL_BASH_HISTORY" ]; then
+    if ! has_changes ; then
+        # TODO: someone might want to just record his wanderings around, so
+        # might be worth an option to force saving history only
+        echo "I: no changes to the tree detected. Bash history will not be saved."
+        echo "   You will find stored history at $TEMP_BASH_HISTORY_LOCAL"
+    else
+        mkdir -p "$(dirname "$FINAL_BASH_HISTORY")"
+        mv "$TEMP_BASH_HISTORY" "$FINAL_BASH_HISTORY"
+        # due to https://github.com/datalad/datalad/issues/3421 saving entire directory of histories
+        datalad save \
+            -m "[REPRONIM/CONTAINERS]: bash history for the interactive session
+
+Actual changes might (or not, depending on the invocation) get committed in the next commit" \
+            "$(dirname "$FINAL_BASH_HISTORY")"
+    fi
+fi
+
+if [ ! -z "$TEMP_BASH_HISTORY_LOCAL" ] && [ -e "$TEMP_BASH_HISTORY" ]; then
+    # So we did create it but did not move to be saved, so let's expose locally before it is wiped out
+    mv "$TEMP_BASH_HISTORY" "$TEMP_BASH_HISTORY_LOCAL"
+fi
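The 'reactive' PS1 idea from the feature list could look roughly like the following hedged sketch. It is meant to be sourced by the interactive bash inside the container and warns before each prompt when the current directory is outside the directory the session started in. `REPROSHELL_TOPDIR`, `reproshell_inside_topdir` and `reproshell_check_pwd` are made-up names; the only mechanism relied upon is bash's `PROMPT_COMMAND`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a 'reactive' prompt: warn whenever the shell's
# working directory is outside the directory the session started in.
# Meant to be sourced by the interactive bash inside the container.

# Directory considered "recorded"; defaults to where the session started.
REPROSHELL_TOPDIR="${REPROSHELL_TOPDIR:-$PWD}"

# Succeeds if $PWD is REPROSHELL_TOPDIR or one of its subdirectories.
reproshell_inside_topdir() {
    case "$PWD/" in
        "$REPROSHELL_TOPDIR"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a warning to stderr when outside; always returns 0 so it does not
# disturb anything else hooked into the prompt.
reproshell_check_pwd() {
    if ! reproshell_inside_topdir; then
        echo "W: outside of $REPROSHELL_TOPDIR; outputs written here would not be recorded" >&2
    fi
    return 0
}

# bash runs PROMPT_COMMAND before printing each primary prompt;
# keep any PROMPT_COMMAND that was already set.
PROMPT_COMMAND="reproshell_check_pwd${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

One hypothetical way to wire it in would be for the shim to export `SINGULARITYENV_REPROSHELL_TOPDIR="$PWD"` (the same prefix mechanism the patch already uses for `HISTFILE`) and to source the snippet from a bashrc under `binds/HOME`. Appending `/` to `$PWD` before matching avoids treating a sibling such as `containers-old` as being inside `containers`.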