NF: SINGULARITY_CMD=shell to record (bash) history+result of interactive sessions #9

Draft: wants to merge 3 commits into base: master

@yarikoptic yarikoptic commented May 15, 2019

khe khe, the description (commit msg) probably came out longer than the changes ;) Note that besides seeing myself using it to keep a better log of actions where I can't yet use datalad run, I think it would be great for some newbies.

Attn @kyleam, @mih

Related

This is a prototype for functionality which might be of interest
outside of this project, e.g. related:

  • regular datalad run to record activities in the shell:

    • run --interactive (datalad/datalad#2158 (comment))
    • run --shell (datalad/datalad#2275)

    so here I am "implementing" it, solely for containerized environments ATM,
    via "over the head" communication to the shim through an environment variable

  • datalad run for better record keeping, e.g.

    • saving stdout/err (datalad/datalad#3385)

    so here I did not bother to establish stdout/err capture, but possibly
    could and might

  • reproman login, or even execute (with or without --trace) and maybe run,
    where we could benefit from having an environment with a unified interface
    for interactive sessions which would also establish a record of activities

  • just a regular shell environment to make a clear record of commands which were run

  • might eventually absorb/meld with the "opinionated .bashrc"
    proposed for the training curriculum:
    [ENH] add .repronim.bashrc ReproNim/module-reproducible-basics#26
    which provides assistance/docs for more efficient use of the cmdline
    and establishes 'infinite bash history'.
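
To make the "over the head" communication from the first item a bit more concrete, here is a minimal sketch of how a shim could let an environment variable override the subcommand it was asked to run. The function name effective_cmd is made up for illustration and is not the actual shim code; the only things taken from this PR are the SINGULARITY_CMD variable and the fact that the recorded command uses "run".

```shell
#!/bin/sh
# Hypothetical helper, NOT the actual shim code.
# $1: subcommand given on the command line (e.g. "run").
# SINGULARITY_CMD, when set and non-empty, takes precedence over $1,
# so the caller (e.g. datalad containers-run) does not need to know about it.
effective_cmd() {
    printf '%s\n' "${SINGULARITY_CMD:-$1}"
}

# With SINGULARITY_CMD unset this prints "run":
effective_cmd run
```

The record stored by datalad would then still say "run", while the user actually ends up in an interactive shell, which is exactly the ambiguity discussed further down.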

reproshell???

So it feels to me like motivation for some kind of independent reproshell
project which would be

  • usable independently and easily installable/bindable (e.g. into a container)
  • parametrizable to be invoked from the shim here and/or by datalad or reproman,
    so it could just take care of capturing all sidecar files into specified
    locations

Now there is https://github.com/ReproNim/reproshell

Could benefit from

  • knowing more about "datalad (containers-)run" invocation

Implemented now within the singularity_run shim, which could have benefited
from having additional information about how exactly it was run, and
from being able to instruct datalad run "upstairs" that there is now an additional file in
extra_outputs (datalad/datalad#3094).
Hence there is datalad/datalad#3422

  • datalad run being able to 'cover' multiple commits (datalad/datalad#3265)

Interactivity creates ambiguity for rerun semantics:

  • run record ATM would say "reinvoke interactive session" which might be
    desirable on its own (e.g. to redo something manually in that original
    container)

  • but for "automated reproducibility" we do have all information (bash history
    file, which is a list of commands to run) possibly recorded in another
    commit, which ATM is not associated with the "run" record

So maybe by somehow tagging run commits (datalad/datalad#3371) it could be
possible to disambiguate/select specific run commits/records?
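
For the "automated reproducibility" flavor, the recorded history file is already a script of sorts. A rough sketch of replaying it non-interactively could look like the following; replay_history is a hypothetical helper, nothing in datalad rerun does this ATM, and it assumes the history contains only plain commands (no reliance on interactive state such as aliases).

```shell
#!/bin/sh
# Hypothetical helper: re-execute a recorded bash history file.
# $1: path to the history file; $2: directory to run the commands in.
replay_history() {
    histfile=$1
    workdir=$2
    # Make the path absolute before changing directory.
    case $histfile in
        /*) ;;
        *) histfile=$PWD/$histfile ;;
    esac
    # -e stops at the first failing command; the subshell keeps the cd local.
    ( cd "$workdir" && bash -e "$histfile" )
}
```

Something like `replay_history .repronim/bash_histories/<file> .` executed inside the original container would then be the automated counterpart of "reinvoke interactive session".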

Example
(dev) 1 13348.....................................:Wed 15 May 2019 06:12:24 PM EDT:.
(git-annex)hopa:~/proj/repronim/containers[enh-shell]git-annex
$> SINGULARITY_CMD=shell datalad containers-run -n repronim-reproin
[INFO   ] Making sure inputs are available (this may take some time)
[INFO   ] == Command start (output follows) =====
<ome/yoh/proj/repronim/containers$ echo "I will do something useful today"
I will do something useful today
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ touch my-results
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ cd images/
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers/images$ ls
bids  README.md  repronim
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers/images$ cd ../
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ ls
binds  images  LICENSE	my-results  README.md  scripts
<pa:/home/yoh/proj/repronim/containers$ rm LICENSE ; echo 'nobody needs those'
nobody needs those
singularity:repronim-reproin > yoh@hopa:/home/yoh/proj/repronim/containers$ exit
add(ok): .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
[INFO   ] == Command exit (modification check follows) =====
delete(ok): LICENSE (file)
add(ok): my-results (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  delete (ok: 1)
  get (notneeded: 1)
  save (ok: 1)
SINGULARITY_CMD=shell datalad containers-run -n repronim-reproin  3.42s user 1.74s system 9% cpu 54.068 total

$> git log --stat HEAD^^..
commit 89fed08617418e5ddb88ae11ee2c14db699acf31 (HEAD -> enh-shell)
Author: Yaroslav Halchenko <[email protected]>
Date:   Wed May 15 18:13:28 2019 -0400

	[DATALAD RUNCMD] ./scripts/singularity_cmd run images/rep...

	=== Do not change lines below ===
	{
	 "chain": [],
	 "cmd": "./scripts/singularity_cmd run images/repronim/repronim-reproin--0.5.4.sing ",
	 "dsid": "b02e63c2-62c1-11e9-82b0-52540040489c",
	 "exit": 0,
	 "extra_inputs": [],
	 "inputs": [
	  "images/repronim/repronim-reproin--0.5.4.sing"
	 ],
	 "outputs": [],
	 "pwd": "."
	}
	^^^ Do not change lines above ^^^

 LICENSE    | 201 ---------------------------------------------------------------------------------------------
 my-results |   1 +
 2 files changed, 1 insertion(+), 201 deletions(-)

commit 5aa3b3383c2746f7c1d07ecdcc73852eb0a30f17
Author: Yaroslav Halchenko <[email protected]>
Date:   Wed May 15 18:13:28 2019 -0400

	[REPRONIM/CONTAINERS]: bash history for the interactive session

	Actual changes might (or not, depending on the invocation) get committed in the next commit

 .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00 | 7 +++++++
 1 file changed, 7 insertions(+)

$> cat .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00
echo "I will do something useful today"
touch my-results
cd images/
ls
cd ../
ls
rm LICENSE ; echo 'nobody needs those'
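
The history file name above looks like a git describe style version followed by an ISO 8601 timestamp. Assuming that is indeed how it is composed (a guess from the name, not checked against the shim), the path could be put together roughly like this; history_path is a hypothetical helper and the wiring in the comments is only one possible way to point bash at the file.

```shell
#!/bin/sh
# Hypothetical helper: build the history file path from a version string ($1)
# and a timestamp ($2), mirroring the file name seen in the example above.
history_path() {
    printf '.repronim/bash_histories/%s-%s\n' "$1" "$2"
}

# Assumed wiring (not executed here):
#   version=$(git describe --tags)
#   stamp=$(date --iso-8601=seconds)   # GNU date
#   HISTFILE=$(history_path "$version" "$stamp") bash

history_path 0.1-3-ge25c927 2019-05-15T18:12:37-04:00
# prints .repronim/bash_histories/0.1-3-ge25c927-2019-05-15T18:12:37-04:00
```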

Additional possible features which might come here into a prototype

  • color info/error messages from the shim
  • improve PS1 (probably multiline -- too much in a single line to still be
    able to edit commands)
  • indicate being reproman --trace'ed (ReproNim/reproman#416)
  • provide a 'reactive' PS1 to alert the user when they leave the initial directory
    (and thus end up outside of the original dataset), possibly resulting in outputs which
    would not be recorded
  • alternative (or complementary) to the above -- overload cd so that leaving the dataset boundary would be harder. We could first interactively ask the user (and thus set a flag for PS1) whether it is really desired!
  • consider making our own "run" commit before leaving. This way we could
    • in the first commit, establish the script and commit it
    • in the second (somehow, would need a switch to run), invoke datalad run within the modified tree (we could analyze/provide --output then), stating to pretend running the bash history script ;-)
      If it was done from 'outside', it could be done easily by rewriting the last commit and adjusting the run record (evil).
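
As an illustration of the 'reactive' PS1 idea, here is a small sketch of a check for whether the current directory is still within the dataset, plus a prompt marker built on it. The names inside_dataset, dataset_marker and DATASET_ROOT are made up for this sketch, and it assumes the root path is given without a trailing slash.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 is $1 itself or lies below it.
inside_dataset() {
    case "$2/" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prompt marker: empty inside the dataset rooted at $1, a warning outside of it.
dataset_marker() {
    if inside_dataset "$1" "$PWD"; then
        printf ''
    else
        printf '[OUTSIDE DATASET] '
    fi
}

# Assumed usage in an interactive bash session started in the dataset root:
#   DATASET_ROOT=$PWD
#   PS1='$(dataset_marker "$DATASET_ROOT")\w\$ '
```

The same inside_dataset check could back an overloaded cd which asks for confirmation before crossing the dataset boundary.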

@yarikoptic

Sure thing, tastes could differ, and it might be good for light backgrounds, but here is how it looks ATM:
Screenshot from 2019-05-15 18-41-35
