Agent-hosted datasets #246

wpbonelli · 2021-12-30T15:09:14Z

As data volumes grow, it may become infeasible to store datasets on personal machines or in the cloud. This makes the common model, in which data is transferred over the network to and from a processing environment, more difficult if not impossible. Researchers may instead opt to park datasets directly on a cluster or supercomputer.

We should allow a user with access to a given cluster to address and annotate datasets directly on its filesystem.

This raises a few implementation questions:

Do we need a periodic task to check for changes?
How should we handle removal of files/folders?
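
To make both questions concrete, here is a minimal sketch of what a periodic check might do: snapshot the dataset directory, then diff against the previous snapshot to find added, modified, and removed files. This is only an illustration under stated assumptions, not a proposed implementation. The names `take_snapshot`, `diff_snapshots`, and `Changes` are hypothetical, and comparing size plus modification time is an assumed (cheap but imperfect) change heuristic.

```python
import os
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: relative path -> (size in bytes, mtime in ns)
Snapshot = Dict[str, Tuple[int, int]]


@dataclass
class Changes:
    added: Set[str]
    removed: Set[str]
    modified: Set[str]


def take_snapshot(root: str) -> Snapshot:
    """Walk `root` and record size and mtime for every file found."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            snapshot[os.path.relpath(full, root)] = (info.st_size, info.st_mtime_ns)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Changes:
    """Compare two snapshots taken at different times."""
    old_paths, new_paths = set(old), set(new)
    return Changes(
        added=new_paths - old_paths,
        removed=old_paths - new_paths,
        # Assumption: a file counts as modified if its size or mtime differs.
        modified={p for p in old_paths & new_paths if old[p] != new[p]},
    )
```

A scheduled task could store the last snapshot per dataset and call `diff_snapshots` on each run. Paths in `removed` are where the second question applies: whether to drop their annotations or keep them flagged as missing is still open.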

We could also consider providing a link that automatically opens a Jupyter notebook via an SSH tunnel into a given workflow's container environment, rather than submitting the workflow as a batch job.
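
For reference, the tunnel itself could be as simple as standard SSH local port forwarding. The sketch below only builds the command; the helper name `tunnel_command`, the port 8888, and the assumption that the notebook server listens on the SSH host itself (rather than on a separate compute node) are all illustrative guesses, not decisions.

```python
import shlex
from typing import List


def tunnel_command(user: str, host: str,
                   remote_port: int = 8888,
                   local_port: int = 8888) -> List[str]:
    """Build an ssh command that forwards a local port to a notebook port.

    -N: do not run a remote command, only forward ports.
    -L: forward local_port on this machine to localhost:remote_port
        as seen from the remote host.
    """
    return [
        "ssh", "-N",
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical user and host, for illustration only.
    print(shlex.join(tunnel_command("alice", "cluster.example.org")))
```

With the tunnel open, the notebook would be reachable at `localhost:8888` in the user's browser. If the container runs on a compute node instead, the forwarding target would need to name that node rather than `localhost`.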

wpbonelli added the enhancement (New feature or request) label Dec 30, 2021

wpbonelli mentioned this issue Jun 13, 2022

Distinction between home & work directories #190

Closed
