Idea: model shovel more closely on git #20
It looks like git attributes
You would still need a non-git interface, since you may have a project that isn't version-controlled. Usually this won't be the case, but it will happen sometimes: for example, you might want to use shovel to fetch some data for a quick analysis. So would you then have two interfaces? Perhaps I'm misunderstanding.
That makes sense. I was imagining shovel would stay the same, but it would be possible to set it up with hooks so the dig and bury commands are called for you. Unlike LFS, which tries to make it look like the files are in the repo, this would make it clear they are in a pit. So, in addition to what exists already, inside a git repo:

```
cd data
shovel init .  # adds this dir to (maybe) repo-root/.shovel so the hooks know which directories are under shovel control
git add .      # the clean filter calculates the MD5 of each file and writes the interesting data into a .shovel file, for example
git commit     # if shovel has a local cache (which it may in the future), a pre-commit hook copies the files there with the MD5 as the key
git push       # a pre-push hook ensures the files have been uploaded to the S3 pit
```

Or something. Probably worth getting a lot of inspiration from LFS.
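The `git add` and `git commit` steps above could be sketched roughly as follows. This is one possible reading, modelled on LFS, and assumes a hypothetical pointer format (a JSON line holding the MD5 and size) and a hypothetical local cache directory; none of these names exist in shovel today. Git runs a clean filter with the working-tree file on stdin and stores whatever it writes to stdout, so `run_clean_filter` swaps the data for a pointer, while `cache_blob` is what a pre-commit hook might call to copy a file into a cache keyed by its MD5.

```python
import hashlib
import json
from pathlib import Path
from typing import BinaryIO, TextIO


def make_pointer(data: bytes) -> str:
    """Build a small text pointer for `data` (hypothetical format, not shovel's)."""
    digest = hashlib.md5(data).hexdigest()
    return json.dumps({"md5": digest, "size": len(data)}, sort_keys=True) + "\n"


def run_clean_filter(src: BinaryIO, dst: TextIO) -> None:
    """Clean-filter body: read the file git pipes in and emit the pointer instead."""
    dst.write(make_pointer(src.read()))


def cache_blob(data: bytes, cache_dir: Path) -> Path:
    """Copy `data` into a local content-addressed cache, keyed by its MD5."""
    target = cache_dir / hashlib.md5(data).hexdigest()
    if not target.exists():  # identical content is only stored once
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target
```

A small wrapper script would call `run_clean_filter(sys.stdin.buffer, sys.stdout)` and could be registered with something like `git config filter.shovel.clean "python shovel_clean.py"` plus a `.gitattributes` line such as `data/** filter=shovel`; the script name and pattern are placeholders.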
The problem to solve here is I currently add
Git LFS has some nice properties, but doesn't really map well to large datasets used for analysis. A git model of checking in all resources is good for reproducibility, but it is nice to separate the data from the code.
A proposed future direction for shovel is to support
shovel <git commands>
where shovel intercepts some commands and swaps out some of their behaviour. This can likely be done using git hooks, so it may be possible to init those and then use git directly.

One benefit of the shovel model over LFS is that it lets you version datasets separately from a git repo and share them across multiple repos. In that sense, the git hooks would need to inspect the state of the filesystem and manage the dig and bury steps of shovel as part of the hooks.
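As a rough illustration of that last point, a hook could compare a directory under shovel control against its pointers to decide what to bury and what to dig. The sketch assumes a hypothetical layout where every data file `x` has a sidecar `x.shovel` holding its MD5; `plan`, `md5_of` and the sidecar format are illustrative names, not shovel's API.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets are never read into memory at once."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def plan(directory: Path) -> dict:
    """Decide which files need burying and which need digging.

    Assumes each data file `x` has a hypothetical sidecar pointer `x.shovel`
    containing {"md5": ...}; this layout is illustrative only.
    """
    to_bury, to_dig = [], []
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        if path.suffix == ".shovel":
            data = path.with_suffix("")  # "a.csv.shovel" -> "a.csv"
            if not data.exists():  # pointer without local data: needs a dig
                to_dig.append(data.relative_to(directory).as_posix())
            continue
        pointer = path.with_name(path.name + ".shovel")
        # No pointer yet, or the file changed since it was last buried: needs a bury.
        if not pointer.exists() or json.loads(pointer.read_text())["md5"] != md5_of(path):
            to_bury.append(path.relative_to(directory).as_posix())
    return {"bury": to_bury, "dig": to_dig}
```

A pre-push hook might then bury everything listed under `"bury"`, while a post-checkout or post-merge hook might dig everything listed under `"dig"`, looping over each directory registered by `shovel init`.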
These thoughts are very undeveloped.