Merge pull request #999 from datalad-handbook/dvc3
Update DVC workflow for DVC v3
adswa authored Oct 6, 2023
2 parents 52a95fb + b5c6523 commit be9905a
Showing 2 changed files with 8 additions and 5 deletions.
11 changes: 7 additions & 4 deletions docs/beyond_basics/101-168-dvc.rst
@@ -584,7 +584,7 @@ The final script, ``src/evaluate.py`` is used to evaluate the trained classifier
 There are more detailed insights and explanations of the actual analysis code in the `Tutorial <https://realpython.com/python-data-version-control>`_ if you're interested in finding out more.
 For workflow management, DVC has the concept of a "DVC pipeline".
-A pipeline consists of multiple stages and is executed using a :shcmd:`dvc run` command.
+A pipeline consists of multiple stages, which are set up and executed using a :shcmd:`dvc stage add [--run]` command.
 Each stage has three components: "deps", "outs", and "command".
 Each of the scripts in the repository will be represented by a stage in the DVC pipeline.
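As a rough illustration of how a stage's three components map onto the flags used with `dvc stage add` in this commit (`-n` for the name, one `-d` per dependency, one `-o` per output, `--run` to execute, and the command last), here is a minimal Python sketch. The `Stage` class and `stage_add_argv` helper are hypothetical and not part of DVC; they only assemble an argument list for display and never call DVC.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """Hypothetical model of one DVC pipeline stage (not a DVC API)."""
    name: str
    cmd: str
    deps: List[str] = field(default_factory=list)
    outs: List[str] = field(default_factory=list)


def stage_add_argv(stage: Stage, run: bool = True) -> List[str]:
    """Assemble the `dvc stage add` arguments shown in this commit's examples."""
    argv = ["dvc", "stage", "add", "-n", stage.name]
    for dep in stage.deps:
        argv += ["-d", dep]      # one -d flag per dependency
    for out in stage.outs:
        argv += ["-o", out]      # one -o flag per output
    if run:
        argv.append("--run")     # also execute the stage after adding it
    argv.append(stage.cmd)       # the command comes last
    return argv


prepare = Stage(
    name="prepare",
    cmd="python src/prepare.py",
    deps=["src/prepare.py", "data/raw"],
    outs=["data/prepared/train.csv", "data/prepared/test.csv"],
)
print(" ".join(stage_add_argv(prepare)))
```

Dropping `run=True` yields the same stage definition without `--run`, which matches the `[--run]` optionality noted in the diff.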
@@ -615,9 +615,10 @@ The following command sets up the stage:
 :language: console
 ### DVC
-$ dvc run -n prepare \
+$ dvc stage add -n prepare \
 -d src/prepare.py -d data/raw \
 -o data/prepared/train.csv -o data/prepared/test.csv \
+--run \
 python src/prepare.py
 The ``-n`` parameter gives the stage a name, the ``-d`` parameter passes the dependencies -- the raw data -- to the command, and the ``-o`` parameter defines the outputs of the command -- the CSV files that ``prepare.py`` will create.
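The flags of the prepare stage end up recorded as a stage entry in the project's dvc.yaml file. The sketch below renders a dvc.yaml-like layout by hand (no YAML library) so the mapping is visible: the `-n` value becomes the stage key, the command becomes `cmd`, and the `-d` and `-o` paths fill `deps` and `outs`. The helpers `stage_entry` and `render_yaml` are hypothetical, and the exact indentation and ordering DVC writes may differ; check the generated dvc.yaml for the real content.

```python
from typing import Dict, List


def stage_entry(cmd: str, deps: List[str], outs: List[str]) -> Dict[str, object]:
    """Approximate one stage's dvc.yaml entry (sketch, not DVC's own writer)."""
    return {"cmd": cmd, "deps": list(deps), "outs": list(outs)}


def render_yaml(stages: Dict[str, Dict[str, object]]) -> str:
    """Render stage entries in a dvc.yaml-like layout without a YAML library."""
    lines = ["stages:"]
    for name, entry in stages.items():
        lines.append(f"  {name}:")                # value passed to -n
        lines.append(f"    cmd: {entry['cmd']}")  # the trailing command
        for key in ("deps", "outs"):              # -d paths, then -o paths
            paths = entry.get(key) or []
            if paths:
                lines.append(f"    {key}:")
                lines.extend(f"    - {path}" for path in paths)
    return "\n".join(lines)


stages = {
    "prepare": stage_entry(
        cmd="python src/prepare.py",
        deps=["src/prepare.py", "data/raw"],
        outs=["data/prepared/train.csv", "data/prepared/test.csv"],
    )
}
print(render_yaml(stages))
```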
@@ -662,9 +663,10 @@ The following command sets it up:
 :workdir: DVCvsDL/DVC
 :language: console
-$ dvc run -n train \
+$ dvc stage add -n train \
 -d src/train.py -d data/prepared/train.csv \
 -o model/model.joblib \
+--run \
 python src/train.py
 Afterwards, ``train.py`` has been executed, and the pipelines have been updated with a second stage.
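The train stage becomes the second stage because one of its dependencies, data/prepared/train.csv, is an output of the prepare stage. A minimal sketch of how such an ordering can be inferred: link each dependency to the stage that produces it, then sort topologically with Python's standard `graphlib` (3.9+). This illustrates the idea of a pipeline graph built from deps and outs; it is not DVC's implementation, and the `STAGES` table and `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical stage table (name -> deps/outs), copied from the commands above.
STAGES: Dict[str, Dict[str, List[str]]] = {
    "train": {
        "deps": ["src/train.py", "data/prepared/train.csv"],
        "outs": ["model/model.joblib"],
    },
    "prepare": {
        "deps": ["src/prepare.py", "data/raw"],
        "outs": ["data/prepared/train.csv", "data/prepared/test.csv"],
    },
}


def execution_order(stages: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Order stages so that every producing stage runs before its consumers."""
    # Map each output path to the stage that creates it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    # Each stage's predecessors are the producers of its dependencies;
    # dependencies nobody produces (source code, raw data) add no edges.
    graph = {
        name: {producer[d] for d in s["deps"] if d in producer}
        for name, s in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))  # prepare is listed before train
```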
@@ -684,9 +686,10 @@ The following command sets it up:
 :workdir: DVCvsDL/DVC
 :language: console
-$ dvc run -n evaluate \
+$ dvc stage add -n evaluate \
 -d src/evaluate.py -d model/model.joblib \
 -M metrics/accuracy.json \
+--run \
 python src/evaluate.py
 .. runrecord:: _examples/DL-101-168-158
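Unlike the first two stages, evaluate declares its output with `-M` rather than `-o`. In DVC, `-M` (long form `--metrics-no-cache`) marks the file as a metrics file that is not stored in the DVC cache, leaving it for Git to track. The sketch below shows, under that assumption, how the flag choice changes the recorded entry; the `FLAG_SECTIONS` table and `outputs_section` helper are hypothetical, and the `cache: false` shape is meant to resemble dvc.yaml, not to reproduce it exactly.

```python
from typing import Dict, List, Union

# Hypothetical mapping of the output flags used in this commit to a
# dvc.yaml-style section name and whether the file goes into the DVC cache.
FLAG_SECTIONS = {
    "-o": ("outs", True),      # regular output, stored in the DVC cache
    "-M": ("metrics", False),  # metrics file, kept out of the DVC cache
}

Entry = Union[str, Dict[str, Dict[str, bool]]]


def outputs_section(flag: str, paths: List[str]) -> Dict[str, List[Entry]]:
    """Return the dvc.yaml-style section that a flag's paths would land in."""
    section, cached = FLAG_SECTIONS[flag]
    entries: List[Entry] = []
    for path in paths:
        # Cached outputs are listed bare; uncached ones carry `cache: false`.
        entries.append(path if cached else {path: {"cache": False}})
    return {section: entries}


print(outputs_section("-o", ["model/model.joblib"]))
print(outputs_section("-M", ["metrics/accuracy.json"]))
```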
2 changes: 1 addition & 1 deletion requirements-devel.txt
@@ -7,5 +7,5 @@ scikit-learn
 scikit-image
 # https://github.com/mwaskom/seaborn/issues/3192
 numpy < 1.24
-dvc < 3.0
+dvc >= 3.0
 datalad-catalog >= 1.0.1
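This commit pairs the switch to `dvc stage add` with raising the development requirement from `dvc < 3.0` to `dvc >= 3.0`. A small sketch of checking an installed DVC against that floor from Python, using the standard `importlib.metadata`. The helper names are hypothetical, and the naive parser only handles plain numeric releases such as 3.25.0; pre-release or local version tags would need a full version parser such as `packaging.version`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(text: str) -> Tuple[int, ...]:
    """Parse a plain numeric release like '3.25.0' into (3, 25, 0)."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed: str, floor: Tuple[int, ...] = (3, 0)) -> bool:
    """True if `installed` satisfies `>= floor` (numeric releases only)."""
    release = parse_release(installed)
    # Pad both tuples with zeros so that '3' compares equal to '3.0'.
    width = max(len(release), len(floor))
    release += (0,) * (width - len(release))
    floor += (0,) * (width - len(floor))
    return release >= floor


def installed_dvc() -> Optional[str]:
    """Return the installed DVC distribution version, or None if absent."""
    try:
        return version("dvc")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_dvc()
    if found is None:
        print("dvc is not installed")
    else:
        print(f"dvc {found} satisfies 'dvc >= 3.0': {meets_floor(found)}")
```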

