A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Adapted to Region Västra Götaland's data science processes.
You choose which language to use when you select the interpreter. By using our template you can easily create reproducible data science projects to share with your colleagues.
You will get:
- Folder structure
- Documentation
- Tools for reproducibility
- (if selected) Dockerfile with PyTorch + Jupyter installed
- (if selected) Dockerfile with TensorFlow + Jupyter installed
- (if selected) Dockerfile with R + RStudio Server installed
You will NOT get:
- Messy projects
- Things that only work on your machine
Requirements to use the template:
- Python 3.5+
- Cookiecutter Python package >= 1.4.0. It can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter
or
$ conda install -c conda-forge cookiecutter
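To confirm the prerequisites before generating a project, a small check along these lines may help. It is a sketch, not part of the template: it assumes Python 3.8+ for `importlib.metadata` (the template itself only requires 3.5+), compares only the numeric `X.Y.Z` part of version strings, and the helper names `parse_version`, `meets_minimum` and `check_prerequisites` are hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a version string like '2.1.1' into a tuple of ints (2, 1, 1).

    Assumption: only leading numeric components matter; suffixes such as
    'rc1' or '.dev0' are ignored by stopping at the first non-digit part.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is >= the required version."""
    a, b = parse_version(installed), parse_version(required)
    # Pad with zeros so that "1.4" compares equal to "1.4.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites():
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python 3.5+ is required")
    try:
        installed = version("cookiecutter")
    except PackageNotFoundError:
        problems.append("cookiecutter is not installed")
    else:
        if not meets_minimum(installed, "1.4.0"):
            problems.append("cookiecutter >= 1.4.0 is required, found " + installed)
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites met")
```

Versions are compared as integer tuples rather than strings, so `1.10.0` correctly counts as newer than `1.4.0`.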
To start a new project, run:
$ cookiecutter https://github.com/Vastra-Gotalandsregionen/data-science-template
The directory structure of your new project looks like this:
├── LICENSE
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third-party sources (e.g. script config files)
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- Documentation template with hints
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- Makes the project pip installable (pip install -e .) so {{ cookiecutter.module_name }} can be imported
├── {{ cookiecutter.module_name }} <- Source code for use in this project.
│ ├── __init__.py <- Makes {{ cookiecutter.module_name }} a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │
│ ├── features <- Scripts to turn raw data into features for modeling
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ predictions
│ │
│ └── visualization <- Scripts to create exploratory and results oriented visualizations
│
├── Dockerfile <- Dockerfile with settings to run scripts in Docker container
├── dvc.yaml <- DVC pipeline; see dvc.org
├── params.yaml <- Parameter values (things like hyperparameters) used by DVC pipeline
├── setup.cfg <- config file with settings for running pylint, flake8 and bandit
└── pytest.ini <- config file with settings for running pytest
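The notebook naming convention in the tree can be checked mechanically. The sketch below is one possible reading of it, not a rule the template enforces: it assumes the number is dotted digits, the initials are lowercase letters, and the description is lowercase alphanumeric words joined by `-`. The functions `parse_notebook_name` and `order_notebooks` are hypothetical helpers.

```python
import re
from pathlib import PurePath

# Assumed reading of the convention, e.g. "1.0-jqp-initial-data-exploration".
NOTEBOOK_NAME = re.compile(
    r"^(?P<number>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(path):
    """Split a notebook name into (number, initials, description).

    The number is returned as a tuple of ints so it sorts numerically.
    Returns None when the name does not follow the convention.
    """
    name = PurePath(path).name
    if name.endswith(".ipynb"):
        name = name[: -len(".ipynb")]
    match = NOTEBOOK_NAME.match(name)
    if match is None:
        return None
    number = tuple(int(part) for part in match.group("number").split("."))
    return number, match.group("initials"), match.group("description")


def order_notebooks(paths):
    """Return conforming notebook paths sorted by their numeric prefix.

    Non-conforming names are left out so they can be reported separately.
    """
    conforming = [p for p in paths if parse_notebook_name(p) is not None]
    return sorted(conforming, key=lambda p: parse_notebook_name(p)[0])
```

Sorting on the parsed number rather than the raw file name keeps `10.0-...` after `2.1-...`, which a plain string sort would get wrong.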
If you work at Västra Götalandsregionen, or you simply want to use and develop this template, feel free to make a pull request with your suggested changes.
To install the development requirements:
$ pip install -r requirements.txt
To run the tests:
$ pytest tests
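A test module in `tests/` could look like the sketch below. It assumes pytest's default discovery rules (files named `test_*.py`, functions named `test_*`), which the project's `pytest.ini` may override. The file name and the function `normalize_column_names` are hypothetical; in a real project the function would live in the project package rather than in the test file.

```python
# tests/test_features.py (hypothetical file name)
# In a real project the function under test would be imported from the
# project package, e.g. from {{ cookiecutter.module_name }}.features import ...
# It is defined inline here so the example runs on its own.


def normalize_column_names(columns):
    """Lower-case column names, trim whitespace, and join words with '_'."""
    return [name.strip().lower().replace(" ", "_") for name in columns]


def test_normalize_column_names_lowercases_and_joins_words():
    assert normalize_column_names(["Patient ID", " Visit Date "]) == [
        "patient_id",
        "visit_date",
    ]


def test_normalize_column_names_keeps_empty_input_empty():
    assert normalize_column_names([]) == []
```

Keeping tests small and focused on one behavior each makes failures in `pytest tests` point directly at the broken step.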