Marvelous MLOps End-to-end MLOps with Databricks course

Practical information

Weekly lectures on Wednesdays 16:00-18:00 CET.
Code for the lecture is shared before the lecture.
Presentation and lecture materials are shared right after the lecture.
Video of the lecture is uploaded within 24 hours after the lecture.
Every week we set up a deliverable, and you implement it with your own dataset.
To submit the deliverable, create a feature branch in the repository and open a PR to the main branch. The code can be merged once we have reviewed and approved it and the CI pipeline has run successfully.
The deliverables can be submitted with a delay (for example, lecture 1 & 2 together), but we expect you to finish all assignments for the course before the 25th of November.

Set up your environment

In this course, we use the Databricks 15.4 LTS runtime, which uses Python 3.11. In our examples, we use uv. Check out the documentation on how to install it: https://docs.astral.sh/uv/getting-started/installation/

To create a new environment and create a lockfile, run:

uv venv -p 3.11.0 .venv
source .venv/bin/activate
uv pip install -r pyproject.toml --all-extras
uv lock
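
Because the environment is pinned to Python 3.11 to match the 15.4 LTS runtime, a quick check inside the activated environment can catch a mismatched interpreter early. The following is only a sketch; the helper name `matches_runtime` is hypothetical and not part of the course code.

```python
import sys

# Python version used by the Databricks 15.4 LTS runtime, per this README.
REQUIRED = (3, 11)


def matches_runtime(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's major.minor equals the required one."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    status = "OK" if matches_runtime() else "mismatch"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```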

Databricks Commands

Authentication

# Authentication
databricks auth login --configure-cluster --host <workspace-url>

# Profiles
databricks auth profiles
cat ~/.databrickscfg

# Root Dir
databricks fs ls dbfs:/

export DATABRICKS_CONFIG_PROFILE=DEFAULT
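
The `~/.databrickscfg` file printed above is an INI file with one section per profile, so the profiles can also be inspected from Python. This is a minimal sketch using only the standard library; the function name `list_profiles` and the hosts in the example are placeholders, not real workspaces.

```python
import configparser


def list_profiles(cfg_text):
    """Map each profile name in a .databrickscfg-style INI text to its host."""
    # Treat [DEFAULT] as an ordinary section rather than configparser's
    # special defaults section, and disable '%' interpolation.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(cfg_text)
    return {name: parser[name].get("host") for name in parser.sections()}


# Example content with placeholder hosts.
example = """
[DEFAULT]
host = https://example.cloud.databricks.com

[dev]
host = https://dev.example.cloud.databricks.com
"""

print(list_profiles(example))
```

To read the real file, pass `pathlib.Path.home().joinpath(".databrickscfg").read_text()` instead of the example string.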

Catalog Creation

catalog name: credit
schema name: default
volume name: data

# Create
databricks volumes create maven default data MANAGED

# Push
databricks fs cp data/data.csv dbfs:/Volumes/maven/default/data/data.csv

# Show files
databricks fs ls dbfs:/Volumes/maven/default/data
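
The commands above all address the volume through the same `dbfs:/Volumes/<catalog>/<schema>/<volume>` pattern. As an illustration, a small helper can assemble these paths so the catalog, schema and volume names live in one place; `volume_path` is a hypothetical name, not something provided by the Databricks CLI or SDK.

```python
def volume_path(catalog, schema, volume, filename=""):
    """Build a dbfs:/Volumes/... path for a Unity Catalog volume."""
    parts = [catalog, schema, volume]
    if filename:
        parts.append(filename)
    return "dbfs:/Volumes/" + "/".join(parts)


print(volume_path("maven", "default", "data", "data.csv"))
# prints: dbfs:/Volumes/maven/default/data/data.csv
```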

Package Creation

# Build
uv build

# Create
databricks volumes create maven default packages MANAGED

# Push
databricks fs cp dist/mlops_with_databricks-0.0.1-py3-none-any.whl dbfs:/Volumes/maven/default/packages

# Overwrite Package
databricks fs cp dist/credit_default_databricks-0.0.7-py3-none-any.whl dbfs:/Volumes/maven/default/packages --overwrite
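
When several wheel versions end up in `dist/`, it can help to read the package name and version back from the filename before pushing or overwriting. The sketch below relies on the standard wheel naming scheme; `parse_wheel_name` is a hypothetical helper and not part of the course code.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into (distribution, version)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Wheel names follow:
    # {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    return parts[0], parts[1]


print(parse_wheel_name("mlops_with_databricks-0.0.1-py3-none-any.whl"))
# prints: ('mlops_with_databricks', '0.0.1')
```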

Token Creation

First create a token under Settings --> User --> Developer

# Create Scope
databricks secrets create-scope secret-scope

# Add the secret (enter the token value when prompted after running the command)
databricks secrets put-secret secret-scope databricks-token

# List secrets
databricks secrets list-secrets secret-scope
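
Once stored, the token can be read inside a Databricks notebook or job with `dbutils.secrets.get`. The sketch below passes `dbutils` in as an argument so it can also run locally; the function name `get_databricks_token` and the `fake_dbutils` stand-in are assumptions made for illustration, while the scope and key names match the commands above.

```python
from types import SimpleNamespace


def get_databricks_token(dbutils, scope="secret-scope", key="databricks-token"):
    """Read the token stored with `databricks secrets put-secret`."""
    return dbutils.secrets.get(scope=scope, key=key)


# Local stand-in for `dbutils`, only so the sketch runs outside Databricks.
_fake_store = {("secret-scope", "databricks-token"): "dummy-value"}
fake_dbutils = SimpleNamespace(
    secrets=SimpleNamespace(get=lambda scope, key: _fake_store[(scope, key)])
)

print(get_databricks_token(fake_dbutils))  # prints: dummy-value
```

In a notebook, the built-in `dbutils` object would be passed instead of the stand-in.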

Data

Default of Credit Card Clients Dataset https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset/data

Fifth PR - Branch: inference

Added monitoring folder
Added refresh_monitor.py workflow
Updated bundle file with monitoring

Fourth PR - Branch: bundles

Updated config class with A/B test params
Updated test data cleaning
Updated config file
Token creation
Added workflows preprocessing, train_model, evaluate_model, deploy_model
Added create_source_data notebook
Fixed dependencies in pyproject.toml (v 0.0.11)

Third PR - Branch: serving

Reorganized Notebooks Folders
Changed env var "CONFIG DATABRICKS" and "CODE_PATH"
New Workspace mlops_students (change catalog and schema name)
Pyarrow incompatibility with mlflow/feature lookup. Changed to 14.0.2 in wheel 0.0.9
Added Notebooks feature/model serving

Second PR - Branch: mlflow

Added hatchling
Activated editable mode: uv pip install -e .
Removed "src" imports
Improved src code, added utils and model training
Added logs to .gitignore
Added training to main
Added .gitattributes
Added mlflow notebooks (base, custom and feature store)
Added Pydantic

First PR - Branch: setup

Corrected README.md ".venv" instead of "venv"
Added README.md Databricks instructions
Added install pre-commit in ci.yml
Create databricks Schema and Volume
Pushed data and package to schema
Created new packages
Created logs file
Added larger size data for pre-commit (up to 3 MB)
Added pytest, loguru, pre-commit, imbalanced-learn and ruff in dependencies
Added Makefile

end-to-end-mlops-databricks/marvelous-databricks-course-benitomartin
