CI Pipeline which builds & tests the container #4

philschmid · 2024-01-31T14:26:50Z

To make sure our Hugging Face DLCs are well tested, we need to create "integration" tests that run different kinds of training using the container. Those tests should run automatically or on demand. We can use GitHub Actions as CI for running the tests and Python + Docker to implement the integration tests.

Until #3 is implemented, we can use existing containers, e.g. from transformers, to run the tests. For the "tests" scripts, I think we can use the existing "examples/" from transformers, peft, or trl. We could structure the tests/ folder into:

- local/ (run on a local machine GPU)
- vertex/ (run on Vertex)
- gke/ (run on GKE)

Example for a test:

0. build a container
1. starts a container on a GPU
2. runs a training using the container (few steps)
3. validates results
4. stops the container

-> repeat 1-4 with other tests.
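
The steps above could be sketched as a small Python helper that drives the `docker` CLI. This is only a rough sketch under assumptions, not the agreed implementation: the function names (`build_image`, `run_integration_test`), the container name `dlc-test`, and the `validate` callback are hypothetical, and it assumes the image lets us override its command with `sleep infinity`. The command runner is injectable so the flow can be exercised without Docker or a GPU.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes a command (list of strings) and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def shell_runner(cmd: Sequence[str]) -> str:
    """Run a command for real; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout.strip()


def build_image(image: str, context: str, run: Runner = shell_runner) -> None:
    # Step 0: build the container image from the given build context.
    run(["docker", "build", "-t", image, context])


def run_integration_test(
    image: str,
    train_cmd: List[str],
    validate: Callable[[str], bool],
    run: Runner = shell_runner,
    name: str = "dlc-test",  # hypothetical container name
) -> bool:
    # Step 1: start a detached container with access to all GPUs.
    run(["docker", "run", "-d", "--gpus", "all", "--name", name, image, "sleep", "infinity"])
    try:
        # Step 2: run a short training inside the running container.
        output = run(["docker", "exec", name, *train_cmd])
        # Step 3: validate the results (here: whatever the callback checks).
        return validate(output)
    finally:
        # Step 4: stop and remove the container, even if training failed.
        run(["docker", "rm", "-f", name])
```

Repeating steps 1-4 for other tests then amounts to calling `run_integration_test` once per training command against the same image.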

In addition to "local" tests running on GPU instances, we should also run validation tests for GKE and Vertex AI.

We need to implement strong CI tests, which run several tests, including training smaller models like BERT and bigger models like Llama.
- We should test and validate PEFT
- Distributed training
- Flash attention support
- Tests running directly on Vertex AI or GKE using the Vertex SDK
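
One way to keep this coverage manageable might be a small table of test cases that each expand into a training command. The sketch below is an assumption-heavy illustration: `TrainingCase`, `build_command`, the `train.py` script name, and the extra flags are placeholders for whatever the chosen example script actually accepts; only `torchrun --nproc_per_node` is taken from PyTorch's launcher.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrainingCase:
    name: str
    model_id: str
    num_gpus: int = 1
    extra_args: Tuple[str, ...] = ()  # placeholder flags, script-specific


def build_command(case: TrainingCase, script: str = "train.py", max_steps: int = 10) -> List[str]:
    """Expand a case into a command; multi-GPU cases are launched via torchrun."""
    if case.num_gpus == 1:
        launcher = ["python"]
    else:
        launcher = ["torchrun", f"--nproc_per_node={case.num_gpus}"]
    return [
        *launcher,
        script,
        "--model_name_or_path", case.model_id,
        "--max_steps", str(max_steps),  # keep runs to a few steps
        *case.extra_args,
    ]


# Hypothetical matrix covering the items listed above.
CASES = [
    TrainingCase("bert-single-gpu", "bert-base-uncased"),
    TrainingCase("bert-distributed", "bert-base-uncased", num_gpus=2),
    TrainingCase("llama-peft", "meta-llama/Llama-2-7b-hf", extra_args=("--use_peft",)),
]

for case in CASES:
    print(case.name, "->", " ".join(build_command(case)))
```

Each generated command could then be handed to whichever backend runs it: a local container, Vertex AI, or GKE.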


philschmid · 2024-01-31T14:41:07Z

For access to GCP you can ask @glegendre01.

philschmid mentioned this issue Jan 31, 2024

[Pytorch][GPU][Training] Initial Release #2

Closed

6 tasks

philschmid added the pytorch Pytorch related Issues label Jan 31, 2024

philschmid changed the title ~~[Pytorch][GPU] CI Pipeline which builds & tests the container~~ CI Pipeline which builds & tests the container Jan 31, 2024

philschmid added GPU GPU related training labels Jan 31, 2024

ydshieh self-assigned this Jan 31, 2024
