Merge pull request #105 from invoke-ai/replace-pokemon-dataset
Replace the lambdalabs/pokemon-blip-captions dataset
RyanJDick authored Apr 12, 2024
2 parents cedefe8 + 5ed6f2d commit bd5da50
Showing 24 changed files with 250 additions and 292 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -1,5 +1,6 @@
-output/
-test_configs/
+/output/
+/test_configs/
+/data/

# pyenv
.python-version
4 changes: 2 additions & 2 deletions README.md
@@ -44,7 +44,7 @@ pip install -e ".[test]" --extra-index-url https://download.pytorch.org/whl/cu12

Run training via the CLI with type-checked YAML configuration files for maximum control:
```bash
-invoke-train --cfg-file src/invoke_training/sample_configs/sd_lora_pokemon_1x8gb.yaml
+invoke-train --cfg-file src/invoke_training/sample_configs/sdxl_textual_inversion_gnome_1x24gb.yaml
```

### GUI
@@ -63,7 +63,7 @@ Training progress can be monitored with [Tensorboard](https://www.tensorflow.org
All trained models are compatible with InvokeAI:

![Screenshot of the InvokeAI UI with an example of a Yoda pokemon generated using a Pokemon LoRA model.](docs/images/invokeai_yoda_pokemon_lora.png)
-*Example image generated with the prompt "A cute yoda pokemon creature." and the trained Pokemon LoRA.*
+*Example image generated with the prompt "A cute yoda pokemon creature." and a trained Pokemon LoRA.*

## Contributing

14 changes: 7 additions & 7 deletions docs/concepts/dataset_formats.md
@@ -2,21 +2,15 @@

`invoke-training` supports the following dataset formats:

- `HF_HUB_IMAGE_CAPTION_DATASET`: A Hugging Face Hub dataset containing images and captions.
- `IMAGE_CAPTION_JSONL_DATASET`: A local image-caption dataset described by a single `.jsonl` file.
- `IMAGE_CAPTION_DIR_DATASET`: A local directory of images with associated `.txt` caption files.
- `IMAGE_DIR_DATASET`: A local directory of images (without captions).
- `HF_HUB_IMAGE_CAPTION_DATASET`: A Hugging Face Hub dataset containing images and captions.

See the documentation for a particular training pipeline to see which dataset formats it supports.

The following sections explain each of these formats in more detail.

-## `HF_HUB_IMAGE_CAPTION_DATASET`
-
-Config documentation: [HFHubImageCaptionDatasetConfig][invoke_training.config.data.dataset_config.HFHubImageCaptionDatasetConfig]
-
-The easiest way to get started with `invoke-training` is to use a publicly available dataset on [Hugging Face Hub](https://huggingface.co/datasets). You can filter for the `Text-to-Image` task to find relevant datasets that contain both an image column and a caption column. [lambdalabs/pokemon-blip-captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) is a popular choice if you're not sure where to start.

## `IMAGE_CAPTION_JSONL_DATASET`

Config documentation: [ImageCaptionJsonlDatasetConfig][invoke_training.config.data.dataset_config.ImageCaptionJsonlDatasetConfig]
@@ -102,3 +96,9 @@ This dataset can be used with the following pipeline dataset configuration:
type: IMAGE_DIR_DATASET
dataset_dir: /path/to/my_custom_dataset
```

+## `HF_HUB_IMAGE_CAPTION_DATASET`
+
+Config documentation: [HFHubImageCaptionDatasetConfig][invoke_training.config.data.dataset_config.HFHubImageCaptionDatasetConfig]
+
+The `HF_HUB_IMAGE_CAPTION_DATASET` dataset format can be used to access publicly available datasets on the [Hugging Face Hub](https://huggingface.co/datasets). You can filter for the `Text-to-Image` task to find relevant datasets that contain both an image column and a caption column. [lambdalabs/pokemon-blip-captions](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions) is a popular choice if you're not sure where to start.
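For orientation, a pipeline's dataset configuration for this format would look roughly like the sketch below. The `dataset_name` field name is an assumption here — check the `HFHubImageCaptionDatasetConfig` documentation linked above for the exact fields.

```yaml
type: HF_HUB_IMAGE_CAPTION_DATASET
# Hub dataset to load; the `dataset_name` field name is assumed — see the config docs for the exact fields.
dataset_name: lambdalabs/pokemon-blip-captions
```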
35 changes: 23 additions & 12 deletions docs/get-started/installation.md
@@ -6,19 +6,25 @@
2. An NVIDIA GPU with >= 8 GB VRAM is recommended for model training.

## Basic Installation

0. Open your terminal and navigate to the directory where you want to clone the `invoke-training` repo.
1. Clone the repo:
```bash
git clone https://github.com/invoke-ai/invoke-training.git
```
-2. (*Optional, but highly recommended*) Create and activate a python [virtual environment](https://docs.python.org/3/library/venv.html#creating-virtual-environments). This creates an isolated environment for `invoke-training` and its dependencies that won't interfere with other python environments on your system, including any installations of the [local Invoke client](https://www.github.com/invoke-ai/invokeai).
+2. Create and activate a python [virtual environment](https://docs.python.org/3/library/venv.html#creating-virtual-environments). This creates an isolated environment for `invoke-training` and its dependencies that won't interfere with other python environments on your system, including any installations of [InvokeAI](https://www.github.com/invoke-ai/invokeai).
```bash
-# Create the new virtual environment in a memorable location by navigating to the folder and running this command
-python -m venv invoketraining
+# Navigate to the invoke-training directory.
+cd invoke-training

-# Activate the new virtual environment
-Windows: .\invoketraining\Scripts\activate
-Linux: source invoketraining/bin/activate
+# Create a new virtual environment named `invoketraining`.
+python -m venv invoketraining
+
+# Activate the new virtual environment.
+# On Windows:
+.\invoketraining\Scripts\activate
+# On MacOS / Linux:
+source invoketraining/bin/activate
```
3. Install `invoke-training` and its dependencies:
```bash
@@ -30,17 +36,22 @@ pip install ".[test]" --extra-index-url https://download.pytorch.org/whl/cu121
```

## Developer Installation

1. Consider forking the repo if you plan to contribute code changes.
2. `git clone` the repo.
-3. (*Optional, but highly recommended*) Create and activate a python [virtual environment](https://docs.python.org/3/library/venv.html#creating-virtual-environments). This creates an isolated environment for `invoke-training` and its dependencies that won't interfere with other python environments on your system, including any installations of the [local Invoke client](https://www.github.com/invoke-ai/invokeai).
+3. Create and activate a python [virtual environment](https://docs.python.org/3/library/venv.html#creating-virtual-environments). This creates an isolated environment for `invoke-training` and its dependencies that won't interfere with other python environments on your system, including any installations of [InvokeAI](https://www.github.com/invoke-ai/invokeai).
```bash
-# Create the new virtual environment in a memorable location by navigating to the folder and running this command
-python -m venv invoketraining
+# Navigate to the invoke-training directory.
+cd invoke-training

-# Activate the new virtual environment
-Windows: .\invoketraining\Scripts\activate
-Linux: source invoketraining/bin/activate
+# Create a new virtual environment named `invoketraining`.
+python -m venv invoketraining
+
+# Activate the new virtual environment.
+# On Windows:
+.\invoketraining\Scripts\activate
+# On MacOS / Linux:
+source invoketraining/bin/activate
```
4. Install `invoke-training` and its dependencies:
```bash
50 changes: 0 additions & 50 deletions docs/get-started/quick-start-cli.md

This file was deleted.

@@ -1,11 +1,12 @@
-# Quick Start - GUI
+# Quick Start

-This page walks through the steps to train your first model with the `invoke-training` GUI.
+`invoke-training` has both a GUI and a CLI (for advanced users). The instructions for getting started with both options can be found on this page.

-There is also a [Quick Start - CLI](./quick-start-cli.md) guide.
+There is also a video introduction to `invoke-training`:

-## Tutorial
<iframe width="560" height="315" src="https://www.youtube.com/embed/OZIz2vvtlM4?si=iR73F0IhlsolyYAl" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

+## Quick Start - GUI
### 1. Installation
Follow the [`invoke-training` installation instructions](./installation.md).
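After installation, the next step is to launch the GUI. As a rough sketch (the `invoke-train-ui` entry point is taken from the project's docs and should be treated as an assumption), that looks like:

```bash
# Launch the invoke-training GUI from the activated virtual environment.
# Command name assumed from the invoke-training docs; adjust if your version differs.
invoke-train-ui
```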

@@ -53,6 +54,10 @@ You can now use your trained Pokemon LoRA in the InvokeAI UI! 🎉
![Screenshot of the InvokeAI UI with an example of a Yoda pokemon generated using a Pokemon LoRA model.](../images/invokeai_yoda_pokemon_lora.png)
*Example image generated with the prompt "A cute yoda pokemon creature." and a Pokemon LoRA.*

## Next Steps

After completing this Quick Start tutorial, we recommend continuing with any of the [full training pipeline tutorials](../tutorials/index.md).
## Quick Start - CLI
### 1. Installation
Follow the [`invoke-training` installation instructions](./installation.md).

### 2. Training
See the [Textual Inversion - SDXL](../tutorials/stable_diffusion/textual_inversion_sdxl.md) tutorial for instructions on how to train a model via the CLI.
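For reference, CLI training amounts to pointing `invoke-train` at a YAML config file. The config path below is the one used in the README example and is purely illustrative:

```bash
# Train via the CLI with one of the bundled sample configs (path taken from the README example).
invoke-train --cfg-file src/invoke_training/sample_configs/sdxl_textual_inversion_gnome_1x24gb.yaml
```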
3 changes: 1 addition & 2 deletions mkdocs.yml
@@ -27,8 +27,7 @@ nav:
  - Welcome: index.md
  - Get Started:
    - get-started/installation.md
-   - get-started/quick-start-cli.md
-   - get-started/quick-start-gui.md
+   - get-started/quick-start.md
  - Tutorials:
    - tutorials/index.md
    - Stable Diffusion:
56 changes: 56 additions & 0 deletions src/invoke_training/sample_configs/sd_lora_baroque_1x8gb.yaml
@@ -0,0 +1,56 @@
# Training mode: Finetuning with LoRA
# Base model: SD 1.5
# Dataset: https://huggingface.co/datasets/InvokeAI/nga-baroque
# GPU: 1 x 8GB

# Instructions:
# 1. Download the dataset from https://huggingface.co/datasets/InvokeAI/nga-baroque.
# 2. Update the `jsonl_path` field in the `data_loader` section to point to the `metadata.jsonl` file of the downloaded
# dataset.

# Notes:
# This config file has been optimized for the primary goal of achieving reasonable results *quickly* for demo purposes.

type: SD_LORA
seed: 1
base_output_dir: output/baroque/sd_lora

optimizer:
  optimizer_type: Prodigy
  learning_rate: 1.0
  weight_decay: 0.01
  use_bias_correction: True
  safeguard_warmup: True

data_loader:
  type: IMAGE_CAPTION_SD_DATA_LOADER
  dataset:
    type: IMAGE_CAPTION_JSONL_DATASET
    # Update the jsonl_path field to point to the metadata.jsonl file of the downloaded dataset.
    jsonl_path: data/nga-baroque/metadata.jsonl
  resolution: 512
  aspect_ratio_buckets:
    target_resolution: 512
    start_dim: 256
    end_dim: 768
    divisible_by: 64
  caption_prefix: "A baroque painting of"
  dataloader_num_workers: 4

# General
model: runwayml/stable-diffusion-v1-5
gradient_accumulation_steps: 1
mixed_precision: fp16
xformers: False
gradient_checkpointing: True

max_train_epochs: 15
save_every_n_epochs: 1
validate_every_n_epochs: 1

max_checkpoints: 5
validation_prompts:
- A baroque painting of a woman carrying a basket of fruit.
- A baroque painting of a cute Yoda creature.
train_batch_size: 4
num_validation_images_per_prompt: 3
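Following the instructions at the top of this config, a typical run might look like the sketch below. The `huggingface-cli download` step is one assumed way to fetch the dataset (it requires the `huggingface_hub` package), and the paths mirror the defaults in the config:

```bash
# 1. Download the nga-baroque dataset to the location referenced by `jsonl_path`
#    (one option; assumes the huggingface_hub CLI is installed).
huggingface-cli download InvokeAI/nga-baroque --repo-type dataset --local-dir data/nga-baroque

# 2. Launch LoRA training with this sample config.
invoke-train --cfg-file src/invoke_training/sample_configs/sd_lora_baroque_1x8gb.yaml
```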
45 changes: 0 additions & 45 deletions src/invoke_training/sample_configs/sd_lora_pokemon_1x8gb.yaml

This file was deleted.

