Commit f2373cf

rearrange

stevhliu committed May 21, 2024
1 parent 0df888f commit f2373cf
Showing 4 changed files with 159 additions and 159 deletions.
3 changes: 1 addition & 2 deletions docs/source/en/_redirects.yml
@@ -1,3 +1,2 @@
# Optimizing inference

perf_infer_gpu_many: perf_infer_gpu_one
hpo_trainer: trainer
44 changes: 23 additions & 21 deletions docs/source/en/_toctree.yml
@@ -143,10 +143,22 @@
- sections:
- local: performance
title: Overview
- local: llm_optims
title: LLM inference optimization
- local: big_models
title: Instantiate a big model
- local: quantization
title: Quantization
- sections:
- local: perf_infer_cpu
title: CPU inference
- local: perf_infer_gpu_one
title: GPU inference
- local: llm_optims
title: LLM inference optimization
- local: tf_xla
title: XLA Integration for TensorFlow Models
- local: perf_torch_compile
title: Optimize inference using `torch.compile()`
title: Inference
- sections:
- local: perf_train_gpu_one
title: Methods and tools for efficient training on a single GPU
@@ -156,6 +168,8 @@
title: Fully Sharded Data Parallel
- local: deepspeed
title: DeepSpeed
- local: debugging
title: GPU debugging
- local: perf_train_cpu
title: Efficient training on CPU
- local: perf_train_cpu_many
@@ -164,25 +178,9 @@
title: Training on TPU with TensorFlow
- local: perf_train_special
title: PyTorch training on Apple silicon
- local: perf_hardware
title: Custom hardware for training
- local: hpo_train
title: Hyperparameter Search using Trainer API
title: Efficient training techniques
- sections:
- local: perf_infer_cpu
title: CPU inference
- local: perf_infer_gpu_one
title: GPU inference
title: Optimizing inference
- local: big_models
title: Instantiate a big model
- local: debugging
title: Debugging
- local: tf_xla
title: XLA Integration for TensorFlow Models
- local: perf_torch_compile
title: Optimize inference using `torch.compile()`
title: Training
title: Performance and scalability
- sections:
- local: contributing
@@ -223,9 +221,12 @@
title: Model training anatomy
- local: llm_tutorial_optimization
title: Getting the most out of LLMs
- local: perf_hardware
title: Custom hardware for training
title: Conceptual guides
- sections:
- sections:
- isExpanded: false
sections:
- local: main_classes/agent
title: Agents and Tools
- local: model_doc/auto
@@ -849,7 +850,8 @@
title: Graphormer
title: Graph models
title: Models
- sections:
- isExpanded: false
sections:
- local: internal/modeling_utils
title: Custom Layers and Utilities
- local: internal/pipelines_utils
136 changes: 0 additions & 136 deletions docs/source/en/hpo_train.md

This file was deleted.

135 changes: 135 additions & 0 deletions docs/source/en/trainer.md
@@ -542,3 +542,138 @@ accelerate launch --num_processes=2 \
```

Check out the [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) tutorial to learn more about `accelerate_launch` and custom configurations.

## Hyperparameter search

> [!TIP]
> Hyperparameter search with Distributed Data Parallel (DDP) is supported for the optuna and sigopt backends. Only the rank-zero process generates the search trial and passes the arguments to the other ranks.

[`Trainer`] supports four hyperparameter search backends: [optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html) and [wandb](https://wandb.ai/site/sweeps).

Install the backend you want to use (you only need one of them).

```bash
pip install optuna  # or: sigopt, wandb, ray[tune]
```

Each backend requires a different format for defining the hyperparameter search space.

<hfoptions id="backend">
<hfoption id="sigopt">

The hyperparameter search space for sigopt is defined with a list of [parameter objects](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter).

```py
>>> def sigopt_hp_space(trial):
... return [
... {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
... {
... "categorical_values": ["16", "32", "64", "128"],
... "name": "per_device_train_batch_size",
... "type": "categorical",
... },
... ]
```

</hfoption>
<hfoption id="optuna">

The hyperparameter search space for optuna is defined with the suggest methods of an [optuna.trial.Trial](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py) object.

```py
>>> def optuna_hp_space(trial):
... return {
... "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
... "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
... }
```

Optuna also supports multi-objective hyperparameter optimization. Pass a list of directions to the `direction` parameter of `hyperparameter_search` to specify whether each objective should be minimized or maximized, and define your own `compute_objective` to return the corresponding objective values (a sketch follows the example below).

```py
>>> best_trials = trainer.hyperparameter_search(
... direction=["minimize", "maximize"],
... backend="optuna",
... hp_space=optuna_hp_space,
... n_trials=20,
... compute_objective=compute_objective,
... )
```
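
For example, a `compute_objective` for this two-objective search could return one value per direction. The sketch below is illustrative rather than part of the original example; the metric names `eval_loss` and `eval_f1` are assumptions that depend on what your `compute_metrics` function reports.

```py
>>> def compute_objective(metrics):
...     # return one value per entry in `direction`: minimize the evaluation loss, maximize F1
...     return [metrics["eval_loss"], metrics["eval_f1"]]
```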

The search returns a Pareto front as a `List[BestRun]`. Refer to the test case [`TrainerHyperParameterMultiObjectOptunaIntegrationTest`](https://github.com/huggingface/transformers/blob/d24097e0229485287ff4959258c552168bd898c6/tests/trainer/test_trainer.py#L3670) for more details.
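
To inspect the front, you could iterate over the returned runs. This is an illustrative snippet, assuming each `BestRun` exposes `run_id`, `objective`, and `hyperparameters`.

```py
>>> for run in best_trials:
...     print(run.run_id, run.objective, run.hyperparameters)
```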

</hfoption>
<hfoption id="Raytune">

The hyperparameter search space for Ray Tune is defined with the [Tune Search Space API](https://docs.ray.io/en/latest/tune/api/search_space.html).

```py
>>> from ray import tune

>>> def ray_hp_space(trial):
...     return {
...         "learning_rate": tune.loguniform(1e-6, 1e-4),
...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
...     }
```

</hfoption>
<hfoption id="WandB">

The hyperparameter search space for wandb is defined with a [sweep configuration](https://docs.wandb.ai/guides/sweeps/configuration).

```py
>>> def wandb_hp_space(trial):
... return {
... "method": "random",
... "metric": {"name": "objective", "goal": "minimize"},
... "parameters": {
... "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
... "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
... },
... }
```

</hfoption>
</hfoptions>

After configuring the hyperparameter search space, define a `model_init` function and pass it to [`Trainer`]. `model_init` is called at the start of each trial and must return a freshly initialized model.

```py
>>> def model_init(trial):
...     # `model_args` and `config` are defined earlier in the surrounding training script
...     return AutoModelForSequenceClassification.from_pretrained(
...         model_args.model_name_or_path,
...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
...         config=config,
...         cache_dir=model_args.cache_dir,
...         revision=model_args.model_revision,
...         token=True if model_args.use_auth_token else None,
...     )
```

Create a [`Trainer`] with your `model_init` function, training arguments, training and evaluation datasets, and an evaluation function.

```py
>>> trainer = Trainer(
...     model=None,  # the model is supplied by `model_init` for each trial
... args=training_args,
... train_dataset=small_train_dataset,
... eval_dataset=small_eval_dataset,
... compute_metrics=compute_metrics,
... tokenizer=tokenizer,
... model_init=model_init,
... data_collator=data_collator,
... )
```

Call `hyperparameter_search` on [`Trainer`] with your backend, search space, number of trials, and the optimization direction (whether to minimize or maximize the objective).

You can define your own `compute_objective` function; otherwise the default `compute_objective` is called, which returns the sum of the evaluation metrics (such as F1) as the objective value.
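
For instance, a custom `compute_objective` that maximizes F1 alone might look like the sketch below; the `eval_f1` key is an assumption and depends on what your `compute_metrics` function reports.

```py
>>> def compute_objective(metrics):
...     # use a single evaluation metric as the objective instead of the default sum
...     return metrics["eval_f1"]
```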

```py
>>> best_trial = trainer.hyperparameter_search(
... direction="maximize",
... backend="optuna",
... hp_space=optuna_hp_space,
... n_trials=20,
... compute_objective=compute_objective,
... )
```
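
After the search finishes, you could, for example, copy the winning hyperparameters back into the training arguments and run a final training. This is a minimal sketch, assuming every searched name matches a `TrainingArguments` attribute.

```py
>>> for name, value in best_trial.hyperparameters.items():
...     setattr(trainer.args, name, value)

>>> trainer.train()
```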
