diff --git a/docs/source/en/_redirects.yml b/docs/source/en/_redirects.yml
index b6575a6b02f205..6f4cd152b6c6ff 100644
--- a/docs/source/en/_redirects.yml
+++ b/docs/source/en/_redirects.yml
@@ -1,3 +1,2 @@
-# Optimizing inference
-
 perf_infer_gpu_many: perf_infer_gpu_one
+hpo_trainer: trainer
\ No newline at end of file
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index b7db4a66c6265d..941e82653d30c8 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -143,10 +143,22 @@
 - sections:
   - local: performance
     title: Overview
-  - local: llm_optims
-    title: LLM inference optimization
+  - local: big_models
+    title: Instantiate a big model
   - local: quantization
     title: Quantization
+  - sections:
+    - local: perf_infer_cpu
+      title: CPU inference
+    - local: perf_infer_gpu_one
+      title: GPU inference
+    - local: llm_optims
+      title: LLM inference optimization
+    - local: tf_xla
+      title: XLA Integration for TensorFlow Models
+    - local: perf_torch_compile
+      title: Optimize inference using `torch.compile()`
+    title: Inference
   - sections:
     - local: perf_train_gpu_one
       title: Methods and tools for efficient training on a single GPU
@@ -156,6 +168,8 @@
       title: Fully Sharded Data Parallel
     - local: deepspeed
       title: DeepSpeed
+    - local: debugging
+      title: GPU debugging
     - local: perf_train_cpu
       title: Efficient training on CPU
     - local: perf_train_cpu_many
@@ -164,25 +178,9 @@
       title: Training on TPU with TensorFlow
     - local: perf_train_special
       title: PyTorch training on Apple silicon
-    - local: perf_hardware
-      title: Custom hardware for training
     - local: hpo_train
       title: Hyperparameter Search using Trainer API
-    title: Efficient training techniques
-  - sections:
-    - local: perf_infer_cpu
-      title: CPU inference
-    - local: perf_infer_gpu_one
-      title: GPU inference
-    title: Optimizing inference
-  - local: big_models
-    title: Instantiate a big model
-  - local: debugging
-    title: Debugging
-  - local: tf_xla
-    title: XLA Integration for TensorFlow Models
-  - local: perf_torch_compile
-    title: Optimize inference using `torch.compile()`
+    title: Training
   title: Performance and scalability
 - sections:
   - local: contributing
@@ -223,9 +221,12 @@
     title: Model training anatomy
   - local: llm_tutorial_optimization
     title: Getting the most out of LLMs
+  - local: perf_hardware
+    title: Custom hardware for training
   title: Conceptual guides
 - sections:
-  - sections:
+  - isExpanded: false
+    sections:
     - local: main_classes/agent
       title: Agents and Tools
     - local: model_doc/auto
@@ -849,7 +850,8 @@
         title: Graphormer
       title: Graph models
     title: Models
-  - sections:
+  - isExpanded: false
+    sections:
     - local: internal/modeling_utils
       title: Custom Layers and Utilities
     - local: internal/pipelines_utils
diff --git a/docs/source/en/hpo_train.md b/docs/source/en/hpo_train.md
deleted file mode 100644
index c516c501f88228..00000000000000
--- a/docs/source/en/hpo_train.md
+++ /dev/null
@@ -1,136 +0,0 @@
-
-# Hyperparameter Search using Trainer API
-
-🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] provides API for hyperparameter search. This doc shows how to enable it in example.
-
-## Hyperparameter Search backend
-
-[`Trainer`] supports four hyperparameter search backends currently:
-[optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html) and [wandb](https://wandb.ai/site/sweeps).
-
-you should install them before using them as the hyperparameter search backend
-```bash
-pip install optuna/sigopt/wandb/ray[tune]
-```
-
-## How to enable Hyperparameter search in example
-
-Define the hyperparameter search space, different backends need different format.
-
-For sigopt, see sigopt [object_parameter](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter), it's like following:
-```py
->>> def sigopt_hp_space(trial):
-...     return [
-...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
-...         {
-...             "categorical_values": ["16", "32", "64", "128"],
-...             "name": "per_device_train_batch_size",
-...             "type": "categorical",
-...         },
-...     ]
-```
-
-For optuna, see optuna [object_parameter](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py), it's like following:
-
-```py
->>> def optuna_hp_space(trial):
-...     return {
-...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
-...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
-...     }
-```
-
-Optuna provides multi-objective HPO. You can pass `direction` in `hyperparameter_search` and define your own compute_objective to return multiple objective values. The Pareto Front (`List[BestRun]`) will be returned in hyperparameter_search, you should refer to the test case `TrainerHyperParameterMultiObjectOptunaIntegrationTest` in [test_trainer](https://github.com/huggingface/transformers/blob/main/tests/trainer/test_trainer.py). It's like following
-
-```py
->>> best_trials = trainer.hyperparameter_search(
-...     direction=["minimize", "maximize"],
-...     backend="optuna",
-...     hp_space=optuna_hp_space,
-...     n_trials=20,
-...     compute_objective=compute_objective,
-... )
-```
-
-For raytune, see raytune [object_parameter](https://docs.ray.io/en/latest/tune/api/search_space.html), it's like following:
-
-```py
->>> def ray_hp_space(trial):
-...     return {
-...         "learning_rate": tune.loguniform(1e-6, 1e-4),
-...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
-...     }
-```
-
-For wandb, see wandb [object_parameter](https://docs.wandb.ai/guides/sweeps/configuration), it's like following:
-
-```py
->>> def wandb_hp_space(trial):
-...     return {
-...         "method": "random",
-...         "metric": {"name": "objective", "goal": "minimize"},
-...         "parameters": {
-...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
-...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
-...         },
-...     }
-```
-
-Define a `model_init` function and pass it to the [`Trainer`], as an example:
-```py
->>> def model_init(trial):
-...     return AutoModelForSequenceClassification.from_pretrained(
-...         model_args.model_name_or_path,
-...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
-...         config=config,
-...         cache_dir=model_args.cache_dir,
-...         revision=model_args.model_revision,
-...         token=True if model_args.use_auth_token else None,
-...     )
-```
-
-Create a [`Trainer`] with your `model_init` function, training arguments, training and test datasets, and evaluation function:
-
-```py
->>> trainer = Trainer(
-...     model=None,
-...     args=training_args,
-...     train_dataset=small_train_dataset,
-...     eval_dataset=small_eval_dataset,
-...     compute_metrics=compute_metrics,
-...     tokenizer=tokenizer,
-...     model_init=model_init,
-...     data_collator=data_collator,
-... )
-```
-
-Call hyperparameter search, get the best trial parameters, backend could be `"optuna"`/`"sigopt"`/`"wandb"`/`"ray"`. direction can be`"minimize"` or `"maximize"`, which indicates whether to optimize greater or lower objective.
-
-You could define your own compute_objective function, if not defined, the default compute_objective will be called, and the sum of eval metric like f1 is returned as objective value.
-
-```py
->>> best_trial = trainer.hyperparameter_search(
-...     direction="maximize",
-...     backend="optuna",
-...     hp_space=optuna_hp_space,
-...     n_trials=20,
-...     compute_objective=compute_objective,
-... )
-```
-
-## Hyperparameter search For DDP finetune
-Currently, Hyperparameter search for DDP is enabled for optuna and sigopt. Only the rank-zero process will generate the search trial and pass the argument to other ranks.
diff --git a/docs/source/en/trainer.md b/docs/source/en/trainer.md
index b69bebd6ea2004..db4eef526c855b 100644
--- a/docs/source/en/trainer.md
+++ b/docs/source/en/trainer.md
@@ -542,3 +542,138 @@ accelerate launch --num_processes=2 \
 ```
 
 Check out the [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) tutorial to learn more about `accelerate_launch` and custom configurations.
+
+## Hyperparameter search
+
+> [!TIP]
+> Hyperparameter search for Distributed Data Parallel (DDP) is currently only enabled for the optuna and sigopt backends. Only the rank-zero process generates the search trial and passes the arguments to the other ranks.
+
+[`Trainer`] supports four hyperparameter search backends: [optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html), and [wandb](https://wandb.ai/site/sweeps).
+
+Install the backend you want to use.
+
+```bash
+pip install optuna  # or sigopt, wandb, ray[tune]
+```
+
+Each backend requires a different format for defining the hyperparameter search space.
+
+<hfoptions id="hp-search-backend">
+<hfoption id="sigopt">
+
+The hyperparameter search space for sigopt is defined in a [parameter object](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter).
+
+```py
+>>> def sigopt_hp_space(trial):
+...     return [
+...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
+...         {
+...             "categorical_values": ["16", "32", "64", "128"],
+...             "name": "per_device_train_batch_size",
+...             "type": "categorical",
+...         },
+...     ]
+```
+
+</hfoption>
+<hfoption id="optuna">
+
+The hyperparameter search space for optuna is defined in the [optuna.trial.Trial](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py) class.
+
+```py
+>>> def optuna_hp_space(trial):
+...     return {
+...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
+...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
+...     }
+```
+
+Optuna also supports multi-objective HPO. Pass a list of optimization `direction`s to `hyperparameter_search` to specify whether each objective should be minimized or maximized, and define your own `compute_objective` to return the multiple objective values.
+
+```py
+>>> best_trials = trainer.hyperparameter_search(
+...     direction=["minimize", "maximize"],
+...     backend="optuna",
+...     hp_space=optuna_hp_space,
+...     n_trials=20,
+...     compute_objective=compute_objective,
+... )
+```
+
+When multiple directions are given, `hyperparameter_search` returns a Pareto front (`List[BestRun]`) instead of a single best trial. Refer to the test case [`TrainerHyperParameterMultiObjectOptunaIntegrationTest`](https://github.com/huggingface/transformers/blob/d24097e0229485287ff4959258c552168bd898c6/tests/trainer/test_trainer.py#L3670) for more details.
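+
+A minimal sketch of a multi-objective `compute_objective` (it assumes your `compute_metrics` function reports `eval_loss` and `eval_accuracy`; substitute the metrics you actually track):
+
+```py
+>>> def compute_objective(metrics):
+...     # return one value per entry in `direction`: eval_loss is minimized, eval_accuracy is maximized
+...     return [metrics["eval_loss"], metrics["eval_accuracy"]]
+```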
+
+</hfoption>
+<hfoption id="raytune">
+
+The hyperparameter search space for Ray Tune is defined in the [Tune Search Space API](https://docs.ray.io/en/latest/tune/api/search_space.html).
+
+```py
+>>> from ray import tune

+>>> def ray_hp_space(trial):
+...     return {
+...         "learning_rate": tune.loguniform(1e-6, 1e-4),
+...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
+...     }
+```
+
+</hfoption>
+<hfoption id="wandb">
+
+The hyperparameter search space for WandB is defined in a [sweep configuration](https://docs.wandb.ai/guides/sweeps/configuration).
+
+```py
+>>> def wandb_hp_space(trial):
+...     return {
+...         "method": "random",
+...         "metric": {"name": "objective", "goal": "minimize"},
+...         "parameters": {
+...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
+...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
+...         },
+...     }
+```
+
+</hfoption>
+</hfoptions>
+
+After configuring the hyperparameter search space, define a `model_init` function and pass it to the [`Trainer`]. It is called at the start of each trial to instantiate a fresh model; the example below assumes `model_args` and `config` are defined in your training script.
+
+```py
+>>> def model_init(trial):
+...     return AutoModelForSequenceClassification.from_pretrained(
+...         model_args.model_name_or_path,
+...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
+...         config=config,
+...         cache_dir=model_args.cache_dir,
+...         revision=model_args.model_revision,
+...         token=True if model_args.use_auth_token else None,
+...     )
+```
+
+Create a [`Trainer`] with your `model_init` function, training arguments, training and test datasets, and evaluation function. Note that `model=None` because the model is created anew by `model_init` for each trial.
+
+```py
+>>> trainer = Trainer(
+...     model=None,
+...     args=training_args,
+...     train_dataset=small_train_dataset,
+...     eval_dataset=small_eval_dataset,
+...     compute_metrics=compute_metrics,
+...     tokenizer=tokenizer,
+...     model_init=model_init,
+...     data_collator=data_collator,
+... )
+```
+
+Call `hyperparameter_search` on the [`Trainer`], setting the backend and the optimization `direction`, `"minimize"` or `"maximize"`, depending on whether a lower or higher objective is better.
+
+You can define your own `compute_objective` function; if you don't, the default `compute_objective` is called, and the sum of the evaluation metrics (such as F1) is returned as the objective value.
+
+```py
+>>> best_trial = trainer.hyperparameter_search(
+...     direction="maximize",
+...     backend="optuna",
+...     hp_space=optuna_hp_space,
+...     n_trials=20,
+...     compute_objective=compute_objective,
+... )
+```
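+
+For example, a `compute_objective` that optimizes a single metric could look like the sketch below (it assumes your `compute_metrics` function reports an `eval_f1` entry; substitute whichever metric you track):
+
+```py
+>>> def compute_objective(metrics):
+...     # optimize a single metric instead of the default sum of all evaluation metrics
+...     return metrics["eval_f1"]
+```
+
+The returned `best_trial` is a `BestRun` object, so you can read the winning hyperparameter values from `best_trial.hyperparameters` and the corresponding score from `best_trial.objective`.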