diff --git a/docs/source/en/_redirects.yml b/docs/source/en/_redirects.yml
index b6575a6b02f205..6f4cd152b6c6ff 100644
--- a/docs/source/en/_redirects.yml
+++ b/docs/source/en/_redirects.yml
@@ -1,3 +1,2 @@
-# Optimizing inference
-
 perf_infer_gpu_many: perf_infer_gpu_one
+hpo_trainer: trainer
\ No newline at end of file
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index b7db4a66c6265d..941e82653d30c8 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -143,10 +143,22 @@
 - sections:
   - local: performance
     title: Overview
-  - local: llm_optims
-    title: LLM inference optimization
+  - local: big_models
+    title: Instantiate a big model
   - local: quantization
     title: Quantization
+  - sections:
+    - local: perf_infer_cpu
+      title: CPU inference
+    - local: perf_infer_gpu_one
+      title: GPU inference
+    - local: llm_optims
+      title: LLM inference optimization
+    - local: tf_xla
+      title: XLA Integration for TensorFlow Models
+    - local: perf_torch_compile
+      title: Optimize inference using `torch.compile()`
+    title: Inference
   - sections:
     - local: perf_train_gpu_one
       title: Methods and tools for efficient training on a single GPU
@@ -156,6 +168,8 @@
       title: Fully Sharded Data Parallel
     - local: deepspeed
       title: DeepSpeed
+    - local: debugging
+      title: GPU debugging
     - local: perf_train_cpu
       title: Efficient training on CPU
     - local: perf_train_cpu_many
@@ -164,25 +178,9 @@
       title: Training on TPU with TensorFlow
     - local: perf_train_special
       title: PyTorch training on Apple silicon
-    - local: perf_hardware
-      title: Custom hardware for training
     - local: hpo_train
       title: Hyperparameter Search using Trainer API
-    title: Efficient training techniques
-  - sections:
-    - local: perf_infer_cpu
-      title: CPU inference
-    - local: perf_infer_gpu_one
-      title: GPU inference
-    title: Optimizing inference
-  - local: big_models
-    title: Instantiate a big model
-  - local: debugging
-    title: Debugging
-  - local: tf_xla
-    title: XLA Integration for TensorFlow Models
-  - local: perf_torch_compile
-    title: Optimize inference using `torch.compile()`
+    title: Training
   title: Performance and scalability
 - sections:
   - local: contributing
@@ -223,9 +221,12 @@
     title: Model training anatomy
   - local: llm_tutorial_optimization
     title: Getting the most out of LLMs
+  - local: perf_hardware
+    title: Custom hardware for training
   title: Conceptual guides
 - sections:
-  - sections:
+  - isExpanded: false
+    sections:
     - local: main_classes/agent
       title: Agents and Tools
     - local: model_doc/auto
@@ -849,7 +850,8 @@
         title: Graphormer
       title: Graph models
     title: Models
-  - sections:
+  - isExpanded: false
+    sections:
     - local: internal/modeling_utils
       title: Custom Layers and Utilities
     - local: internal/pipelines_utils
diff --git a/docs/source/en/hpo_train.md b/docs/source/en/hpo_train.md
deleted file mode 100644
index c516c501f88228..00000000000000
--- a/docs/source/en/hpo_train.md
+++ /dev/null
@@ -1,136 +0,0 @@
-
-# Hyperparameter Search using Trainer API
-
-🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] provides API for hyperparameter search. This doc shows how to enable it in example.
-
-## Hyperparameter Search backend
-
-[`Trainer`] supports four hyperparameter search backends currently:
-[optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html) and [wandb](https://wandb.ai/site/sweeps).
-
-you should install them before using them as the hyperparameter search backend
-```bash
-pip install optuna/sigopt/wandb/ray[tune]
-```
-
-## How to enable Hyperparameter search in example
-
-Define the hyperparameter search space, different backends need different format.
-
-For sigopt, see sigopt [object_parameter](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter), it's like following:
-```py
->>> def sigopt_hp_space(trial):
-...     return [
-...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
-...         {
-...             "categorical_values": ["16", "32", "64", "128"],
-...             "name": "per_device_train_batch_size",
-...             "type": "categorical",
-...         },
-...     ]
-```
-
-For optuna, see optuna [object_parameter](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py), it's like following:
-
-```py
->>> def optuna_hp_space(trial):
-...     return {
-...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
-...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
-...     }
-```
-
-Optuna provides multi-objective HPO. You can pass `direction` in `hyperparameter_search` and define your own compute_objective to return multiple objective values. The Pareto Front (`List[BestRun]`) will be returned in hyperparameter_search, you should refer to the test case `TrainerHyperParameterMultiObjectOptunaIntegrationTest` in [test_trainer](https://github.com/huggingface/transformers/blob/main/tests/trainer/test_trainer.py). It's like following
-
-```py
->>> best_trials = trainer.hyperparameter_search(
-...     direction=["minimize", "maximize"],
-...     backend="optuna",
-...     hp_space=optuna_hp_space,
-...     n_trials=20,
-...     compute_objective=compute_objective,
-... )
-```
-
-For raytune, see raytune [object_parameter](https://docs.ray.io/en/latest/tune/api/search_space.html), it's like following:
-
-```py
->>> def ray_hp_space(trial):
-...     return {
-...         "learning_rate": tune.loguniform(1e-6, 1e-4),
-...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
-...     }
-```
-
-For wandb, see wandb [object_parameter](https://docs.wandb.ai/guides/sweeps/configuration), it's like following:
-
-```py
->>> def wandb_hp_space(trial):
-...     return {
-...         "method": "random",
-...         "metric": {"name": "objective", "goal": "minimize"},
-...         "parameters": {
-...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
-...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
-...         },
-...     }
-```
-
-Define a `model_init` function and pass it to the [`Trainer`], as an example:
-```py
->>> def model_init(trial):
-...     return AutoModelForSequenceClassification.from_pretrained(
-...         model_args.model_name_or_path,
-...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
-...         config=config,
-...         cache_dir=model_args.cache_dir,
-...         revision=model_args.model_revision,
-...         token=True if model_args.use_auth_token else None,
-...     )
-```
-
-Create a [`Trainer`] with your `model_init` function, training arguments, training and test datasets, and evaluation function:
-
-```py
->>> trainer = Trainer(
-...     model=None,
-...     args=training_args,
-...     train_dataset=small_train_dataset,
-...     eval_dataset=small_eval_dataset,
-...     compute_metrics=compute_metrics,
-...     tokenizer=tokenizer,
-...     model_init=model_init,
-...     data_collator=data_collator,
-... )
-```
-
-Call hyperparameter search, get the best trial parameters, backend could be `"optuna"`/`"sigopt"`/`"wandb"`/`"ray"`. direction can be`"minimize"` or `"maximize"`, which indicates whether to optimize greater or lower objective.
-
-You could define your own compute_objective function, if not defined, the default compute_objective will be called, and the sum of eval metric like f1 is returned as objective value.
-
-```py
->>> best_trial = trainer.hyperparameter_search(
-...     direction="maximize",
-...     backend="optuna",
-...     hp_space=optuna_hp_space,
-...     n_trials=20,
-...     compute_objective=compute_objective,
-... )
-```
-
-## Hyperparameter search For DDP finetune
-Currently, Hyperparameter search for DDP is enabled for optuna and sigopt. Only the rank-zero process will generate the search trial and pass the argument to other ranks.
diff --git a/docs/source/en/trainer.md b/docs/source/en/trainer.md
index b69bebd6ea2004..db4eef526c855b 100644
--- a/docs/source/en/trainer.md
+++ b/docs/source/en/trainer.md
@@ -542,3 +542,138 @@ accelerate launch --num_processes=2 \
 ```
 
 Check out the [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) tutorial to learn more about `accelerate_launch` and custom configurations.
+
+## Hyperparameter search
+
+> [!TIP]
+> Hyperparameter search for Distributed Data Parallel (DDP) is currently only enabled for the optuna and sigopt backends. Only the rank-zero process generates the search trial and passes the arguments to the other ranks.
+
+[`Trainer`] supports four hyperparameter search backends: [optuna](https://optuna.org/), [sigopt](https://sigopt.com/), [raytune](https://docs.ray.io/en/latest/tune/index.html), and [wandb](https://wandb.ai/site/sweeps).
+
+Install the backend you want to use.
+
+```bash
+pip install optuna  # or sigopt, wandb, ray[tune]
+```
+
+Each backend requires a different format for defining the hyperparameter search space.
+
+<hfoptions id="hp-search-backend">
+<hfoption id="sigopt">
+
+The hyperparameter search space for sigopt is defined in a [parameter object](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter).
+
+```py
+>>> def sigopt_hp_space(trial):
+...     return [
+...         {"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
+...         {
+...             "categorical_values": ["16", "32", "64", "128"],
+...             "name": "per_device_train_batch_size",
+...             "type": "categorical",
+...         },
+...     ]
+```
+
+</hfoption>
+<hfoption id="optuna">
+
+The hyperparameter search space for optuna is defined in the [optuna.trial.Trial](https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/002_configurations.html#sphx-glr-tutorial-10-key-features-002-configurations-py) class.
+
+```py
+>>> def optuna_hp_space(trial):
+...     return {
+...         "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
+...         "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [16, 32, 64, 128]),
+...     }
+```
+
+Optuna also supports multi-objective HPO. Pass a list of optimization `direction`s to `hyperparameter_search` to specify whether each objective should be minimized or maximized, and define your own `compute_objective` to return the multiple objective values.
+
+```py
+>>> best_trials = trainer.hyperparameter_search(
+...     direction=["minimize", "maximize"],
+...     backend="optuna",
+...     hp_space=optuna_hp_space,
+...     n_trials=20,
+...     compute_objective=compute_objective,
+... )
+```
+
+When multiple directions are given, `hyperparameter_search` returns a Pareto front (`List[BestRun]`) instead of a single best trial. Refer to the test case [`TrainerHyperParameterMultiObjectOptunaIntegrationTest`](https://github.com/huggingface/transformers/blob/d24097e0229485287ff4959258c552168bd898c6/tests/trainer/test_trainer.py#L3670) for more details.
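+
+A minimal sketch of a multi-objective `compute_objective` (it assumes your `compute_metrics` function reports `eval_loss` and `eval_accuracy`; substitute the metrics you actually track):
+
+```py
+>>> def compute_objective(metrics):
+...     # return one value per entry in `direction`: eval_loss is minimized, eval_accuracy is maximized
+...     return [metrics["eval_loss"], metrics["eval_accuracy"]]
+```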
+
+</hfoption>
+<hfoption id="raytune">
+
+The hyperparameter search space for Ray Tune is defined in the [Tune Search Space API](https://docs.ray.io/en/latest/tune/api/search_space.html).
+
+```py
+>>> from ray import tune

+>>> def ray_hp_space(trial):
+...     return {
+...         "learning_rate": tune.loguniform(1e-6, 1e-4),
+...         "per_device_train_batch_size": tune.choice([16, 32, 64, 128]),
+...     }
+```
+
+</hfoption>
+<hfoption id="wandb">
+
+The hyperparameter search space for WandB is defined in a [sweep configuration](https://docs.wandb.ai/guides/sweeps/configuration).
+
+```py
+>>> def wandb_hp_space(trial):
+...     return {
+...         "method": "random",
+...         "metric": {"name": "objective", "goal": "minimize"},
+...         "parameters": {
+...             "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
+...             "per_device_train_batch_size": {"values": [16, 32, 64, 128]},
+...         },
+...     }
+```
+
+</hfoption>
+</hfoptions>
+
+After configuring the hyperparameter search space, define a `model_init` function and pass it to the [`Trainer`]. It is called at the start of each trial to instantiate a fresh model; the example below assumes `model_args` and `config` are defined in your training script.
+
+```py
+>>> def model_init(trial):
+...     return AutoModelForSequenceClassification.from_pretrained(
+...         model_args.model_name_or_path,
+...         from_tf=bool(".ckpt" in model_args.model_name_or_path),
+...         config=config,
+...         cache_dir=model_args.cache_dir,
+...         revision=model_args.model_revision,
+...         token=True if model_args.use_auth_token else None,
+...     )
+```
+
+Create a [`Trainer`] with your `model_init` function, training arguments, training and test datasets, and evaluation function. Note that `model=None` because the model is created anew by `model_init` for each trial.
+
+```py
+>>> trainer = Trainer(
+...     model=None,
+...     args=training_args,
+...     train_dataset=small_train_dataset,
+...     eval_dataset=small_eval_dataset,
+...     compute_metrics=compute_metrics,
+...     tokenizer=tokenizer,
+...     model_init=model_init,
+...     data_collator=data_collator,
+... )
+```
+
+Call `hyperparameter_search` on the [`Trainer`], setting the backend and the optimization `direction`, `"minimize"` or `"maximize"`, depending on whether a lower or higher objective is better.
+
+You can define your own `compute_objective` function; if you don't, the default `compute_objective` is called, and the sum of the evaluation metrics (such as F1) is returned as the objective value.
+
+```py
+>>> best_trial = trainer.hyperparameter_search(
+...     direction="maximize",
+...     backend="optuna",
+...     hp_space=optuna_hp_space,
+...     n_trials=20,
+...     compute_objective=compute_objective,
+... )
+```
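+
+For example, a `compute_objective` that optimizes a single metric could look like the sketch below (it assumes your `compute_metrics` function reports an `eval_f1` entry; substitute whichever metric you track):
+
+```py
+>>> def compute_objective(metrics):
+...     # optimize a single metric instead of the default sum of all evaluation metrics
+...     return metrics["eval_f1"]
+```
+
+The returned `best_trial` is a `BestRun` object, so you can read the winning hyperparameter values from `best_trial.hyperparameters` and the corresponding score from `best_trial.objective`.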