Commit 3e03d88
top-k instead of top-p in MixtralConfig docstring (#30687)
top-k instead of top-p in docstring
sorgfresser authored and Ita Zaporozhets committed May 14, 2024
1 parent d6709d8 commit 3e03d88
Showing 1 changed file with 1 addition and 1 deletion.
src/transformers/models/mixtral/configuration_mixtral.py

@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
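For context, `num_experts_per_tok` is a top-k routing parameter in Mixtral's sparse MoE layers: each token is dispatched to the k highest-scoring experts out of `num_local_experts`, rather than selected by a cumulative-probability cutoff as "top-p" would imply. A minimal usage sketch of the corrected semantics, assuming a standard `transformers` install:

from transformers import MixtralConfig

# num_experts_per_tok is a top-k choice: each token is routed to the
# k best-scoring experts out of num_local_experts.
config = MixtralConfig(
    num_local_experts=8,    # experts per Sparse MLP layer
    num_experts_per_tok=2,  # top-k routing: each token uses its top-2 experts
)
print(config.num_experts_per_tok)  # 2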
