Commit 3e03d88
top-k instead of top-p in MixtralConfig docstring (#30687)
top-k instead of top-p in docstring
sorgfresser authored and Ita Zaporozhets committed May 14, 2024
1 parent d6709d8 commit 3e03d88
Showing 1 changed file with 1 addition and 1 deletion.
src/transformers/models/mixtral/configuration_mixtral.py

@@ -83,7 +83,7 @@ class MixtralConfig(PretrainedConfig):
         attention_dropout (`float`, *optional*, defaults to 0.0):
             The dropout ratio for the attention probabilities.
         num_experts_per_tok (`int`, *optional*, defaults to 2):
-            The number of experts to root per-token, can be also interpreted as the `top-p` routing
+            The number of experts to route per-token, can be also interpreted as the `top-k` routing
             parameter
         num_local_experts (`int`, *optional*, defaults to 8):
             Number of experts per Sparse MLP layer.
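For context, `num_experts_per_tok` is a top-k routing parameter in Mixtral's sparse MoE layers: each token is dispatched to the k highest-scoring experts out of `num_local_experts`, rather than selected by a cumulative-probability cutoff as "top-p" would imply. A minimal usage sketch of the corrected semantics, assuming a standard `transformers` install:

from transformers import MixtralConfig

# num_experts_per_tok is a top-k choice: each token is routed to the
# k best-scoring experts out of num_local_experts.
config = MixtralConfig(
    num_local_experts=8,    # experts per Sparse MLP layer
    num_experts_per_tok=2,  # top-k routing: each token uses its top-2 experts
)
print(config.num_experts_per_tok)  # 2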
