Whisper Advanced Parameters
jhj0517 edited this page May 31, 2024 · 6 revisions
Parameter | Description |
---|---|
`beam_size` | Number of beams kept by the beam search algorithm (used when decoding at temperature 0). TL;DR: A larger beam size gives higher quality but slower transcription; a smaller beam size gives lower quality but faster transcription. |
`log_prob_threshold` | Controls how Whisper handles low-confidence and "silent" parts of the audio. If the average log probability over the sampled tokens is below this value, the decoding is treated as failed. TL;DR: Lower this value if you want Whisper to be more "sensitive" to quiet sounds. Adjust it together with `no_speech_threshold` and see what happens. |
`no_speech_threshold` | Controls how Whisper handles the "silent" parts of the audio. If the no-speech probability is higher than this value AND the average log probability over the sampled tokens is below `log_prob_threshold`, the segment is considered silent and skipped. TL;DR: Raise this value if you want Whisper to be more "sensitive" to quiet sounds, since fewer segments will be discarded as silence. Adjust it together with `log_prob_threshold` and see what happens. |
`compute_type` | Compute type, such as `float16` or `float32`. Defaults to `float16` if CUDA is enabled, otherwise `float32`. |
`best_of` | Number of candidates to sample when decoding with a non-zero temperature. |
`patience` | Beam search patience factor. A value of 1.0 corresponds to conventional beam search. |
`condition_on_previous_text` | If True, the previous output of the model is provided as a prompt for the next window. Disabling it may make the text less consistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition loops or timestamps drifting out of sync. TL;DR: If a failure loop (repetitive hallucination) occurs, consider setting this to False. |
`initial_prompt` | Optional text provided as a prompt for the first window. Use it to supply, or "prompt-engineer", context for the transcription, e.g. custom vocabulary or proper nouns, so that the model is more likely to transcribe those words correctly. |
`temperature` | Temperature for sampling. It can be a tuple of temperatures, which are tried in turn whenever decoding fails the `compression_ratio_threshold` or `log_prob_threshold` check. |
`compression_ratio_threshold` | If the gzip compression ratio of the transcribed text is above this value, the decoding is treated as failed. A high ratio means the text is highly repetitive. |
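How `temperature`, `compression_ratio_threshold`, `log_prob_threshold` and `no_speech_threshold` interact is easier to see in code. Below is a minimal Python sketch modeled on the temperature-fallback logic in openai-whisper's `transcribe()`; it is not this project's code. `DecodeResult`, `decode_with_fallback` and the `decode` callable are hypothetical stand-ins for decoding a single window, and the default values (`2.4`, `-1.0`, `0.6`, temperatures `0.0` to `1.0` in steps of `0.2`) are assumed to match the openai-whisper CLI defaults, which may differ from the defaults in this UI.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class DecodeResult:
    """Hypothetical stand-in for the output of decoding one audio window."""
    text: str
    avg_logprob: float     # average log probability over the sampled tokens
    no_speech_prob: float  # probability that the window contains no speech
    temperature: float = 0.0


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by DEFLATE-compressed size.

    Repetitive (looping) text compresses well, so it yields a high ratio.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode: Callable[[float], DecodeResult],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    log_prob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
) -> Tuple[DecodeResult, bool]:
    """Decode at each temperature in turn until a result passes; return (result, is_silent)."""
    if not temperatures:
        raise ValueError("temperatures must not be empty")

    result = None
    for t in temperatures:
        result = decode(t)
        failed = False
        if (compression_ratio_threshold is not None
                and compression_ratio(result.text) > compression_ratio_threshold):
            failed = True  # output is too repetitive
        if log_prob_threshold is not None and result.avg_logprob < log_prob_threshold:
            failed = True  # model is not confident in the sampled tokens
        if no_speech_threshold is not None and result.no_speech_prob > no_speech_threshold:
            failed = False  # likely silence; retrying at a higher temperature won't help
        if not failed:
            break
    # If every temperature failed, the last result is kept.

    # Silent = high no-speech probability AND average log probability not above the threshold.
    is_silent = (
        no_speech_threshold is not None
        and result.no_speech_prob > no_speech_threshold
        and (log_prob_threshold is None or result.avg_logprob <= log_prob_threshold)
    )
    return result, bool(is_silent)


if __name__ == "__main__":
    def fake_decode(t: float) -> DecodeResult:
        # Hypothetical decoder: loops at temperature 0, produces normal text once sampling kicks in.
        if t == 0.0:
            return DecodeResult("thank you " * 20, -0.3, 0.1, t)
        return DecodeResult("Hello there, how are you?", -0.4, 0.1, t)

    result, silent = decode_with_fallback(fake_decode)
    print(result.temperature, silent)  # 0.2 False
```

In this example the repetitive output at temperature 0 fails the compression check and is retried at 0.2. A window with a high `no_speech_prob` and a low `avg_logprob` would instead stop at the first temperature and be reported as silent, which is why the two silence thresholds are best tuned together. The sketch uses `zlib`, which applies the same DEFLATE compression as gzip.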