From 237c1a3cdd69290749890470d9c5ca8ae8e709d2 Mon Sep 17 00:00:00 2001
From: David Valente
Date: Tue, 30 Jan 2024 00:04:57 +0000
Subject: [PATCH] Update flash attention documentation

---
 docs/source/en/perf_infer_gpu_one.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/perf_infer_gpu_one.md b/docs/source/en/perf_infer_gpu_one.md
index 8e01450c93f40e..4cd1f2b23aea1c 100644
--- a/docs/source/en/perf_infer_gpu_one.md
+++ b/docs/source/en/perf_infer_gpu_one.md
@@ -54,7 +54,7 @@ FlashAttention-2 is currently supported for the following architectures:
 * [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
 * [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
 * [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
-* [XLMRoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
+* [xlm_roberta](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
 
 You can request to add FlashAttention-2 support for another model by opening a GitHub Issue or Pull Request.
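
For context (not part of the patch): the page being edited documents which architectures can be loaded with FlashAttention-2 enabled. A minimal sketch of how that support is used in Transformers, assuming the `flash-attn` package is installed and a compatible GPU is available; the checkpoint name is illustrative.

```python
# Minimal sketch: load one of the architectures listed above with FlashAttention-2.
# Assumes `flash-attn` is installed and a supported GPU is present; the checkpoint
# name below is only an example.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "FacebookAI/xlm-roberta-base",            # XLM-RoBERTa, one of the listed architectures
    torch_dtype=torch.float16,                # FlashAttention-2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # opt in to the FlashAttention-2 kernels
).to("cuda")
```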