From 73f4b94db81f1b860d0f5199ee8d1a12a47c4e39 Mon Sep 17 00:00:00 2001
From: Haiping Lu
Date: Mon, 2 Oct 2023 09:59:36 +0100
Subject: [PATCH] fix minor typos

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8ae391b8..5e8aa3fd 100644
--- a/README.md
+++ b/README.md
@@ -117,7 +117,7 @@ There are several directories in this repo:
 
 ## Additional Notes
 
-1. While we focus on a simple yet effect setup, namely adapting only the `q` and `v` projection in a Transformer, in our examples, LoRA can be apply to any subsets of pre-trained weights. We encourage you to explore different configurations, such as adapting the embedding layer by replacing `nn.Embedding` with `lora.Embedding` and/or adapting the MLP layers. It's very likely that the optimal configuration varies for different model architectures and tasks.
+1. While we focus on a simple yet effective setup, namely adapting only the `q` and `v` projection in a Transformer, in our examples, LoRA can be applied to any subsets of pre-trained weights. We encourage you to explore different configurations, such as adapting the embedding layer by replacing `nn.Embedding` with `lora.Embedding` and/or adapting the MLP layers. It's very likely that the optimal configuration varies for different model architectures and tasks.
 
 2. Some Transformer implementation uses a single `nn.Linear` for the projection matrices for query, key, and value. If one wishes to constrain the rank of the updates to the individual matrices, one has to either break it up into three separate matrices or use `lora.MergedLinear`. Make sure to modify the checkpoint accordingly if you choose to break up the layer.
    ```python