When Do Prompting and Prefix-Tuning Work?

This is the companion code for our paper When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations.

Some of the notebooks require our trained prefixes, which you can download from https://drive.google.com/drive/folders/1Bff8VKh1ZdflaVFKDY9MiyCmewoAGnzV?usp=sharing. Make sure to place the checkpoints from the minGPT directory of the download into the minGPT directory of this repository.
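
A minimal download sketch, assuming the third-party gdown package is installed (pip install gdown) and an output directory of your choice; the folder URL is the one linked above:

```python
# Sketch: fetch the shared Google Drive folder with the trained prefixes.
# Assumes `pip install gdown`; adjust the output directory as needed, then
# move the files from its minGPT subfolder into this repository's minGPT/.
import gdown

FOLDER_URL = "https://drive.google.com/drive/folders/1Bff8VKh1ZdflaVFKDY9MiyCmewoAGnzV"
gdown.download_folder(url=FOLDER_URL, output="checkpoints", quiet=False)
```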

Structure:

  • llama contains a modified version of the original LLaMA code that adds an implementation of prefix-tuning. All changes are clearly marked with comments. A minimal sketch of the prefix-tuning mechanism is given after this list.

  • llama_token_vs_soft_token.ipynb compares how many unique completions LLaMA can produce when only the first token is varied versus when only the first virtual token is varied (Section 3).

  • constructions.ipynb contains the implementations of transformer architectures whose unconditional and conditional generation is fully governed by the choice of virtual tokens (Section 3).

  • prefix_bias_only.ipynb illustrates that the theory from Section 4 holds for LLaMA: a prefix cannot change the relative attention distribution over the content positions and only induces a bias in the attention layer output (see the numerical check after this list).

  • minGPT contains a modified version of the original minGPT code with an implementation of prefix-tuning. The directory also contains the experiments from Section 5 of the paper:

    • 01_cannot_learn_new_task.ipynb shows that prefix-tuning cannot learn a new task that requires a different attention pattern.

    • 02_can_extract_pretrained_task.ipynb shows that prefix-tuning can be used to specialize the model for one of the tasks it has seen during pre-training.

    • 03_can_learn_new_task_same_attention.ipynb shows that prefix-tuning can also learn a new task as long as the attention patterns needed to solve it were learned during pretraining, but that it cannot learn a task (double histogram) which cannot be solved with the skills acquired during pretraining.

    • 04_prefix_tuning_vs_lora.ipynb shows that rank-1 LoRA on the MLP is sufficient to learn double histogram, while prefix-tuning with the same number of learnable parameters cannot (Section 6; see the rank-1 LoRA sketch after this list).

  • longer_prefixes.ipynb shows that the attention over the prefix positions is not uniformly distributed, indicating that prefix-tuning does not make full use of the subspace spanned by the prefix-induced biases (Appendix B).
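
The prefix-tuning implementations in llama and minGPT follow the usual recipe of prepending learned key/value vectors ("virtual tokens") at every attention layer while keeping the backbone frozen. Below is a minimal, self-contained sketch of that mechanism; the class and parameter names are illustrative, not the ones used in this repository.

```python
# A minimal sketch of prefix-tuning in a single attention layer: learned
# key/value vectors are prepended to the keys/values of the real content,
# while the pretrained projections stay frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrefixAttention(nn.Module):
    def __init__(self, d_model: int, prefix_len: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        for p in (*self.q.parameters(), *self.k.parameters(), *self.v.parameters()):
            p.requires_grad_(False)  # the backbone weights are frozen
        # The only trainable parameters: one key and one value per virtual token.
        self.prefix_k = nn.Parameter(torch.randn(prefix_len, d_model))
        self.prefix_v = nn.Parameter(torch.randn(prefix_len, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        q = self.q(x)
        # Prepend the virtual tokens to the keys and values of the real content.
        k = torch.cat([self.prefix_k.expand(x.size(0), -1, -1), self.k(x)], dim=1)
        v = torch.cat([self.prefix_v.expand(x.size(0), -1, -1), self.v(x)], dim=1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return attn @ v


layer = PrefixAttention(d_model=8, prefix_len=4)
print(layer(torch.randn(2, 5, 8)).shape)  # torch.Size([2, 5, 8])
```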
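
As a companion to prefix_bias_only.ipynb, the following self-contained snippet numerically verifies the Section 4 decomposition for a single query: the output of attention over [prefix; content] equals (1 - alpha) times the content-only attention output plus alpha times a prefix-induced bias, where alpha is the total attention mass on the prefix. This is a sketch with random tensors, not code from the notebook.

```python
# Numerical check: a prefix rescales the content attention and adds a bias,
# but does not change the relative attention distribution over the content.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_content, n_prefix = 8, 5, 3
q = torch.randn(d)                                   # a single query
k_c, v_c = torch.randn(n_content, d), torch.randn(n_content, d)
k_p, v_p = torch.randn(n_prefix, d), torch.randn(n_prefix, d)

# Attention over the concatenation [prefix; content].
w = F.softmax(torch.cat([k_p, k_c]) @ q / d ** 0.5, dim=0)
full_out = w @ torch.cat([v_p, v_c])

# Content-only attention, prefix mass alpha, and the prefix-induced bias.
w_c = F.softmax(k_c @ q / d ** 0.5, dim=0)
alpha = w[:n_prefix].sum()
bias = (w[:n_prefix] / alpha) @ v_p

print(torch.allclose(full_out, (1 - alpha) * (w_c @ v_c) + alpha * bias, atol=1e-6))  # True
```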
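
For 04_prefix_tuning_vs_lora.ipynb, the baseline is a rank-1 LoRA update of an MLP weight, i.e. a frozen weight W is used as W + b a^T with only the vectors a and b trained. A minimal sketch (names are illustrative, not the ones used in the notebook):

```python
# Minimal sketch of a rank-1 LoRA update W -> W + b a^T around a frozen
# linear layer; only the vectors a and b are trainable.
import torch
import torch.nn as nn


class Rank1LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                      # freeze the backbone
        out_features, in_features = linear.weight.shape
        self.a = nn.Parameter(torch.randn(in_features) * 0.01)
        self.b = nn.Parameter(torch.zeros(out_features))  # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (x @ a) projects onto the rank-1 direction; scaling b applies b a^T x.
        return self.linear(x) + (x @ self.a).unsqueeze(-1) * self.b


mlp_fc = Rank1LoRALinear(nn.Linear(16, 64))
print(mlp_fc(torch.randn(2, 10, 16)).shape)  # torch.Size([2, 10, 64])
```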

Reference

@inproceedings{petrov2023when,
  title={When Do Prompting and Prefix-Tuning Work? {A} Theory of Capabilities and Limitations},
  author={Aleksandar Petrov and Philip H. S. Torr and Adel Bibi},
  booktitle={International Conference on Learning Representations},
  url={https://arxiv.org/abs/2310.19698},
  year={2024}
}