Attention flow

This repository contains implementations of the Attention Rollout and Attention Flow algorithms, which are post hoc methods for obtaining more explanatory attention weights.

Attention Rollout and Attention Flow recursively compute the token attentions at each layer of a given model, taking the embedding attentions as input. They differ in the assumptions they make about how attention weights in lower layers affect the flow of information to higher layers, and in whether they compute the token attentions relative to each other or independently.
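To make the recursion concrete, here is a minimal pure-Python sketch of Attention Rollout. It is not the code in this repository. It assumes the attention matrices have already been averaged over heads, that each row sums to 1, and that residual connections are modelled by mixing each matrix equally with the identity before re-normalizing. The function names `add_residual`, `matmul`, and `attention_rollout` are hypothetical, chosen for illustration only.

```python
def add_residual(attn):
    """Mix an (n x n) attention matrix equally with the identity, then
    re-normalize each row (assumed way of modelling residual connections)."""
    n = len(attn)
    mixed = [
        [0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    normalized = []
    for row in mixed:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def attention_rollout(layers):
    """Return one rolled-out attention matrix per layer.

    `layers` is a list of head-averaged attention matrices, lowest layer
    first. Entry [i][j] of the result for a layer estimates how much
    token i at that layer attends to input token j.
    """
    rollout = add_residual(layers[0])
    results = [rollout]
    for attn in layers[1:]:
        # Attention at this layer is composed with everything below it.
        rollout = matmul(add_residual(attn), rollout)
        results.append(rollout)
    return results


# Toy example: two layers over two tokens (made-up numbers).
toy_layers = [
    [[0.2, 0.8], [0.6, 0.4]],
    [[0.5, 0.5], [0.1, 0.9]],
]
for depth, matrix in enumerate(attention_rollout(toy_layers)):
    print(depth, matrix)
```

Because every augmented matrix is row-stochastic, each rolled-out matrix is too, so its rows can be read as distributions over input tokens. Attention Flow instead treats the attention weights as edge capacities in a layered graph and solves a maximum-flow problem, which this sketch does not cover.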

A Colab notebook shows how to apply these methods to a pretrained BERT model from the Hugging Face Transformers library.

Here is the paper introducing these methods:

Quantifying Attention Flow in Transformers

Related projects:

An implementation of Attention Rollout for Vision Transformers by Jacob Gildenblat (and a nice blog post on Exploring Explainability for Vision Transformers).
