
implementation detail about alibi_mask #18

Open · bugm opened this issue Nov 29, 2023 · 0 comments
Labels: question (Further information is requested)

bugm commented Nov 29, 2023

Hello, I am reading the code that generates the alibi mask, linked here: https://github.com/ofirpress/attention_with_linear_biases/blob/master/fairseq/models/transformer.py

For the code at lines 760 and 761:

self.alibi = self.slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)  # line 760
self.alibi = self.alibi.view(attn_heads, 1, maxpos)  # line 761
I believe line 760 already produces a tensor of shape (attn_heads, 1, maxpos): self.slopes.unsqueeze(1).unsqueeze(1) is an (attn_heads, 1, 1) tensor, torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1) is an (attn_heads, 1, maxpos) tensor, and broadcasting their product yields shape (attn_heads, 1, maxpos).
So what is the purpose of viewing it as (attn_heads, 1, maxpos) again?
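
For reference, here is a minimal, self-contained sketch of the shape arithmetic. The attn_heads and maxpos values and the placeholder slopes are illustrative only, not taken from the repo (the actual code derives the slopes from a geometric sequence):

```python
import torch

attn_heads, maxpos = 8, 16  # illustrative values, not from the repo

# Placeholder slopes, one per head; any (attn_heads,) tensor
# is enough to demonstrate the broadcasting.
slopes = torch.tensor([2.0 ** (-(i + 1)) for i in range(attn_heads)])

# (attn_heads, 1, 1) * (attn_heads, 1, maxpos) broadcasts to (attn_heads, 1, maxpos)
alibi = slopes.unsqueeze(1).unsqueeze(1) * torch.arange(maxpos).unsqueeze(0).unsqueeze(0).expand(attn_heads, -1, -1)
print(alibi.shape)  # torch.Size([8, 1, 16])

# The follow-up .view requests the shape the tensor already has,
# so it changes nothing here:
assert alibi.view(attn_heads, 1, maxpos).shape == alibi.shape
```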

bugm added the question label on Nov 29, 2023