Weird Attention module #229
pfeatherstone started this conversation in General
Replies: 2 comments · 1 reply
-
I'm not sure, but I don't think BatchNorm is equivalent to LayerNorm when using a channel-first layout (a normal attention module would be channel-last). A sketch of the difference follows below.
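To make the point concrete, here is a minimal, illustrative sketch (not code from this repository) of what the two normalizations actually compute: BatchNorm2d on a channel-first feature map normalizes each channel over the batch and spatial dimensions, while the LayerNorm used in a typical channel-last attention block normalizes over the embedding dimension per token. The shapes below are assumptions chosen only for the example.

```python
import torch
import torch.nn as nn

# Illustrative only: compares what BatchNorm2d and LayerNorm normalize over.
# Shapes are assumptions for the sake of the example, not values from the YOLOv10 code.
N, C, H, W = 2, 32, 8, 8
x = torch.randn(N, C, H, W)              # channel-first feature map

bn = nn.BatchNorm2d(C)                   # normalizes each channel over (N, H, W)
ln = nn.LayerNorm(C)                     # normalizes over the channel dim, per token

y_bn = bn(x)                             # statistics shared across batch and spatial dims

# A channel-last attention block would flatten to tokens first:
tokens = x.flatten(2).transpose(1, 2)    # (N, H*W, C)
y_ln = ln(tokens)                        # statistics computed per token, over C only

# These are different operations, so swapping one for the other changes both
# training dynamics and inference behaviour (BatchNorm keeps running statistics,
# LayerNorm does not).
print(y_bn.shape, y_ln.shape)            # torch.Size([2, 32, 8, 8]) torch.Size([2, 64, 32])
```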
-
Also, that module could use flash attention (see the sketch below).
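A minimal sketch of what I mean, assuming query/key/value tensors already shaped as (batch, heads, tokens, head_dim): PyTorch 2.x ships torch.nn.functional.scaled_dot_product_attention, which dispatches to a flash-attention kernel when the dtype and hardware allow it. This is not the repository's implementation, just the general pattern.

```python
import torch
import torch.nn.functional as F

# Minimal sketch, not the repository's code: the hand-written
# softmax(q @ k^T / sqrt(d)) @ v could be replaced by
# F.scaled_dot_product_attention (PyTorch >= 2.0), which uses a
# flash-attention kernel when dtype/hardware permit.
B, num_heads, seq_len, head_dim = 2, 4, 64, 32   # assumed shapes for illustration

q = torch.randn(B, num_heads, seq_len, head_dim)
k = torch.randn(B, num_heads, seq_len, head_dim)
v = torch.randn(B, num_heads, seq_len, head_dim)

# Fused attention; scaling by 1/sqrt(head_dim) is applied internally.
out = F.scaled_dot_product_attention(q, k, v)

# Equivalent explicit computation, for reference:
attn = (q @ k.transpose(-2, -1)) / head_dim**0.5
ref = attn.softmax(dim=-1) @ v
print(torch.allclose(out, ref, atol=1e-5))       # expected: True, up to numerical tolerance
```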
-
Your PSA class uses an Attention block with the following layers:
yolov10/ultralytics/nn/modules/block.py, lines 781 to 783 at ea93d4f
These conv blocks all use batch normalization, which is unusual. I've never seen that inside an attention module. You would normally see LayerNorm at the end of the attention module, not batch norm after every projection.
Can you explain?
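For concreteness, here is a rough sketch of the pattern being asked about: each projection is a convolution bundled with BatchNorm2d on channel-first feature maps, next to the LayerNorm/Linear layout you would see in a textbook attention block. The ConvBN class, the channel widths, and the qkv/proj/pe names below are illustrative assumptions based on the description above, not a copy of block.py.

```python
import torch.nn as nn

# Illustrative sketch of the pattern in question; layer names and widths are
# assumptions, not a copy of ultralytics/nn/modules/block.py.
class ConvBN(nn.Module):
    """Convolution followed by BatchNorm2d, as in a typical Conv block."""
    def __init__(self, c1, c2, k=1, g=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, padding=k // 2, groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)

    def forward(self, x):
        return self.bn(self.conv(x))

# Every projection carries its own BatchNorm and operates on channel-first
# (N, C, H, W) feature maps.
dim = 256
qkv  = ConvBN(dim, dim * 2, k=1)       # query/key/value projection (width is an assumption)
proj = ConvBN(dim, dim, k=1)           # output projection
pe   = ConvBN(dim, dim, k=3, g=dim)    # depthwise positional-encoding conv

# A "textbook" channel-last attention block would instead look roughly like:
#   ln   = nn.LayerNorm(dim)           # applied once per block
#   qkv  = nn.Linear(dim, dim * 3)
#   proj = nn.Linear(dim, dim)
# i.e. LayerNorm once per block, not BatchNorm after every projection.
```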