I am very impressed with how you enforce constraints with Lagrange multipliers.
In the paper, I notice that affine layers are encoded as z(i) = W(i) z(i-1) + b(i), which captures the behavior of fully-connected, convolutional, and similar layers.
But an Add layer in an ONNX residual network computes something like z(i) = z(i-1) + z(i-k), taking two earlier activations as input rather than one. I fail to see how your encoding extends to this case, yet I did observe residual networks in your experiments.
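For concreteness, here is a minimal PyTorch sketch of the kind of residual block I mean (the layer sizes are hypothetical, just for illustration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block illustrating the Add layer
    z(i) = z(i-1) + z(i-k), which combines two earlier activations."""

    def __init__(self, dim: int = 16):  # dim is a hypothetical size
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)  # affine: z = W z + b
        self.fc2 = nn.Linear(dim, dim)  # affine: z = W z + b
        self.act = nn.ReLU()

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        skip = z                       # z(i-k): saved for the skip connection
        z = self.act(self.fc1(z))
        z = self.fc2(z)
        return self.act(z + skip)      # Add layer: depends on two inputs,
                                       # not of the single-input form W z + b
```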
So I wonder: is there a theorem behind handling residual networks? And if so, is it just an adaptation of your existing result?
Thank you in advance for your clarification!