Implement initial ReLU/sign function via polynomial approximation #658
Also cf. https://openreview.net/pdf?id=Hq16Jk2bVlp and https://eprint.iacr.org/2021/1688, which use these approximations.
The paper linked above (2020/834) does not release source code, but there is a reference implementation in Lattigo: https://github.com/tuneinsight/lattigo/blob/4cce9a48c1daaa2dd122921822f5ad70cd444156/he/hefloat/minimax_composite_polynomial.go#L124
The paper https://eprint.iacr.org/2019/1234 is a precursor to https://eprint.iacr.org/2020/834, and also seems to explain more of the motivation behind the composite polynomials.
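As a rough illustration of the composite idea (a sketch of my own, not code from either paper): a low-degree odd polynomial such as f(x) = (3x - x^3)/2 maps [-1, 1] into itself and pushes values toward ±1, so composing it with itself a few times gives an increasingly sharp approximation to sgn on [-1, 1].

```python
def f(x):
    # a low-degree odd polynomial that contracts toward +/-1 on [-1, 1];
    # iterating it sharpens the approximation to sgn (illustrative choice)
    return (3 * x - x**3) / 2

def composite_sign(x, depth=20):
    # evaluate the depth-fold composition f(f(...f(x)...))
    for _ in range(depth):
        x = f(x)
    return x
```

Each composition cubes the effective degree while the multiplicative depth grows only linearly in the number of compositions, which is the point of the composite approach over a single high-degree polynomial.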
An example of generating a well-fitting polynomial using lolremez: samhocevar/lolremez#28 (comment). Another tool: https://github.com/pychebfun/pychebfun
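For quick experiments without lolremez, NumPy's Chebyshev class can produce a near-minimax fit of a smooth function (Chebyshev interpolation of a smooth function is within a small factor of the true minimax error); the target function and degree below are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

# interpolate exp at the Chebyshev points of degree 10 on [-1, 1];
# for smooth functions this is close to the minimax polynomial
p = Chebyshev.interpolate(np.exp, 10, domain=[-1, 1])

# estimate the sup-norm error on a dense grid
xs = np.linspace(-1, 1, 10001)
max_err = np.max(np.abs(p(xs) - np.exp(xs)))
```

This won't handle the multi-interval sign-function case (where the true Remez machinery is needed), but it is a cheap way to sanity-check degrees and error targets.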
Outline sketch:
Various improvements based on more recent research that would be worth splitting into separate tickets.
Thanks to Seonhong Min for sending me https://eprint.iacr.org/2018/462, which shows that BFV achieves polynomial approximations via a fixed-point approximation, not a floating-point one. There is also some added complexity in that evaluating a fixed-point polynomial approximation requires a rounding step as well, though not always; see Sec. 2.5.
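A plaintext sketch of that rounding step (my own illustration, not code from the paper): with every value encoded at a fixed scale Δ, each multiplication in a Horner evaluation produces scale Δ², so a divide-and-round step is needed after each multiply to bring the scale back down.

```python
DELTA = 2 ** 16  # fixed-point scale (illustrative choice)

def fixed_point_horner(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n in fixed point at scale DELTA.

    Each multiplication by the encoded input yields scale DELTA**2, so a
    rounding/rescale step follows every multiply -- the step a homomorphic
    fixed-point evaluation must also account for (or sometimes elide).
    """
    xi = round(x * DELTA)            # encode the input at scale DELTA
    acc = round(coeffs[-1] * DELTA)  # encode the leading coefficient
    for c in reversed(coeffs[:-1]):
        acc = round(acc * xi / DELTA)  # rescale: DELTA**2 -> DELTA
        acc += round(c * DELTA)
    return acc / DELTA               # decode
```

The final answer differs from exact evaluation by roughly a few multiples of 1/Δ, which is the accuracy/parameter trade-off the fixed-point view makes explicit.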
I also had an interest in this issue a while back, so I know a paper worth sharing: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10155408
Oh, I didn't know there was an implementation in Lattigo! In that case, these hardcoded coefficients might not be necessary.
After starting an implementation in #665 (with a fixed approximation polynomial) and discussing in the HEIR meeting today, I have a few kinks to work out:
These folks do something slightly different, which is more holistic with respect to NN training: https://github.com/EfficientFHE/SmartPAF They pick small-degree polynomial approximations and then do a variety of network fine-tuning to adapt the network to the replaced operations. This seems out of scope for the compiler, since it would require training data to be included.
I added an upstream RFC for the polynomial approximation pass: https://discourse.llvm.org/t/rfc-a-polynomial-approximation-pass/79301
@JianmingTONG has also worked on the https://github.com/EfficientFHE/SmartPAF system for approximating activation functions.
The one in Tune Insight's Lattigo is unstable and highly likely to fail when approximating over multiple intervals. The issue comes from the method used to find the roots of the error function, which can miss roots. I fixed it this summer because I needed it for the iDASH challenge; the fix is in my personal fork of Lattigo.
@Pro7ech Since I wrote that comment I've been doing a lot of studying on this topic, including writing a basic Remez solver that had issues with finding roots. I believe that using the so-called barycentric form of the Remez algorithm would avoid these issues (though I have not yet followed through with this plan to confirm it). Did you come to the same conclusion? Or did you resolve it via a different method?
I haven't looked into the barycentric Remez, but I've managed to make it work stably (although quite slowly for large degrees) even when the number of discrete intervals is in the dozens. I directly look for the extrema of the error function; this is easier and more stable than looking for the roots of the derivative of the error function. Because of the alternating condition, the minimum and maximum number of extrema can be bounded, and knowing how many points we are looking for helps a lot. The code is commented if you want to take a look.
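A minimal version of that extrema search (a sketch of the dense-grid idea, not Pro7ech's actual code): sample the error function on a fine grid and record points where the discrete slope changes sign, keeping the interval endpoints since alternation points can sit there too.

```python
import numpy as np

def find_extrema(err, a, b, n=10001):
    # dense-grid scan for local extrema of an error function on [a, b];
    # endpoints are included because Remez alternation points may lie there
    x = np.linspace(a, b, n)
    e = err(x)
    idx = [0]
    for i in range(1, n - 1):
        # the discrete slope changes sign (or vanishes) at a local extremum
        if (e[i] - e[i - 1]) * (e[i + 1] - e[i]) <= 0:
            idx.append(i)
    idx.append(n - 1)
    idx = np.array(idx)
    return x[idx], e[idx]
```

Knowing the minimum and maximum number of extrema from the alternation condition then lets the caller validate the scan's output (and locally refine each extremum if more precision is needed).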
Given that

ReLU(x) = x * (0.5 + 0.5 * sgn(x)),

this reduces to approximating the sign function, and this paper appears to have the state of the art: https://eprint.iacr.org/2020/834

Also note:

max(u, v) = ((u + v) + (u - v) * sign(u - v)) / 2
min(u, v) = -max(-u, -v) = ((u + v) - (v - u) * sign(v - u)) / 2
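These identities are easy to sanity-check numerically (a quick verification script, nothing FHE-specific):

```python
import math
import random

def sgn(x):
    # exact sign function: -1, 0, or 1
    return (x > 0) - (x < 0)

def relu_via_sign(x):
    # ReLU(x) = x * (0.5 + 0.5 * sgn(x))
    return x * (0.5 + 0.5 * sgn(x))

def max_via_sign(u, v):
    # max(u, v) = ((u + v) + (u - v) * sign(u - v)) / 2
    return ((u + v) + (u - v) * sgn(u - v)) / 2

def min_via_sign(u, v):
    # min(u, v) = ((u + v) - (v - u) * sign(v - u)) / 2
    return ((u + v) - (v - u) * sgn(v - u)) / 2

random.seed(0)
for _ in range(1000):
    x = random.uniform(-10, 10)
    u, v = random.uniform(-10, 10), random.uniform(-10, 10)
    assert math.isclose(relu_via_sign(x), max(x, 0.0), abs_tol=1e-12)
    assert math.isclose(max_via_sign(u, v), max(u, v), abs_tol=1e-12)
    assert math.isclose(min_via_sign(u, v), min(u, v), abs_tol=1e-12)
```

In the homomorphic setting, sgn above would be replaced by the polynomial approximation of sign, so the error of ReLU/max/min inherits the error of that approximation scaled by the input magnitude.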