Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid LAPACK dependency for Savitzky Golay filter #11

Open
Vindaar opened this issue Jan 22, 2022 · 2 comments
Open

Avoid LAPACK dependency for Savitzky Golay filter #11

Vindaar opened this issue Jan 22, 2022 · 2 comments

Comments

@Vindaar
Copy link
Member

Vindaar commented Jan 22, 2022

In ggplotnim we've seen multiple times already that users have issues with the added LAPACK dependency introduced from geom_smooth (the fact that it shouldn't end up as a dep. in the first place if the user doesn't use geom_smooth is a different issue of course).

Ideally, we would use an alternative implementation to solve the least squares problem here:
https://github.com/SciNim/scinim/blob/main/scinim/signals/filters.nim#L29

(If performance allows, then fully replace it, otherwise fall back if LAPACK is not specified)

Investigate if we can use @c-blake's fitl:
https://github.com/c-blake/fitl
for this purpose.

@c-blake
Copy link

c-blake commented Jan 22, 2022

Happy to help use it, if I can. The command line driver is pretty fancy. To just get estimated coefs you should be able to just set up the SVD and do the b[] loop - something like:

discard svdx(u, s, v, n, m)     # SVD LHS of X.b=y design eq
for k in 0 ..< m:               # Calc best fit coefs `b`
  b[k] = sum0(j,m, sum0(i,n, v[k + m*j]*s[j]*u[i + n*j]*y[i]))

(EDIT: update a per c-blake/fitl@268d3b4 and c-blake/fitl@8f03e5c)

I was leaving that not some separate API call a) partially due to not having any users and b) to remind myself to maybe do a double-sum/triple-sum/.. template/macro - regular one was finicky enough for now. :-) This kind of sum notation is only slightly more explicit than the Einstein summation convention, but that is probably the right balance for many actual programs (maybe not all, though).

More debatable in terms of clarity balance might be index calculations vs [k,j]. Even there, the explicitness makes it easier (for me, anyway) to reason about cache performance which works much better (10X?) over dense back-to-back striding vs. large scale striding (in e.g. the dot product hot loop of svdx). BW use of cache line loads alone might be 64B/4B=16X for float32, though that may only matter for out of L1 problems which may not matter for your Savitzky-Golay. Personally, I find it not so mind-melting if you use consistent letter conventions for i 0..<n, j 0..<m|p, etc., but it's definitely more thought than just tossing on a cap-Sigma_{i,k} equivalent.

This code has a legacy in C dating back to circa y2k. If you think LAPACK/BLAS integration is a pain now...oh boy. Lol. Well, maybe diversity breeds complexity and it's even worse today. If "integration pain" were easy to quantify then a lot of things would be different. ;-)

@Vindaar
Copy link
Member Author

Vindaar commented Jan 31, 2022

Thanks for your input!

I had a look a few days ago, but couldn't fully get it to work. I tried on a simple example with a few points and got the correct result. But once I tried to use it for one of the examples used in the ggplotnim repo, I only got a bunch of NaNs. Didn't have time to fully investigate that yet. Essentially I need to compute the SVD of a big Vandermonde matrix (in the example O(6 x ~2400).

I'll get back to you once I've done some more testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants