Avoid LAPACK dependency for Savitzky Golay filter #11

Vindaar · 2022-01-22T14:39:41Z

In ggplotnim we've seen multiple times already that users have issues with the added LAPACK dependency introduced from geom_smooth (the fact that it shouldn't end up as a dep. in the first place if the user doesn't use geom_smooth is a different issue of course).

Ideally, we would use an alternative implementation to solve the least squares problem here:
https://github.com/SciNim/scinim/blob/main/scinim/signals/filters.nim#L29

(If performance allows, then fully replace it, otherwise fall back if LAPACK is not specified)

Investigate if we can use @c-blake's fitl:
https://github.com/c-blake/fitl
for this purpose.

The text was updated successfully, but these errors were encountered:

c-blake · 2022-01-22T15:11:51Z

Happy to help use it, if I can. The command line driver is pretty fancy. To just get estimated coefs you should be able to just set up the SVD and do the b[] loop - something like:

discard svdx(u, s, v, n, m)     # SVD LHS of X.b=y design eq
for k in 0 ..< m:               # Calc best fit coefs `b`
  b[k] = sum0(j,m, sum0(i,n, v[k + m*j]*s[j]*u[i + n*j]*y[i]))

(EDIT: update a per c-blake/fitl@268d3b4 and c-blake/fitl@8f03e5c)

I was leaving that not some separate API call a) partially due to not having any users and b) to remind myself to maybe do a double-sum/triple-sum/.. template/macro - regular one was finicky enough for now. :-) This kind of sum notation is only slightly more explicit than the Einstein summation convention, but that is probably the right balance for many actual programs (maybe not all, though).

More debatable in terms of clarity balance might be index calculations vs [k,j]. Even there, the explicitness makes it easier (for me, anyway) to reason about cache performance which works much better (10X?) over dense back-to-back striding vs. large scale striding (in e.g. the dot product hot loop of svdx). BW use of cache line loads alone might be 64B/4B=16X for float32, though that may only matter for out of L1 problems which may not matter for your Savitzky-Golay. Personally, I find it not so mind-melting if you use consistent letter conventions for i 0..<n, j 0..<m|p, etc., but it's definitely more thought than just tossing on a cap-Sigma_{i,k} equivalent.

This code has a legacy in C dating back to circa y2k. If you think LAPACK/BLAS integration is a pain now...oh boy. Lol. Well, maybe diversity breeds complexity and it's even worse today. If "integration pain" were easy to quantify then a lot of things would be different. ;-)

Vindaar · 2022-01-31T13:57:03Z

Thanks for your input!

I had a look a few days ago, but couldn't fully get it to work. I tried on a simple example with a few points and got the correct result. But once I tried to use it for one of the examples used in the ggplotnim repo, I only got a bunch of NaNs. Didn't have time to fully investigate that yet. Essentially I need to compute the SVD of a big Vandermonde matrix (in the example O(6 x ~2400).

I'll get back to you once I've done some more testing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid LAPACK dependency for Savitzky Golay filter #11

Avoid LAPACK dependency for Savitzky Golay filter #11

Vindaar commented Jan 22, 2022

c-blake commented Jan 22, 2022 •

edited

Loading

Vindaar commented Jan 31, 2022

Avoid LAPACK dependency for Savitzky Golay filter #11

Avoid LAPACK dependency for Savitzky Golay filter #11

Comments

Vindaar commented Jan 22, 2022

c-blake commented Jan 22, 2022 • edited Loading

Vindaar commented Jan 31, 2022

c-blake commented Jan 22, 2022 •

edited

Loading