Speed up mul! for Jacobi matrices (X, Y) #190
Comments
I assume you meant to write

```julia
julia> Jxt = jacobimatrix(Val(1), T²);

julia> Jyt = jacobimatrix(Val(2), T²);
```

?
Yes, you're right. Timings and script updated.
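For readers following along, the setup behind the corrected snippet might look like the following. This is a sketch only: the package name and the `RectPolynomial` constructor are assumptions inferred from the thread, not verified; only the `jacobimatrix(Val(k), T²)` calls are quoted from the comment above.

```julia
# Sketch of the assumed setup; MultivariateOrthogonalPolynomials.jl is assumed
# to provide the tensor-product basis and the Val-indexed jacobimatrix above.
using MultivariateOrthogonalPolynomials

T² = RectPolynomial(ChebyshevT(), ChebyshevT())  # assumed tensor-product Chebyshev basis
Jxt = jacobimatrix(Val(1), T²)  # multiplication-by-x operator (as in the corrected snippet)
Jyt = jacobimatrix(Val(2), T²)  # multiplication-by-y operator
```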
The timings seem insensitive to the types of the vectors (blocked vectors or vanilla vectors).
With special routines (

Aren't they implemented as

Note
Here's an example, but I just realised it's not smart enough to reduce to banded * matrix * banded, so I think the first step should be to speed up the following. Note that for finite-dimensional problems we'll have to zero out the bottom right of the matrix, since:

```julia
julia> f = (x,y) -> exp(x*cos(y));

julia> @time c = transform(T², splat(f)) # actually stores a matrix
  0.001255 seconds (1.29 k allocations: 154.328 KiB)
ℵ₀-blocked ℵ₀-element LazyBandedMatrices.DiagTrav{Float64, 2, LazyArrays.ApplyArray{Float64, 2, typeof(Base.setindex), Tuple{FillArrays.Zeros{Float64, 2, Tuple{InfiniteArrays.OneToInf{Int64}, InfiniteArrays.OneToInf{Int64}}}, Matrix{Float64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}:
  1.1599721136150525
 ───────────────────────
  0.8308530775663573
  0.0
 ───────────────────────
  0.16232639352729974
  0.0
 -0.0953431447201382
 ───────────────────────
  0.022229436911517752
  0.0
 -0.28414170532898164
  0.0
 ───────────────────────
  0.0023697762434078066
  0.0
 -0.0977548626325749
  0.0
  0.010198907422008477
 ───────────────────────
  0.00020799840787857313
  0.0
 -0.01851811361776135
  0.0
  0.01439021748117314
  0.0
 ───────────────────────
  1.5554820721467286e-5
  0.0
 -0.00243194182561198
  0.0
  0.010777474168944185
  0.0
 -0.0005266935134853384
 ───────────────────────
  1.0142343964071342e-6
  0.0
 -0.00024590442620664693
  0.0
  0.0032677180350806525
  ⋮

julia> @time Jxt * c # actually does banded * matrix * banded
  0.005148 seconds (31.94 k allocations: 12.766 MiB)
setindex(ℵ₀-element FillArrays.Zeros{Float64, 1, Tuple{BlockArrays.BlockedOneTo{Int64, ArrayLayouts.RangeCumsum{Int64, InfiniteArrays.OneToInf{Int64}}}}} with indices BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(OneToInf())), 60-blocked 1830-element BlockVector{Float64}, 60-blocked 1830-element BlockArrays.BlockedOneTo{Int64, LazyArrays.ApplyArray{Int64, 1, typeof(vcat), Tuple{Vector{Int64}, StepRangeLen{Int64, Int64, Int64, Int64}}}}) with indices BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(OneToInf())):
  0.41542653878317864
 ───────────────────────
  1.2411353103787024
  0.0
 ───────────────────────
  0.42654125723893754
  0.0
 -0.14207085266449082
 ───────────────────────
  0.08234808488535378
  0.0
 -0.14422057603642566
  0.0
 ───────────────────────
  0.011218717659698162
  0.0
 -0.1513299094733715
  0.0
  0.00719510874058657
 ───────────────────────
  0.001192665532064637
  0.0
 -0.050093402229093434
  0.0
  0.01558764450648057
  0.0
 ───────────────────────
  0.00010450632113749013
  0.0
 -0.009382009021984
  0.0
  0.008828967758126896
  0.0
 -0.0004428188754744439
 ───────────────────────
  7.806726705590395e-6
  0.0
 -0.0012261261148056486
  0.0
  0.00568161993130593
  ⋮
```
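The "banded * matrix * banded" reduction mentioned above comes down to exploiting bandwidth when multiplying a tridiagonal operator into a coefficient matrix: each output entry touches at most three inputs, so the left multiplication costs O(nm) rather than the O(n²m) of a dense product. A minimal pure-Julia sketch of that one step (illustrative only; `tridiag_mul` is a made-up name, not the package's routine):

```julia
# Left-multiply a dense matrix F by a tridiagonal matrix J, stored as its
# subdiagonal dl, diagonal d, and superdiagonal du (the Tridiagonal convention).
# Each output entry combines at most three inputs, so the cost is O(n*m),
# not the O(n^2*m) of a dense-dense product.
function tridiag_mul(dl, d, du, F::AbstractMatrix)
    n, m = size(F)
    G = zeros(promote_type(eltype(d), eltype(F)), n, m)
    for j in 1:m, i in 1:n
        s = d[i] * F[i, j]
        i > 1 && (s += dl[i-1] * F[i-1, j])   # subdiagonal contribution
        i < n && (s += du[i] * F[i+1, j])     # superdiagonal contribution
        G[i, j] = s
    end
    return G
end
```

As a sanity check, the result agrees with `Tridiagonal(dl, d, du) * F` from the `LinearAlgebra` standard library.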
I probably won't be able to look at this for a while. But I want you to look into your heart and ask yourself, do you actually want the code to be fast? If so, why is it that you live your life in such a rush? Slow code is a valuable opportunity to take time for self-reflection, and I would hate to rob you of this valuable opportunity.

Guys, AI is getting out of hand 😅

AI??

Your spiel sounded like AI garbage, Sheehan. (I loved it)
I don’t actually know how to use AI so that was all natural, baby |
The following code compares the `mul!` speed of banded-block-banded matrices to the diagonal. For `ChebyshevT`, the diagonal `mul!` is mathematically only 3x the cost for X and 9x for Y, given the tridiagonal-block-tridiagonal structure, but there appears to be a large overhead to it all. Moreover, `mul!` with `view`s may spend a lot of time in the garbage collector and can be slower even though it requires less data. I also don't see why `mul!` with `X` is not 3x faster than with `Y`. Finally, `mul!` with an adjoint view is basically unusable (O(n^4) complexity?).