In this context, it will be very interesting to measure the speedup from breaking down block matrices. In fact, for most operations we perform on matrices, it is faster to work on several smaller matrices than on one huge matrix, since the cost grows faster than linearly with the matrix size.
This is even more significant when inverting the matrix, especially on the GPU.
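As a minimal sketch of the inversion case (plain Python, no library code; the block sizes and values are illustrative), inverting a block-diagonal matrix block by block gives the same result as inverting the assembled matrix, at a cost of the sum of the per-block inversions instead of one inversion of the full size:

```python
def invert(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    # Augment with the identity matrix.
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def invert_block_diagonal(blocks):
    """Invert each diagonal block independently: O(sum k_i^3) work
    instead of O((sum k_i)^3) for the assembled matrix."""
    return [invert(b) for b in blocks]

# Two diagonal blocks, and the same matrix assembled as one 4x4.
b1 = [[2.0, 1.0], [1.0, 3.0]]
b2 = [[4.0, 0.0], [1.0, 5.0]]
full = [[2.0, 1.0, 0.0, 0.0],
        [1.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 0.0],
        [0.0, 0.0, 1.0, 5.0]]

inv_blocks = invert_block_diagonal([b1, b2])
inv_full = invert(full)
```

The diagonal blocks of `inv_full` coincide with the block-by-block inverses, while the off-diagonal blocks stay zero; the blockwise path never touches those zeros, which is exactly the saving being discussed.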
So we can apply this concept to a couple of things:
When we multiply by the adjoint matrix ad_xi, this matrix can be divided into four blocks, and the one in the upper right is zero. When we take the transpose, the zero block moves to the bottom-left part.
So we could split the computation into two parts: one multiplication for the upper block row and another for the bottom-right block.
For the wrench, this means first integrating the forces and then the couples, which saves some operations.
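As a hedged sketch, assuming the common convention xi = (omega, v) for the twist and W = (m, f) for the wrench (the block ordering in the actual library may differ), the transpose-adjoint product reduces to three cross products instead of a dense 6x6 matrix-vector multiply:

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def hat(w):
    """3x3 skew-symmetric matrix such that hat(w) @ x == w x x."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def ad_transpose_full(omega, v, wrench):
    """Naive version: assemble ad_xi^T = [[-hat(w), -hat(v)], [0, -hat(w)]]
    and multiply the full 6x6 matrix by the wrench."""
    hw, hv = hat(omega), hat(v)
    A = [[-hw[i][j] for j in range(3)] + [-hv[i][j] for j in range(3)]
         for i in range(3)] \
      + [[0.0] * 3 + [-hw[i][j] for j in range(3)] for i in range(3)]
    return [sum(A[i][j] * wrench[j] for j in range(6)) for i in range(6)]

def ad_transpose_blockwise(omega, v, wrench):
    """Exploit the zero bottom-left block: the force part depends only on f,
    so it costs one cross product; the couple part costs two."""
    m, f = wrench[:3], wrench[3:]
    bottom = [-c for c in cross(omega, f)]           # forces first
    top = [-(a + b) for a, b in zip(cross(omega, m), cross(v, f))]  # then couples
    return top + bottom

# Illustrative values, not from the library.
xi_omega, xi_v = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
W = [1.0, 0.0, -1.0, 2.0, 1.0, 0.0]  # (m, f)
full = ad_transpose_full(xi_omega, xi_v, W)
fast = ad_transpose_blockwise(xi_omega, xi_v, W)
```

Both paths give the same 6-vector, but the blockwise one skips every product against the zero block.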
When you test it, you can compare the different sizes of the corresponding A matrix and b vector.
To compute the generalized forces, we multiply the wrench by the matrices Phi and B.
However, in this part these matrices are already known, so we can first precompute their values at the Chebyshev points.
Moreover, since they are two block matrices, we can break the computation into as many parts as there are blocks.
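A sketch of the precomputation pattern; `phi_at` below is a placeholder basis, not the library's actual Phi or B, but the structure is the same: evaluate once at the Chebyshev points, then reuse the cached blocks at every solver iteration, one small product per block:

```python
import math

N = 5
# Chebyshev-Gauss-Lobatto points on [-1, 1].
cheb = [math.cos(math.pi * k / (N - 1)) for k in range(N)]

def phi_at(s):
    """Placeholder 2x3 basis block Phi(s) (monomials, for illustration only)."""
    return [[1.0, s, s * s],
            [0.0, 1.0, 2.0 * s]]

# Precomputed once, outside the solver loop: one block per Chebyshev point.
PHI = [phi_at(s) for s in cheb]

def matvec(M, x):
    """Dense matrix-vector product on lists."""
    return [sum(r[j] * x[j] for j in range(len(x))) for r in M]

def apply_blockwise(wrenches):
    """Per iteration: one small block product per Chebyshev point,
    instead of one big block matrix times a stacked vector."""
    return [matvec(P, w) for P, w in zip(PHI, wrenches)]

# Placeholder wrench-dependent values sampled at each Chebyshev point.
ws = [[1.0, 2.0, 3.0] for _ in cheb]
out = apply_blockwise(ws)
```

The per-point products are also independent of each other, which is what makes this decomposition attractive for the GPU version.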
For these two cases, it will be interesting to measure the speedup both in the classical CPU code and in the parallelized GPU code.
So, once the library is completed, we can take a few measurements and establish the current computational time.
Then we will compare it with this optimized version.
Similarly on the GPU: we will first perform a standard spectral integration and then break down the terms.
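For the CPU side, a minimal timing harness along these lines could look as follows (matrix sizes, names, and contents are illustrative, not the library's): it times one full block-diagonal matrix-vector product against two half-size products and checks that they agree.

```python
import random
import time

def matvec(M, x):
    """Dense matrix-vector product on lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def bench(n=200, reps=50, seed=0):
    """Time a full n x n block-diagonal mat-vec vs two (n/2)-sized mat-vecs."""
    rng = random.Random(seed)
    h = n // 2
    A = [[rng.random() for _ in range(h)] for _ in range(h)]
    B = [[rng.random() for _ in range(h)] for _ in range(h)]
    # Assemble the full block-diagonal matrix diag(A, B).
    full = [row + [0.0] * h for row in A] + [[0.0] * h + row for row in B]
    x = [rng.random() for _ in range(n)]

    t0 = time.perf_counter()
    for _ in range(reps):
        y_full = matvec(full, x)
    t_full = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(reps):
        y_block = matvec(A, x[:h]) + matvec(B, x[h:])
    t_block = time.perf_counter() - t0

    return y_full, y_block, t_full, t_block

y_full, y_block, t_full, t_block = bench()
```

The blockwise path performs half the multiply-adds, so it should come out ahead; the harness only asserts correctness and leaves the timing comparison to inspection, since wall-clock ratios vary by machine.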
aGotelli changed the title from "Optimized version for the adjoint computations" to "Measuring the unexploited efficiency of breaking down a block-diagonal matrix" on May 9, 2022.