
GSoC 2024 Final Submission #167

Merged
merged 22 commits into master on Aug 23, 2024

Conversation

@GeliezaK (Collaborator) commented Aug 21, 2024

GSoC 2024 Final Submission

Overview

This pull request is the final submission for my Google Summer of Code 2024 project, Parallel Algebraic Multigrid for Structural Mechanics in Julia with PartitionedArrays.jl, under the mentorship of NumFOCUS. The project contributes to the ongoing effort of implementing parallel Algebraic Multigrid (AMG) schemes in Julia within the PartitionedArrays.jl package. More specifically, the goal of this project is to provide a sequential and a parallel AMG solver for multidimensional linear elasticity problems. This submission includes all the code, documentation, and tests developed during the GSoC period.

Algebraic Multigrid

Finite element approximations of Partial Differential Equations (PDEs) usually result in large systems of linear equations

$$ A x = b.$$

The main idea of multigrid methods is to solve these systems of equations on multiple levels of grid refinement, from coarse to fine grids. On coarser levels, the solution is obtained more cheaply, and it is then refined iteratively by transferring information between grids of different resolutions. This transfer process allows for more rapid convergence than basic iterative solvers. AMG is a type of multigrid method in which the hierarchy of matrices is not user-defined but constructed directly from the system matrix $A$. In smoothed aggregation AMG, the hierarchical levels are essentially groupings of unknowns without any geometric significance, which makes them easier to apply to problems on unstructured meshes.

For a more detailed introduction to smoothed AMG, see references [1,2,3] from the References section below.
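
To make the two-grid idea concrete, here is a minimal sketch of a single two-grid cycle in Julia, assuming a prolongator P is already given; it uses damped Jacobi smoothing and an exact coarse solve, and is purely illustrative rather than the PartitionedSolvers implementation.

using LinearAlgebra, SparseArrays

# One two-grid cycle for A x = b with a given prolongator P (coarse -> fine).
# Damped Jacobi is used as smoother; the coarse problem is solved exactly.
function two_grid_cycle(A::SparseMatrixCSC, b::Vector, x::Vector, P::SparseMatrixCSC;
                        nsmooth=3, omega=2/3)
    D = Diagonal(A)
    for _ in 1:nsmooth                 # pre-smoothing
        x += omega * (D \ (b - A * x))
    end
    r = b - A * x                      # fine-level residual
    Ac = P' * A * P                    # Galerkin coarse-level operator
    ec = Ac \ (P' * r)                 # coarse-level correction (exact solve)
    x += P * ec                        # transfer the correction back to the fine level
    for _ in 1:nsmooth                 # post-smoothing
        x += omega * (D \ (b - A * x))
    end
    x
end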

Summary of Work Done

Prior to the GSoC project, the existing implementations covered only scalar PDEs with the trivial nullspace. However, the linear elasticity model, a common equation in structural mechanics, is a vector-valued PDE with multiple nullspace vectors. The goal of the project was to extend the existing implementation so that vector-valued PDEs with arbitrary nullspaces can be solved.
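
To illustrate what such a nullspace contains: for 3D linear elasticity, the near nullspace is spanned by the six rigid body modes (three translations and three linearized rotations) evaluated at the node coordinates. The sketch below is conceptual only; the function name and input layout are assumptions, and the package's own generator nullspace_linear_elasticity (used later in this PR) is what is actually provided.

# Conceptual sketch: build the six rigid body modes of 3D linear elasticity.
# coords is a vector of (x,y,z) tuples, one per grid node; the result has one
# block of three rows per node and one column per mode.
function rigid_body_modes(coords)
    n_nodes = length(coords)
    B = zeros(3 * n_nodes, 6)
    for (node, (x, y, z)) in enumerate(coords)
        r = 3 * (node - 1)
        B[r+1, 1] = 1.0; B[r+2, 2] = 1.0; B[r+3, 3] = 1.0   # translations in x, y, z
        B[r+2, 4] = -z;  B[r+3, 4] =  y                      # rotation about the x axis
        B[r+1, 5] =  z;  B[r+3, 5] = -x                      # rotation about the y axis
        B[r+1, 6] = -y;  B[r+2, 6] =  x                      # rotation about the z axis
    end
    B
end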

In summary, the contributions of this project consist of (1) the extension of the existing sequential AMG implementation from scalar problems to vector-valued problems with arbitrary nullspaces, (2) the parallelization of the sequential methods implemented in (1) using the PartitionedArrays.jl programming model, and (3) a PartitionedArrays user tutorial.

Implementation Details

The following sections describe the different parts of the code that were added or revised to extend the AMG functions to vector-valued PDEs.

Aggregation

In vector-valued equations, the entries in the system matrix $A_l$ at resolution level $l$ no longer correspond directly to nodes in the grid. In linear elasticity, the unknowns typically represent displacements in three dimensions, so each node in the grid is represented by three rows of $A_l$. Since the current implementation of the aggregation algorithm assumes that each node corresponds to one row, an additional pre-processing step is required in which each row of $A_l$ is mapped to a node in the graph of $A_l$. This step was implemented in the new function strength_graph, sequentially for sparse matrices and in parallel for PartitionedArrays' PSparseMatrix.

strength_graph implementation:
  • sequential
  • parallel
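
For illustration, one hypothetical way to collapse block rows into a node-level graph is sketched below. This is not the package's strength_graph; the function name and the Frobenius-norm edge weighting are assumptions made for this sketch.

using SparseArrays

# Collapse each block_size x block_size block of A into a single node-level
# edge whose weight is the Frobenius norm of the block. Assumes size(A,1)
# is a multiple of block_size.
function node_graph(A::SparseMatrixCSC, block_size::Int)
    n_nodes = size(A, 1) ÷ block_size
    I_, J_, V_ = Int[], Int[], Float64[]
    rows = rowvals(A); vals = nonzeros(A)
    for j in 1:size(A, 2)
        node_j = (j - 1) ÷ block_size + 1
        for k in nzrange(A, j)
            node_i = (rows[k] - 1) ÷ block_size + 1
            push!(I_, node_i); push!(J_, node_j); push!(V_, vals[k]^2)
        end
    end
    G = sparse(I_, J_, V_, n_nodes, n_nodes)  # duplicate entries are summed
    map!(sqrt, nonzeros(G), nonzeros(G))      # turn sums of squares into Frobenius norms
    G
end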

Tentative Prolongator

For scalar-valued PDEs, the tentative prolongator $\hat{P}$ can simply be built by constructing a matrix filled with constant values that maps the fine-level degrees of freedom (DOFs) to the new coarse-level DOFs. Building the tentative prolongator for vector-valued PDEs is more involved, since the nullspace vectors can be arbitrary. During the GSoC project, a general tentative prolongator with block size was implemented, based on Algorithm 7 in [1].

tentative prolongator implementation:
  • sequential
  • parallel
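
The following sketch illustrates the idea behind a tentative prolongator with block size in the spirit of Algorithm 7 in [1]: the near-nullspace vectors are restricted to the DOFs of each aggregate and orthonormalized with a thin QR decomposition; the Q factor becomes the local block of $\hat{P}$ and the R factor becomes the coarse-level nullspace. The function name, signature, and data layout are assumptions, not the package's implementation, and the sketch assumes each aggregate owns at least as many DOFs as there are nullspace vectors.

using LinearAlgebra, SparseArrays

# node_to_aggregate[i] gives the aggregate of node i; B is the
# n_dofs x n_null near-nullspace matrix; block_size DOFs per node.
function tentative_prolongator(node_to_aggregate::Vector{Int}, B::Matrix, block_size::Int)
    n_aggs = maximum(node_to_aggregate)
    n_dofs, n_null = size(B)
    P = spzeros(n_dofs, n_aggs * n_null)
    Bc = zeros(n_aggs * n_null, n_null)            # coarse-level nullspace
    for agg in 1:n_aggs
        nodes = findall(==(agg), node_to_aggregate)
        dofs = reduce(vcat, [(n-1)*block_size+1 : n*block_size for n in nodes])
        F = qr(B[dofs, :])                         # thin QR of the local nullspace block
        cols = (agg-1)*n_null+1 : agg*n_null
        P[dofs, cols] = Matrix(F.Q)                # orthonormal local basis goes into P-hat
        Bc[cols, :] = F.R                          # R becomes the coarse nullspace block
    end
    P, Bc
end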

Smoothed Prolongator

The third step is to construct the final prolongator, which adds smoothing to the tentative prolongator built in the previous step. This smoothing often leads to a notable improvement in multigrid convergence. The smoothed prolongation operator $P_{l+1}$ at level $l+1$ can be obtained from the non-smoothed tentative prolongator $\hat{P}_{l+1}$ as follows [1]:

$$ P_{l+1} = \left(I - \frac{\omega}{\lambda} D^{-1} A_l\right) \hat{P}_{l+1} $$

with $A_l$ the system matrix at refinement level $l$, $D$ the diagonal part of $A_l$, and $\omega \geq 0$ a damping factor. In the scalar case, $\lambda = 1$, but in the vector-valued case the value of $\lambda$ is not known and needs to be estimated such that $\lambda \geq \rho(D^{-1}A_l)$. For this purpose, the method spectral_radius was implemented, which estimates the spectral radius of the matrix $D^{-1}A_l$ using the power method.

$\lambda$ (spectral radius) estimation:
  • sequential
  • parallel
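
A hypothetical end-to-end sketch of this step is shown below: a few power iterations estimate $\lambda \approx \rho(D^{-1}A_l)$, and the result is used to form the smoothed prolongator. The function name deliberately differs from the package's spectral_radius, and the random test matrices at the end are assumptions used only to make the snippet self-contained.

using LinearAlgebra, SparseArrays

# Estimate the spectral radius of D^-1 A with a fixed number of power iterations.
function estimate_spectral_radius(A::SparseMatrixCSC, D::Diagonal; iters=20)
    x = rand(size(A, 2))
    x ./= norm(x)
    lambda = one(eltype(A))
    for _ in 1:iters
        y = D \ (A * x)        # apply D^-1 A
        lambda = norm(y)       # estimate of the dominant eigenvalue magnitude
        x = y / lambda
    end
    lambda
end

A_l = sprand(100, 100, 0.05) + 100I; A_l = (A_l + A_l') / 2   # assumed SPD test matrix
Phat = sprand(100, 20, 0.1)                                   # assumed tentative prolongator
D = Diagonal(A_l)
omega = 2/3
lambda = estimate_spectral_radius(A_l, D)
P = (I - (omega / lambda) * (inv(D) * A_l)) * Phat            # smoothed prolongator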

Testing

Unit and integration tests of all new functions are provided in amg_tests.jl.

The new functionality can be tested with PartitionedArrays.jl's built-in linear elasticity generators: linear_elasticity_fem for the system matrix $A$ and nullspace_linear_elasticity for the near nullspace $B$:

using PartitionedSolvers
using PartitionedArrays

parts_per_dir = (2,2,1)                   # parts per spatial dimension
block_size = length(parts_per_dir)        # 3 displacement components per node
p = prod(parts_per_dir)                   # total number of parts
ranks = DebugArray(LinearIndices((p,)))   # debug backend, no MPI required
nodes_per_dir = (4,4,2)                   # grid nodes per spatial dimension
args = PartitionedArrays.linear_elasticity_fem(nodes_per_dir,parts_per_dir,ranks)
A = psparse(args...) |> fetch             # assemble the distributed system matrix
x = node_coordinates_unit_cube(nodes_per_dir,parts_per_dir,ranks)
B = nullspace_linear_elasticity(x, partition(axes(A,2)))  # near nullspace

The AMG solver for vector-valued linear elasticity equations can be configured using the new function amg_level_params_linear_elasticity(block_size).

level_params = amg_level_params_linear_elasticity(block_size)
fine_params = amg_fine_params(;level_params)
solver = amg(;fine_params)

Now set up the solver with the corresponding nullspace and solve:

using IterativeSolvers: cg!

x_exact = pones(partition(axes(A,2)))     # manufactured exact solution
b = A * x_exact                           # corresponding right-hand side
y = pzeros(partition(axes(A,2)))          # initial guess
Pl = setup(solver,y,A,b;nullspace=B)      # build the AMG preconditioner
y, history = cg!(y,A,b;Pl,log=true)       # preconditioned conjugate gradient

Performance

The new solver for vector-valued PDEs was compared against the existing scalar solver, both sequentially and in parallel. The number of iterations of the IterativeSolvers.cg! method was compared after the same linear elasticity system matrix $A$ had been preconditioned with either solver. The scalar solver treats each row of matrix $A$ as a node in the graph. The vector-valued solver, which was implemented during this GSoC project, receives a parameter block_size specifying the dimension of the linear elasticity problem, and consequently treats each block of block_size rows as one node in the graph. The code to generate the following plots is attached here: performance_tests_linear_elasticity.txt

Sequential

[Figure: convergence_seq_69984]

Both preconditioners result in an increasing number of iterations as the matrix size grows. The number of iterations seems to reach a plateau for both preconditioners as the matrices grow large. With the vector-valued preconditioner, the conjugate gradient method requires about half as many iterations to converge as with the scalar preconditioner for large matrices (8 vs. 16 iterations for the third-largest matrix, 9 vs. 18 for the second-largest, and 10 vs. 21 for the largest).

[Figure: resnorm_seq_69984]

The residual norm at the last iteration of the conjugate gradient method is shown for both preconditioners. The residuals are approximately the same for each matrix size.

[Figure: runtime_seq_setup_69984]

The runtimes of the AMG setup() function, which builds the preconditioner, are orders of magnitude slower for the vector-valued preconditioner than for the scalar preconditioner. This can be attributed to the additional computational effort needed to compute the tentative prolongator, which involves a QR decomposition of the nullspace, as well as the smoothed prolongator, which requires a few iterations of the power method.

Parallel

[Figure: convergence_strong_np=16_par_41472]

In parallel, the conjugate gradient method preconditioned with the vector-valued preconditioner converges in half or fewer of the iterations needed with the scalar preconditioner, for every number of processes. The number of iterations increases with the number of processes, which is expected since the AMG algorithm was altered to allow for parallelization: the aggregates found in the parallel version differ from those in the sequential version, because aggregation is performed only on the locally owned nodes. This loss of accuracy leads to a weaker preconditioner.

[Figure: convergence_weak_np=16_par_65856]

In this graph, the number of iterations is shown for different numbers of parts and a fixed number of nodes per part. The resulting graphs per part consist of $4^3 = 64$ nodes (4 nodes per dimension), $8^3 = 512$ nodes (8 nodes per dimension), and $12^3 = 1728$ nodes (12 nodes per dimension). The number of iterations increases with the number of processes, which is expected as explained above.

To conclude, the new functionality now allows for vector-valued linear elasticity problems with arbitrary nullspaces to be solved with the PartitionedArrays.jl package. The resulting convergence of the conjugate gradient method is substantially improved when using this new preconditioner.

PartitionedArrays tutorial

Another deliverable of this GSoC project was a new tutorial on how to use the PartitionedArrays.jl package. The tutorial demonstrates the usage of the package by implementing a parallel version of the one-dimensional Jacobi method. Topics covered include (a sketch of the local Jacobi kernel follows the list):

  • Creating block partitions with ghost cells
  • Running functions in parallel using map
  • Updating ghost cells in parallel using consistent!
  • The debugging vs. MPI execution mode
  • Executing the parallel Julia code with MPI
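
For context, the purely local kernel that such a tutorial parallelizes could look like the sketch below; the exact update formula used in the tutorial is an assumption here. In the tutorial, a kernel of this kind is applied to every part with map, and the ghost cells at both ends of each local chunk are refreshed with consistent! between sweeps.

# One local Jacobi sweep for the 1D Laplace problem on a chunk whose first and
# last entries are ghost (or boundary) cells; only the owned cells are updated.
function jacobi_sweep!(u_new, u)
    for i in 2:length(u)-1
        u_new[i] = (u[i-1] + u[i+1]) / 2
    end
    u_new
end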

The corresponding pull request can be found here.

Future Work

  • Performance improvements: This project aimed at implementing a correct version of the AMG algorithm for vector-valued PDEs, both sequentially and in parallel. Future work should explore ways to improve the performance of the implemented functions so that the runtime of the setup() function can be reduced.
  • Comparison against other libraries: The performance of PartitionedArrays' implementation should be compared against existing implementations in other libraries such as PETSc, in order to investigate whether this implementation provides a practical Julia alternative to AMG methods written in C or other languages.

Acknowledgments

I would like to thank my mentors, Francesc Verdugo @fverdugo, Alberto F. Martín @amartinhuertas and Oriol Colomés @oriolcg for their guidance and support throughout this project. It has been a very educational, enriching and productive summer and I am grateful for the opportunity to work on this exciting project. I also appreciate the opportunity provided by NumFocus and the GSoC program.

References

  • [1] Tobias Wiesner. “Flexible aggregation-based algebraic multigrid methods for contact and flow problems”. PhD thesis. Technische Universität München, 2015. https://mediatum.ub.tum.de/doc/1229321/document.pdf
  • [2] Stanislav Míka and Petr Vaněk. “Acceleration of convergence of a two-level algebraic algorithm by aggregation in smoothing process”. In: Applications of Mathematics 37.5 (1992), pp. 343–356.
  • [3] Rasmus Tamstorf, Toby Jones, and Stephen F. McCormick. “Smoothed aggregation multigrid for cloth simulation”. In: ACM Transactions on Graphics (TOG) 34.6 (2015), pp. 1–13.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.98%. Comparing base (37e60ca) to head (6d9cad8).
Report is 37 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   85.26%   84.98%   -0.28%     
==========================================
  Files          11       12       +1     
  Lines        4302     5089     +787     
==========================================
+ Hits         3668     4325     +657     
- Misses        634      764     +130     


@fverdugo (Owner) commented

@GeliezaK Thanks for this PR. Can you please add some figure showing how the number of iterations changes with the number of parts?

@GeliezaK marked this pull request as ready for review on August 23, 2024 11:59

@fverdugo (Owner) left a comment

Thanks for your contribution @GeliezaK

@fverdugo merged commit ab04551 into master on Aug 23, 2024
4 checks passed