
GSoC 2024 Final Submission #167

Merged
merged 22 commits into master on Aug 23, 2024

Conversation

@GeliezaK (Collaborator) commented Aug 21, 2024

GSoC 2024 Final Submission

Overview

This pull request is the final submission for my Google Summer of Code 2024 project, Parallel Algebraic Multigrid for Structural Mechanics in Julia with PartitionedArrays.jl, under the mentorship of NumFOCUS. The project contributes to the ongoing effort of implementing parallel Algebraic Multigrid (AMG) schemes in Julia within the PartitionedArrays.jl package. More specifically, the goal of this project is to provide a sequential and a parallel AMG solver for multidimensional linear elasticity problems. This submission includes all the code, documentation, and tests developed during the GSoC period.

Algebraic Multigrid

Finite element approximations of Partial Differential Equations (PDEs) usually result in large systems of linear equations

$$ A x = b.$$

The main idea of multigrid methods is to solve these systems of equations on multiple levels of grid refinement, from coarse to fine grids. On coarser levels, the solution is obtained more cheaply, and it is then refined iteratively by transferring information between grids of different resolutions. This transfer process allows for more rapid convergence than basic iterative solvers. AMG is a type of multigrid method in which the hierarchy of matrices is not user-defined but constructed directly from the system matrix $A$. In smoothed aggregation AMG, the hierarchical levels are essentially groupings of unknowns without any geometric significance, which makes them easier to apply to problems on unstructured meshes.

For a more detailed introduction to smoothed AMG, see references [1,2,3] from the References section below.
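
To make the two-grid idea concrete, here is a minimal sketch of a single two-grid cycle in Julia, assuming a prolongator P is already given; it uses damped Jacobi smoothing and an exact coarse solve, and is purely illustrative rather than the PartitionedSolvers implementation.

using LinearAlgebra, SparseArrays

# One two-grid cycle for A x = b with a given prolongator P (coarse -> fine).
# Damped Jacobi is used as smoother; the coarse problem is solved exactly.
function two_grid_cycle(A::SparseMatrixCSC, b::Vector, x::Vector, P::SparseMatrixCSC;
                        nsmooth=3, omega=2/3)
    D = Diagonal(A)
    for _ in 1:nsmooth                 # pre-smoothing
        x += omega * (D \ (b - A * x))
    end
    r = b - A * x                      # fine-level residual
    Ac = P' * A * P                    # Galerkin coarse-level operator
    ec = Ac \ (P' * r)                 # coarse-level correction (exact solve)
    x += P * ec                        # transfer the correction back to the fine level
    for _ in 1:nsmooth                 # post-smoothing
        x += omega * (D \ (b - A * x))
    end
    x
end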

Summary of Work Done

Prior to the GSoC project, the existing implementations covered only scalar PDEs with the trivial nullspace. However, the linear elasticity model, a common equation in structural mechanics, is a vector-valued PDE with multiple nullspace vectors. The goal of the project was to extend the existing implementation so that vector-valued PDEs with arbitrary nullspaces can be solved.
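
To illustrate what such a nullspace contains: for 3D linear elasticity, the near nullspace is spanned by the six rigid body modes (three translations and three linearized rotations) evaluated at the node coordinates. The sketch below is conceptual only; the function name and input layout are assumptions, and the package's own generator nullspace_linear_elasticity (used later in this PR) is what is actually provided.

# Conceptual sketch: build the six rigid body modes of 3D linear elasticity.
# coords is a vector of (x,y,z) tuples, one per grid node; the result has one
# block of three rows per node and one column per mode.
function rigid_body_modes(coords)
    n_nodes = length(coords)
    B = zeros(3 * n_nodes, 6)
    for (node, (x, y, z)) in enumerate(coords)
        r = 3 * (node - 1)
        B[r+1, 1] = 1.0; B[r+2, 2] = 1.0; B[r+3, 3] = 1.0   # translations in x, y, z
        B[r+2, 4] = -z;  B[r+3, 4] =  y                      # rotation about the x axis
        B[r+1, 5] =  z;  B[r+3, 5] = -x                      # rotation about the y axis
        B[r+1, 6] = -y;  B[r+2, 6] =  x                      # rotation about the z axis
    end
    B
end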

In summary, the contributions of this project consist of (1) the extension of the existing sequential AMG implementation from scalar problems to vector-valued problems with arbitrary nullspaces, (2) the parallelization of the sequential methods implemented in (1) using the PartitionedArrays.jl programming model, and (3) a PartitionedArrays user tutorial.

Implementation Details

The following sections describe the different parts of the code that were added or revised to extend the AMG functions to vector-valued PDEs.

Aggregation

In vector-valued equations, the entries in the system matrix $A_l$ at resolution level $l$ no longer correspond directly to nodes in the grid. In linear elasticity, the unknowns typically represent displacements in three dimensions, so each node in the grid is represented by three rows of $A_l$. Since the current implementation of the aggregation algorithm assumes that each node corresponds to one row, an additional pre-processing step is required in which each row of $A_l$ is mapped to a node in the graph of $A_l$. This step was implemented in the new function strength_graph, sequentially for sparse matrices and in parallel for PartitionedArrays' PSparseMatrix.

strength_graph implementation:
  • sequential
  • parallel
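
For illustration, one hypothetical way to collapse block rows into a node-level graph is sketched below. This is not the package's strength_graph; the function name and the Frobenius-norm edge weighting are assumptions made for this sketch.

using SparseArrays

# Collapse each block_size x block_size block of A into a single node-level
# edge whose weight is the Frobenius norm of the block. Assumes size(A,1)
# is a multiple of block_size.
function node_graph(A::SparseMatrixCSC, block_size::Int)
    n_nodes = size(A, 1) ÷ block_size
    I_, J_, V_ = Int[], Int[], Float64[]
    rows = rowvals(A); vals = nonzeros(A)
    for j in 1:size(A, 2)
        node_j = (j - 1) ÷ block_size + 1
        for k in nzrange(A, j)
            node_i = (rows[k] - 1) ÷ block_size + 1
            push!(I_, node_i); push!(J_, node_j); push!(V_, vals[k]^2)
        end
    end
    G = sparse(I_, J_, V_, n_nodes, n_nodes)  # duplicate entries are summed
    map!(sqrt, nonzeros(G), nonzeros(G))      # turn sums of squares into Frobenius norms
    G
end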

Tentative Prolongator

For scalar-valued PDEs, the tentative prolongator $\hat{P}$ can simply be built by constructing a matrix filled with constant values that maps the fine-level degrees of freedom (DOFs) to the new coarse-level DOFs. Building the tentative prolongator for vector-valued PDEs is more involved, since the nullspace vectors can be arbitrary. During the GSoC project, a general tentative prolongator with block size was implemented, based on Algorithm 7 in [1].

tentative prolongator implementation:
  • sequential
  • parallel
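
The following sketch illustrates the idea behind a tentative prolongator with block size in the spirit of Algorithm 7 in [1]: the near-nullspace vectors are restricted to the DOFs of each aggregate and orthonormalized with a thin QR decomposition; the Q factor becomes the local block of $\hat{P}$ and the R factor becomes the coarse-level nullspace. The function name, signature, and data layout are assumptions, not the package's implementation, and the sketch assumes each aggregate owns at least as many DOFs as there are nullspace vectors.

using LinearAlgebra, SparseArrays

# node_to_aggregate[i] gives the aggregate of node i; B is the
# n_dofs x n_null near-nullspace matrix; block_size DOFs per node.
function tentative_prolongator(node_to_aggregate::Vector{Int}, B::Matrix, block_size::Int)
    n_aggs = maximum(node_to_aggregate)
    n_dofs, n_null = size(B)
    P = spzeros(n_dofs, n_aggs * n_null)
    Bc = zeros(n_aggs * n_null, n_null)            # coarse-level nullspace
    for agg in 1:n_aggs
        nodes = findall(==(agg), node_to_aggregate)
        dofs = reduce(vcat, [(n-1)*block_size+1 : n*block_size for n in nodes])
        F = qr(B[dofs, :])                         # thin QR of the local nullspace block
        cols = (agg-1)*n_null+1 : agg*n_null
        P[dofs, cols] = Matrix(F.Q)                # orthonormal local basis goes into P-hat
        Bc[cols, :] = F.R                          # R becomes the coarse nullspace block
    end
    P, Bc
end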

Smoothed Prolongator

The third step is to construct the final prolongator, which adds smoothing to the tentative prolongator built in the previous step. This smoothing often leads to a notable improvement in multigrid convergence. The smoothed prolongation operator $P_{l+1}$ at level $l+1$ can be obtained from the non-smoothed tentative prolongator $\hat{P}_{l+1}$ as follows [1]:

$$ P_{l+1} = \left(I - \frac{\omega}{\lambda} D^{-1} A_l\right) \hat{P}_{l+1} $$

with $A_l$ the system matrix at refinement level $l$, $D$ the diagonal part of $A_l$, and $\omega \geq 0$ a damping factor. In the scalar case, $\lambda = 1$, but in the vector-valued case the value of $\lambda$ is not known and needs to be estimated such that $\lambda \geq \rho(D^{-1}A_l)$. For this purpose, the method spectral_radius was implemented, which estimates the spectral radius of the matrix $D^{-1}A_l$ using the power method.

$\lambda$ (spectral radius) estimation:
  • sequential
  • parallel
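
A hypothetical end-to-end sketch of this step is shown below: a few power iterations estimate $\lambda \approx \rho(D^{-1}A_l)$, and the result is used to form the smoothed prolongator. The function name deliberately differs from the package's spectral_radius, and the random test matrices at the end are assumptions used only to make the snippet self-contained.

using LinearAlgebra, SparseArrays

# Estimate the spectral radius of D^-1 A with a fixed number of power iterations.
function estimate_spectral_radius(A::SparseMatrixCSC, D::Diagonal; iters=20)
    x = rand(size(A, 2))
    x ./= norm(x)
    lambda = one(eltype(A))
    for _ in 1:iters
        y = D \ (A * x)        # apply D^-1 A
        lambda = norm(y)       # estimate of the dominant eigenvalue magnitude
        x = y / lambda
    end
    lambda
end

A_l = sprand(100, 100, 0.05) + 100I; A_l = (A_l + A_l') / 2   # assumed SPD test matrix
Phat = sprand(100, 20, 0.1)                                   # assumed tentative prolongator
D = Diagonal(A_l)
omega = 2/3
lambda = estimate_spectral_radius(A_l, D)
P = (I - (omega / lambda) * (inv(D) * A_l)) * Phat            # smoothed prolongator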

Testing

Unit and integration tests of all new functions are provided in amg_tests.jl.

The new functionality can be tested with PartitionedArrays.jl's built-in linear elasticity generators: linear_elasticity_fem for the system matrix $A$ and nullspace_linear_elasticity for the near nullspace $B$:

using PartitionedSolvers
using PartitionedArrays

parts_per_dir = (2,2,1)                   # parts per spatial dimension
block_size = length(parts_per_dir)        # 3 displacement components per node
p = prod(parts_per_dir)                   # total number of parts
ranks = DebugArray(LinearIndices((p,)))   # debug backend, no MPI required
nodes_per_dir = (4,4,2)                   # grid nodes per spatial dimension
args = PartitionedArrays.linear_elasticity_fem(nodes_per_dir,parts_per_dir,ranks)
A = psparse(args...) |> fetch             # assemble the distributed system matrix
x = node_coordinates_unit_cube(nodes_per_dir,parts_per_dir,ranks)
B = nullspace_linear_elasticity(x, partition(axes(A,2)))  # near nullspace

The AMG solver for vector-valued linear elasticity equations can be configured using the new function amg_level_params_linear_elasticity(block_size).

level_params = amg_level_params_linear_elasticity(block_size)
fine_params = amg_fine_params(;level_params)
solver = amg(;fine_params)

Now set up the solver with the corresponding nullspace and solve:

using IterativeSolvers: cg!

x_exact = pones(partition(axes(A,2)))     # manufactured exact solution
b = A * x_exact                           # corresponding right-hand side
y = pzeros(partition(axes(A,2)))          # initial guess
Pl = setup(solver,y,A,b;nullspace=B)      # build the AMG preconditioner
y, history = cg!(y,A,b;Pl,log=true)       # preconditioned conjugate gradient

Performance

The new solver for vector-valued PDEs was compared against the existing scalar solver, both sequentially and in parallel. The number of iterations of the IterativeSolvers.cg! method was compared after the same linear elasticity system matrix $A$ had been preconditioned with either solver. The scalar solver treats each row of matrix $A$ as a node in the graph. The vector-valued solver, which was implemented during this GSoC project, receives a parameter block_size specifying the dimension of the linear elasticity problem, and consequently treats each block of block_size rows as one node in the graph. The code to generate the following plots is attached here: performance_tests_linear_elasticity.txt

Sequential

[Figure: convergence_seq_69984]

Both preconditioners result in an increasing number of iterations as the matrix size grows. The number of iterations seems to reach a plateau for both preconditioners as the matrices grow large. With the vector-valued preconditioner, the conjugate gradient method requires about half as many iterations to converge as with the scalar preconditioner for large matrices (8 vs. 16 iterations for the third-largest matrix, 9 vs. 18 for the second-largest, and 10 vs. 21 for the largest).

[Figure: resnorm_seq_69984]

The residual norm at the last iteration of the conjugate gradient method is shown for both preconditioners. The residuals are approximately the same for each matrix size.

[Figure: runtime_seq_setup_69984]

The runtimes of the AMG setup() function, which builds the preconditioner, are orders of magnitude slower for the vector-valued preconditioner than for the scalar preconditioner. This can be attributed to the additional computational effort needed to compute the tentative prolongator, which involves a QR decomposition of the nullspace, as well as the smoothed prolongator, which requires a few iterations of the power method.

Parallel

[Figure: convergence_strong_np=16_par_41472]

In parallel, the conjugate gradient method preconditioned with the vector-valued preconditioner converges in half or fewer of the iterations needed with the scalar preconditioner, for every number of processes. The number of iterations increases with the number of processes, which is expected since the AMG algorithm was altered to allow for parallelization: the aggregates found in the parallel version differ from those in the sequential version, because aggregation is performed only on the locally owned nodes. This loss of accuracy leads to a weaker preconditioner.

[Figure: convergence_weak_np=16_par_65856]

In this graph, the number of iterations is shown for different numbers of parts and a fixed number of nodes per part. The resulting graphs per part consist of $4^3 = 64$ nodes (4 nodes per dimension), $8^3 = 512$ nodes (8 nodes per dimension), and $12^3 = 1728$ nodes (12 nodes per dimension). The number of iterations increases with the number of processes, which is expected as explained above.

To conclude, the new functionality now allows for vector-valued linear elasticity problems with arbitrary nullspaces to be solved with the PartitionedArrays.jl package. The resulting convergence of the conjugate gradient method is substantially improved when using this new preconditioner.

PartitionedArrays tutorial

Another deliverable of this GSoC project was a new tutorial on how to use the PartitionedArrays.jl package. The tutorial demonstrates the usage of the package by implementing a parallel version of the one-dimensional Jacobi method. Topics covered include (a sketch of the local Jacobi kernel follows the list):

  • Creating block partitions with ghost cells
  • Running functions in parallel using map
  • Updating ghost cells in parallel using consistent!
  • The debugging vs. MPI execution mode
  • Executing the parallel Julia code with MPI
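
For context, the purely local kernel that such a tutorial parallelizes could look like the sketch below; the exact update formula used in the tutorial is an assumption here. In the tutorial, a kernel of this kind is applied to every part with map, and the ghost cells at both ends of each local chunk are refreshed with consistent! between sweeps.

# One local Jacobi sweep for the 1D Laplace problem on a chunk whose first and
# last entries are ghost (or boundary) cells; only the owned cells are updated.
function jacobi_sweep!(u_new, u)
    for i in 2:length(u)-1
        u_new[i] = (u[i-1] + u[i+1]) / 2
    end
    u_new
end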

The corresponding pull request can be found here.

Future Work

  • Performance improvements: This project aimed at implementing a correct version of the AMG algorithm for vector-valued PDEs, both sequentially and in parallel. Future work should explore ways to improve the performance of the implemented functions so that the runtime of the setup() function can be reduced.
  • Comparison against other libraries: The performance of PartitionedArrays' implementation should be compared against existing implementations in other libraries such as PETSc, in order to investigate whether this implementation provides a practical Julia alternative to AMG methods written in C or other languages.

Acknowledgments

I would like to thank my mentors, Francesc Verdugo @fverdugo, Alberto F. Martín @amartinhuertas and Oriol Colomés @oriolcg for their guidance and support throughout this project. It has been a very educational, enriching and productive summer and I am grateful for the opportunity to work on this exciting project. I also appreciate the opportunity provided by NumFocus and the GSoC program.

References

  • [1] Tobias Wiesner. “Flexible aggregation-based algebraic multigrid methods for contact and flow problems”. PhD thesis. Technische Universität München, 2015. https://mediatum.ub.tum.de/doc/1229321/document.pdf
  • [2] Stanislav Míka and Petr Vaněk. “Acceleration of convergence of a two-level algebraic algorithm by aggregation in smoothing process”. In: Applications of Mathematics 37.5 (1992), pp. 343–356.
  • [3] Rasmus Tamstorf, Toby Jones, and Stephen F. McCormick. “Smoothed aggregation multigrid for cloth simulation”. In: ACM Transactions on Graphics (TOG) 34.6 (2015), pp. 1–13.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.98%. Comparing base (37e60ca) to head (6d9cad8).
Report is 37 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
- Coverage   85.26%   84.98%   -0.28%     
==========================================
  Files          11       12       +1     
  Lines        4302     5089     +787     
==========================================
+ Hits         3668     4325     +657     
- Misses        634      764     +130     


@fverdugo (Owner) commented

@GeliezaK Thanks for this PR. Can you please add some figure showing how the number of iterations changes with the number of parts?

@GeliezaK marked this pull request as ready for review on August 23, 2024 11:59

@fverdugo (Owner) left a comment

Thanks for your contribution @GeliezaK

@fverdugo merged commit ab04551 into master on Aug 23, 2024
4 checks passed