FESpace consistent assembly strategy #134

Closed · wants to merge 5 commits
38 changes: 38 additions & 0 deletions docs/src/dev-notes/AssemblyStrategies.md
@@ -0,0 +1,38 @@
# Assembly strategies

GridapDistributed offers several assembly strategies for distributed linear systems. These strategies modify the ghost layout for the rows and columns of the assembled matrix and vector. Depending on your use case, one strategy may be more convenient than the others.

## SubAssembledRows

!!! info
- **Main idea:** Both columns and rows are ghosted, with (potentially) different ghost layouts. Assembly is costly but matrix-vector products are cheap.
- **Pros:** Matrix-vector product fills both owned and ghost rows of the output vector. Communication is therefore not required to make the output vector consistent.
- **Cons:** Communication is required to assemble the matrix and vector.
- **Use cases:** Default assembly strategy.

- Each processor integrates over the **owned cells**, i.e., there are no duplicated cell contributions. However, processors do not hold all the contributions they need to assemble their matrix and vector.
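A minimal sketch of assembling a Poisson problem with this (default) strategy. It uses `with_debug` from `PartitionedArrays` so it can be run sequentially; the mesh sizes, forms, and boundary data are illustrative:

```julia
using Gridap, GridapDistributed, PartitionedArrays, LinearAlgebra

with_debug() do distribute
  parts = distribute(LinearIndices((4,)))
  model = CartesianDiscreteModel(parts,(2,2),(0,1,0,1),(8,8))
  reffe = ReferenceFE(lagrangian,Float64,1)
  V = TestFESpace(model,reffe;dirichlet_tags="boundary")
  U = TrialFESpace(V,x->0.0)
  Ω = Triangulation(model); dΩ = Measure(Ω,2)
  a(u,v) = ∫( ∇(v)⋅∇(u) )*dΩ
  l(v) = ∫( 1.0*v )*dΩ
  # SubAssembledRows is the default; it is passed explicitly here for clarity
  assem = SparseMatrixAssembler(U,V,SubAssembledRows())
  op = AffineFEOperator(a,l,U,V,assem)
  A,b = get_matrix(op),get_vector(op)
  # Rows are ghosted: A*x fills both owned and ghost rows of the output,
  # so no communication is needed to make the result consistent
  y = allocate_row_vector(A)
  mul!(y,A,allocate_col_vector(A))
end
```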

## FullyAssembledRows

!!! info
- **Main idea:** Columns are ghosted, but rows only contain owned indices. Assembly is cheap but matrix-vector products are costly.
- **Pros:** Assembly is local, i.e., no communication is required. Column vectors can also be used as row vectors.
- **Cons:** Matrix-vector product only fills the owned rows of the output vector. Communication is therefore required to make the output vector consistent.
- **Use cases:** This is the strategy used by PETSc. You should also use this strategy if you plan to feed back output row vectors as input column vectors during successive matrix-vector products.

- Each processor integrates over **all its local (owned + ghost) cells**, i.e., contributions for interface cells are duplicated. This implies that each processor has access to **all** the contributions for its **owned dofs** without the need for any communication.
- Contributions whose row index is not owned by the processor are discarded, while owned rows can be fully assembled without any communication.
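A sketch of the same kind of problem assembled with `FullyAssembledRows` (sizes and data are again illustrative; the strategy is selected through the assembler):

```julia
using Gridap, GridapDistributed, PartitionedArrays, LinearAlgebra

with_debug() do distribute
  parts = distribute(LinearIndices((4,)))
  model = CartesianDiscreteModel(parts,(2,2),(0,1,0,1),(8,8))
  reffe = ReferenceFE(lagrangian,Float64,1)
  V = TestFESpace(model,reffe;dirichlet_tags="boundary")
  U = TrialFESpace(V,x->0.0)
  Ω = Triangulation(model); dΩ = Measure(Ω,2)
  a(u,v) = ∫( ∇(v)⋅∇(u) )*dΩ
  l(v) = ∫( 1.0*v )*dΩ
  assem = SparseMatrixAssembler(U,V,FullyAssembledRows())
  op = AffineFEOperator(a,l,U,V,assem)  # assembly requires no communication
  A = get_matrix(op)
  x = allocate_col_vector(A)  # ghosted column layout
  y = allocate_row_vector(A)  # owned-only row layout
  mul!(y,A,x)  # fills only the owned entries of y
  # Reusing y as a column vector requires fetching its ghost values
  # from their owners first, i.e., a communication step
end
```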

## FEConsistentAssembly

!!! info
- **Main idea:** Same as `FullyAssembledRows` but the ghost layout for the columns is the same as the original `FESpace` ghost layout.
- **Pros:** Assembly is local, i.e., no communication is required. DoF `PVector`s from the `FESpace` can be used as column and row vectors for the matrix (like in serial).
- **Cons:** Matrix-vector product only fills the owned rows of the output vector. Communication is therefore required to make the output vector consistent.
- **Use cases:** You should use this strategy if you are constantly creating `FEFunction`s with vectors coming from the linear system (and vice versa). This is quite typical for geometric solvers.
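A sketch of the intended use of `FEConsistentAssembly` (the strategy added in this PR): since the column ghost layout matches the `FESpace` ghost layout, a solution vector can be wrapped into an `FEFunction` directly. The problem setup is illustrative:

```julia
using Gridap, GridapDistributed, PartitionedArrays

with_debug() do distribute
  parts = distribute(LinearIndices((4,)))
  model = CartesianDiscreteModel(parts,(2,2),(0,1,0,1),(8,8))
  reffe = ReferenceFE(lagrangian,Float64,1)
  V = TestFESpace(model,reffe;dirichlet_tags="boundary")
  U = TrialFESpace(V,x->0.0)
  Ω = Triangulation(model); dΩ = Measure(Ω,2)
  a(u,v) = ∫( ∇(v)⋅∇(u) )*dΩ
  l(v) = ∫( 1.0*v )*dΩ
  assem = SparseMatrixAssembler(U,V,FEConsistentAssembly())
  op = AffineFEOperator(a,l,U,V,assem)
  A,b = get_matrix(op),get_vector(op)
  x = allocate_col_vector(A)  # same ghost layout as the FESpace dofs
  # ... solve A*x = b in place with your favourite Krylov solver ...
  uh = FEFunction(U,x)  # no copy or re-permutation needed, as in serial
end
```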

Problem:

We cannot use the same partition for `FESpace`s and for the matrix. The reason is that we want the matrix layouts to list owned indices first and ghost indices afterwards, whereas `FESpace` local numberings mix owned and ghost indices.

We can create the matrix layouts so that the relative order of the owned indices (and, separately, of the ghost indices) is the same as in the `FESpace`. Paired with the `OwnedAndGhostValues` data type, this allows no-copy, no-allocation reuse of values between the two dof layouts. However, this is prone to creating silent, crash-less bugs (the worst kind) if the user is careless. How should this be handled? Is there a cheap way to check that we are not doing anything wrong (without checking the whole index ranges every time)?
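The layout mismatch can be illustrated with plain vectors (a conceptual sketch in base Julia only; `fe_is_owned` and the values are made up):

```julia
# Local FESpace numbering mixes owned (o) and ghost (g) dofs,
# e.g. [o1, g1, o2, o3, g2], while the matrix layout wants
# owned first, then ghost: [o1, o2, o3, g1, g2].
fe_is_owned = [true, false, true, true, false]
owned_ids = findall(fe_is_owned)           # [1, 3, 4]
ghost_ids = findall(x -> !x, fe_is_owned)  # [2, 5]

fe_values = [10.0, 1.0, 20.0, 30.0, 2.0]   # FESpace (mixed) layout
# If the relative order of the owned dofs (and, separately, of the ghost
# dofs) is preserved, the matrix layout is just a pair of views:
owned_vals = view(fe_values, owned_ids)    # [10.0, 20.0, 30.0]
ghost_vals = view(fe_values, ghost_ids)    # [1.0, 2.0]
# A type like OwnedAndGhostValues can store exactly such a pair, so both
# layouts share storage with no copies. The danger: silently pairing a
# vector with a matrix whose underlying permutation does not match.
```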
11 changes: 4 additions & 7 deletions docs/src/index.md
@@ -8,21 +8,18 @@ Documentation of the `GridapDistributed.jl` library.

## Introduction

The ever-increasing demand for resolution and accuracy in mathematical models of physical processes governed by systems of Partial Differential Equations (PDEs) can only be addressed using fully-parallel advanced numerical discretization methods and scalable solution methods, thus able to exploit the vast amount of computational resources in state-of-the-art supercomputers. To this end, `GridapDistributed.jl` is a registered software package which provides fully-parallel distributed memory data structures and associated methods for the Finite Element (FE) numerical solution of PDEs on parallel computers. Thus, it can be run on multi-core CPU desktop computers at small scales, as well as on HPC clusters and supercomputers at medium/large scales. The data structures in `GridapDistributed.jl` are designed to mirror as far as possible their counterparts in the `Gridap.jl` Julia software package, while implementing/leveraging most of their abstract interfaces. As a result, sequential Julia scripts written in the high-level Application Programming Interface (API) of `Gridap.jl` can be used verbatim up to minor adjustments in a parallel distributed memory context using `GridapDistributed.jl`.
This equips end-users with a tool for the development of simulation codes able to solve real-world application problems on massively parallel supercomputers while using a highly expressive, compact syntax, that resembles mathematical notation. This is indeed one of the main advantages of `GridapDistributed.jl` and a major design goal that we pursue.

In order to scale FE simulations to large core counts, the mesh used to discretize the computational domain on which the PDE is posed must be partitioned (distributed) among the parallel tasks such that each of these only holds a local portion of the global mesh. The same requirement applies to the rest of data structures in the FE simulation pipeline, i.e., FE space, linear system, solvers, data output, etc. The local portion of each task is composed of a set of cells that it owns, i.e., the **local cells** of the task, and a set of off-processor cells (owned by remote processors) which are in touch with its local cells, i.e., the **ghost cells** of the task.
This overlapped mesh partition is used by `GridapDistributed.jl`, among others, to exchange data among nearest neighbors, and to glue together global Degrees of Freedom (DoFs) which are sitting on the interface among subdomains. Following this design principle, `GridapDistributed.jl` provides scalable parallel data structures and associated methods for simple grid handling (in particular, Cartesian-like meshes of arbitrary-dimensional, topologically n-cube domains), FE spaces setup, and distributed linear system assembly. It is in our future plans to provide highly scalable linear and nonlinear solvers tailored for the FE discretization of PDEs (e.g., linear and nonlinear matrix-free geometric multigrid and domain decomposition preconditioners). In the meantime, however, `GridapDistributed.jl` can be combined with other Julia packages in order to realize the full potential required in real-world applications. These packages and their relation with `GridapDistributed.jl` are overviewed in the next section.

## Building blocks and composability

The figure below depicts the relation among `GridapDistributed.jl` and other packages in the Julia package ecosystem. The interaction of `GridapDistributed.jl` and its dependencies is mainly designed with separation of concerns in mind towards high composability and modularity. On the one hand, `Gridap.jl` provides a rich set of abstract types/interfaces suitable for the FE solution of PDEs. It also provides realizations (implementations) of these abstractions tailored to serial/multi-threaded computing environments. `GridapDistributed.jl` **implements** these abstractions for parallel distributed-memory computing environments. To this end, `GridapDistributed.jl` also leverages (**uses**) the serial realizations in `Gridap.jl` and associated methods to handle the local portion on each parallel task. (See arrow labels in the figure below.) On the other hand, `GridapDistributed.jl` relies on `PartitionedArrays.jl` in order to handle the parallel execution model (e.g., message-passing via the Message Passing Interface (MPI)), global data distribution layout, and communication among tasks. `PartitionedArrays.jl` also provides a parallel implementation of partitioned global linear systems (i.e., linear algebra vectors and sparse matrices) as needed in grid-based numerical simulations. While `PartitionedArrays.jl` is a stand-alone package, segregated from `GridapDistributed.jl`, it was designed with parallel FE packages such as `GridapDistributed.jl` in mind. In any case, `GridapDistributed.jl` is designed so that a distributed linear algebra library different from `PartitionedArrays.jl` might be used as well, as long as it is able to provide the same functionality.


| ![fig:packages](./assets/packages_sketchy.png) |
|:--:|
|`GridapDistributed.jl` and its relation to other packages in the Julia package ecosystem. In this diagram, each rectangle represents a Julia package, while the (directed) arrows represent relations (dependencies) among packages. Both the direction of the arrow and the label attached to the arrows are used to denote the nature of the relation. Thus, e.g., `GridapDistributed.jl` depends on `Gridap.jl` and `PartitionedArrays.jl`, and `GridapPETSc.jl` depends on `Gridap.jl` and `PartitionedArrays.jl`. Note that, in the diagram, the arrow direction is relevant, e.g., `GridapP4est.jl` depends on `GridapDistributed.jl` but not conversely.|

@@ -43,4 +40,4 @@ Here, one can find a list of resources to get started with this programming lang

* First steps to learn Julia from the [Gridap wiki](https://github.com/gridap/Gridap.jl/wiki/Start-learning-Julia) page.
* Official webpage [docs.julialang.org](https://docs.julialang.org/)
* Official list of learning resources [julialang.org/learning](https://julialang.org/learning/)
76 changes: 76 additions & 0 deletions src/Algebra.jl
@@ -9,6 +9,35 @@ function Algebra.allocate_vector(::Type{<:BlockPVector{V}},ids::BlockPRange) whe
BlockPVector{V}(undef,ids)
end

# Row/Col vector allocations for serial
function allocate_row_vector(A::AbstractMatrix{T}) where T
return zeros(T,size(A,1))
end

function allocate_col_vector(A::AbstractMatrix{T}) where T
return zeros(T,size(A,2))
end

# Row/Col vector allocations for parallel
function allocate_row_vector(A::PSparseMatrix)
T = eltype(A)
return pfill(zero(T),partition(axes(A,1)))
end

function allocate_col_vector(A::PSparseMatrix)
T = eltype(A)
return pfill(zero(T),partition(axes(A,2)))
end

# Row/Col vector allocations for blocks
function allocate_row_vector(A::AbstractBlockMatrix)
return mortar(map(Aii->allocate_row_vector(Aii),blocks(A)[:,1]))
end

function allocate_col_vector(A::AbstractBlockMatrix)
return mortar(map(Aii->allocate_col_vector(Aii),blocks(A)[1,:]))
end

# This might go to Gridap in the future. We keep it here for the moment.
function change_axes(a::Algebra.ArrayCounter,axes)
@notimplemented
@@ -303,6 +332,7 @@ end

struct FullyAssembledRows end
struct SubAssembledRows end
struct FEConsistentAssembly end

# For the moment we use COO format even though
# it is quite memory consuming.
@@ -549,6 +579,29 @@ function _sa_create_from_nz_with_callback(callback,async_callback,a)
return A, b
end

function Algebra.create_from_nz(a::DistributedAllocationCOO{<:FEConsistentAssembly})
f(x) = nothing
A, = _feca_create_from_nz_with_callback(f,a)
return A
end

function Algebra.create_from_nz(a::ArrayBlock{<:DistributedAllocationCOO{<:FEConsistentAssembly}})
f(x) = nothing
A, = _feca_create_from_nz_with_callback(f,a)
return A
end

function _feca_create_from_nz_with_callback(callback,a)
I,J,V = get_allocations(a)
rows = _setup_prange(get_test_gids(a),I;ghost=false,ax=:rows)
cols = get_trial_gids(a)
b = callback(rows)

asys = change_axes(a,(rows,cols))
values = map(create_from_nz,local_views(asys))
A = _setup_matrix(values,rows,cols)
return A, b
end

# PVector assembly

@@ -595,6 +648,12 @@ function Arrays.nz_allocation(a::PVectorCounter{<:FullyAssembledRows})
PVectorAllocationTrackOnlyValues(a.par_strategy,values,dofs)
end

function Arrays.nz_allocation(a::PVectorCounter{<:FEConsistentAssembly})
dofs = a.test_dofs_gids_prange
values = map(nz_allocation,a.counters)
PVectorAllocationTrackOnlyValues(a.par_strategy,values,dofs)
end

struct PVectorAllocationTrackOnlyValues{A,B,C}
par_strategy::A
values::B
@@ -615,6 +674,11 @@ function Algebra.create_from_nz(a::PVectorAllocationTrackOnlyValues{<:FullyAssem
_rhs_callback(a,rows)
end

function Algebra.create_from_nz(a::PVectorAllocationTrackOnlyValues{<:FEConsistentAssembly})
rows = _setup_prange_without_ghosts(a.test_dofs_gids_prange)
_rhs_callback(a,rows)
end

function Algebra.create_from_nz(a::PVectorAllocationTrackOnlyValues{<:SubAssembledRows})
# This point MUST NEVER be reached. If reached there is an inconsistency
# in the parallel code in charge of vector assembly
@@ -673,6 +737,18 @@ function Algebra.create_from_nz(
return A,b
end

function Algebra.create_from_nz(
a::DistributedAllocationCOO{<:FEConsistentAssembly},
c_fespace::PVectorAllocationTrackOnlyValues{<:FEConsistentAssembly})

function callback(rows)
_rhs_callback(c_fespace,rows)
end

A,b = _feca_create_from_nz_with_callback(callback,a)
return A,b
end

struct PVectorAllocationTrackTouchedAndValues{A,B,C}
allocations::A
values::B
8 changes: 8 additions & 0 deletions src/FESpaces.jl
@@ -646,6 +646,14 @@ function local_assembly_strategy(::FullyAssembledRows,rows,cols)
row->rows_local_to_ghost[row]==0,
col->true)
end
function local_assembly_strategy(::FEConsistentAssembly,rows,cols)
rows_local_to_ghost = local_to_ghost(rows)
GenericAssemblyStrategy(
identity,
identity,
row->rows_local_to_ghost[row]==0,
col->true)
end

# Assembler high level constructors
function FESpaces.SparseMatrixAssembler(
8 changes: 8 additions & 0 deletions src/Geometry.jl
@@ -517,6 +517,14 @@ function filter_cells_when_needed(
remove_ghost_cells(trian,cell_gids)
end

function filter_cells_when_needed(
portion::FEConsistentAssembly,
cell_gids::AbstractLocalIndices,
trian::Triangulation)

trian
end

function remove_ghost_cells(trian::Triangulation,gids)
model = get_background_model(trian)
Dt = num_cell_dims(trian)
Expand Down
2 changes: 2 additions & 0 deletions src/GridapDistributed.jl
@@ -33,13 +33,15 @@ import Gridap.ODEs.ODETools: ∂t, ∂tt

export FullyAssembledRows
export SubAssembledRows
export FEConsistentAssembly

export get_cell_gids
export get_face_gids

export local_views, get_parts
export with_ghost, no_ghost

export allocate_col_vector, allocate_row_vector
export redistribute

include("BlockPartitionedArrays.jl")
5 changes: 5 additions & 0 deletions test/FESpacesTests.jl
@@ -73,6 +73,7 @@ end
function main(distribute,parts)
main(distribute,parts,SubAssembledRows())
main(distribute,parts,FullyAssembledRows())
main(distribute,parts,FEConsistentAssembly())
end

function main(distribute,parts,das)
@@ -199,4 +200,8 @@ function main(distribute,parts,das)

end

with_debug() do distribute
main(distribute,(2,2))
end

end # module
Loading