Add support for writing parallel structured VTK formats (.pvti / .pvtr / .pvts) #98

Merged · 18 commits · Jan 21, 2022
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "WriteVTK"
uuid = "64499a7a-5c06-52f2-abe2-ccb03c286192"
authors = ["Juan Ignacio Polanco <[email protected]>"]
version = "1.13.0"
version = "1.14.0"

[deps]
Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
2 changes: 1 addition & 1 deletion README.md
@@ -73,7 +73,7 @@ Supported dataset formats include:
Moreover, the following metadata formats are supported:
- [multiblock files](https://jipolanco.github.io/WriteVTK.jl/dev/metadata/multiblock/) (`.vtm`),
- [ParaView collections](https://jipolanco.github.io/WriteVTK.jl/dev/metadata/paraview_collections/) (`.pvd`, typically used for time series),
-- [parallel files](https://jipolanco.github.io/WriteVTK.jl/dev/metadata/parallel/) (`.pvt*`, only partial support for now).
- [parallel files](https://jipolanco.github.io/WriteVTK.jl/dev/metadata/parallel/) (`.pvt*`).

## Further reading

2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -66,7 +66,7 @@ Supported dataset formats include:
Moreover, the following metadata formats are supported:
- [multiblock files](@ref Multiblock-files) (`.vtm`),
- [ParaView collections](@ref ParaView-collections) (`.pvd`, typically used for time series),
-- [parallel files](@ref Parallel-files) (`.pvt*`, only partial support for now).
- [parallel files](@ref Parallel-files) (`.pvt*`).

## Authors

164 changes: 126 additions & 38 deletions docs/src/metadata/parallel.md
@@ -3,85 +3,173 @@
The parallel file formats do not actually store any data in the file.
Instead, the data is broken into pieces, each of which is stored in a serial file,
and an extra header file is created containing pointers to the corresponding serial files.
-The header file extension is the serial extension pre-appended with a `p`.
The header file extension is the serial extension prepended with a `p`.
For instance, for serial `vtu` files, the corresponding header file extension is `pvtu`.

-!!! note "Supported dataset types"
-
-    For now, parallel file formats have only been tested for [unstructured grid](@ref Unstructured-grid) formats (`.vtu`), and are currently unlikely to work for other kinds of datasets.
-    Feel free to open an issue or to submit a PR to add support for other dataset types.

## Generating a parallel data file

The parallel header file and the corresponding serial files are generated
-using the [`pvtk_grid`](@ref) function.
by a single call to [`pvtk_grid`](@ref).
Its signature is

```julia
pvtk_grid(
-    args...;
-    part,
-    nparts,
-    ismain = (part == 1),
-    ghost_level = 0,
-    kwargs...,
    filename, args...;
    part, nparts, ismain = (part == 1), ghost_level = 0, kwargs...,
)
```

which returns a handler representing a parallel VTK file that can be
appended with cell and point data and eventually written to disk with
[`vtk_save`](@ref) as usual.
In an MPI job, `vtk_save` causes each rank to write a serial file, while just
a single rank (e.g., rank 0) writes the header file.
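
For instance, in an application based on [MPI.jl](https://github.com/JuliaParallel/MPI.jl), usage might look as follows (a minimal sketch; the random `points`, `cells` and pressure values are placeholders for actual simulation data):

```julia
using MPI
using WriteVTK

MPI.Init()
comm = MPI.COMM_WORLD

# Placeholder data: each rank holds its own piece of the grid.
points = rand(3, 5)
cells = [MeshCell(VTKCellTypes.VTK_TRIANGLE, [1, 4, 2])]

pvtk = pvtk_grid(
    "simulation", points, cells;
    part = MPI.Comm_rank(comm) + 1,  # 1-based part id
    nparts = MPI.Comm_size(comm),    # total number of parts
)
pvtk["Pressure"] = rand(5)  # one value per point
vtk_save(pvtk)  # each rank writes its serial file; part 1 also writes the header
```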

-Positional and keyword arguments in `args` and `kwargs`
-are passed to `vtk_grid` verbatim in order to generate the serial files
-(with the exception of file names that are augmented with the
-corresponding part id).
This signature is valid for **unstructured grids**.
For structured grids, there are small differences detailed [further below](#Parallel-structured-files).

Positional and keyword arguments in `args` and `kwargs` are passed to `vtk_grid`
verbatim.
Note that serial filenames are automatically generated from `filename` and from
the process id `part` (e.g. `simulation/simulation_2.vtu` for `filename = "simulation"` and `part = 2`).

-The extra keyword arguments only apply to parallel VTK file formats.
The following keyword arguments only apply to parallel VTK file formats.

Mandatory ones are:

-- `part` current (1-based) part id (typically MPI rank + 1),
-- `nparts` total number of parts (typically the MPI communicator size).
- `part`: current (1-based) part id (typically MPI rank + 1),
- `nparts`: total number of parts (typically the MPI communicator size).

Optional ones are:

-- `ismain` true if the current part id `part` is the main (the only one that will write the header file),
-- `ghost_level` ghost level.
- `ismain`: `true` if the current part id `part` is the main (the only one that will write the header file),
- `ghost_level`: ghost level.

-## Example
## Parallel structured files

-This generates the header file and a single serial file:
For structured grids, one needs to specify the portion of the grid associated with each process.
This is done via the `extents` keyword argument, which must be an array containing, for each process, the data ranges along each dimension.
For example, for a dataset of global dimensions ``15×12×4`` distributed across 4
processes, this array may look like the following:

```julia
-cells = [
-    MeshCell(VTKCellTypes.VTK_TRIANGLE, [1, 4, 2]),
-    MeshCell(VTKCellTypes.VTK_QUAD, [2, 4, 3, 5]),
extents = [
    ( 1:10,  1:5, 1:4),  # process 1
    (10:15,  1:5, 1:4),  # process 2
    ( 1:10, 5:12, 1:4),  # process 3
    (10:15, 5:12, 1:4),  # process 4
]
```

In practice, in parallel applications, all processes need to have this
information, and the `extents` argument must be the same for all processes.
Also note that the length of the `extents` array gives the number of processes,
and therefore the `nparts` argument is redundant and not needed for structured
grids.

Finally, note that in the example above the extents for different processes
**overlap** (for instance, the ranges `1:10` and `10:15` overlap at the index
`i = 10`).
This is a requirement of VTK, and without it the full data cannot be visualised
in ParaView.
For MPI applications, this typically means that ghost data need to be exchanged
before writing VTK files.
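
Such overlapping extents may be generated programmatically for a Cartesian decomposition, for instance along these lines (a sketch; the `overlapping_ranges` helper is illustrative and not part of WriteVTK):

```julia
# Split 1:n into p ranges such that consecutive ranges share one index,
# as required by VTK.
function overlapping_ranges(n, p)
    bounds = round.(Int, range(1, n; length = p + 1))
    [bounds[k]:bounds[k + 1] for k in 1:p]
end

dims = (15, 12, 4)  # global grid dimensions
procs = (2, 2, 1)   # number of processes along each dimension

ranges = map(overlapping_ranges, dims, procs)  # overlapping ranges along each dimension
extents = vec([(rx, ry, rz) for rx in ranges[1], ry in ranges[2], rz in ranges[3]])
```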

## Example 1: Unstructured files

This generates two serial files (typically held by two different processes) and
a header file combining them:

```julia
all_data = [
    # Process 1
    (
        points = rand(3, 5),  # 5 points on process 1
        cells = [             # 2 cells on process 1
            MeshCell(VTKCellTypes.VTK_TRIANGLE, [1, 4, 2]),
            MeshCell(VTKCellTypes.VTK_QUAD, [2, 4, 3, 5]),
        ],
    ),

    # Process 2
    (
        points = rand(3, 4),  # 4 points on process 2
        cells = [             # 1 cell on process 2
            MeshCell(VTKCellTypes.VTK_QUAD, [1, 2, 3, 4]),
        ]
    ),
]

saved_files = Vector{Vector{String}}(undef, 2) # files saved by each "process"

-x = rand(5)
-y = rand(5)
for part = 1:2
    data = all_data[part]
    saved_files[part] = pvtk_grid(
        "simulation", data.points, data.cells;
        part = part, nparts = 2,
    ) do pvtk
        pvtk["Pressure"] = sum(data.points; dims = 1)
    end
end
```

In this example, `saved_files` lists the files saved by each "process":

```julia
julia> saved_files
2-element Vector{Vector{String}}:
["simulation.pvtu", "simulation/simulation_1.vtu"]
["simulation/simulation_2.vtu"]
```

Note that the files containing the actual data (in this case `simulation_*.vtu`) are stored in a separate `simulation` directory.

## Example 2: Structured files

This generates 4 serial [image data](@ref Image-data) files (`.vti`) and
a header file (`.pvti`) combining them:

```julia
# Global grid
xs_global = range(0, 2; length = 15)
ys_global = range(-1, 1; length = 12)
zs_global = range(0, 1; length = 4)

extents = [
    ( 1:10,  1:5, 1:4),  # process 1
    (10:15,  1:5, 1:4),  # process 2
    ( 1:10, 5:12, 1:4),  # process 3
    (10:15, 5:12, 1:4),  # process 4
]

-saved_files = pvtk_grid("simulation", x, y, cells; part = 1, nparts = 1) do pvtk
-    pvtk["Pressure"] = x
-    pvtk["Processor"] = rand(2)
saved_files = Vector{Vector{String}}(undef, 4)  # files saved by each "process"

for part = 1:4
    is, js, ks = extents[part]                                # local indices
    xs, ys, zs = xs_global[is], ys_global[js], zs_global[ks]  # local grid
    saved_files[part] = pvtk_grid(
        "fields", xs, ys, zs;
        part = part, extents = extents,
    ) do pvtk
        pvtk["Temperature"] = [x + 2y + 3z for x ∈ xs, y ∈ ys, z ∈ zs]
    end
end
```

-In this example, `saved_files` lists the two files saved at the end of the do-block:
As in the previous example, `saved_files` lists the files saved by each "process":

```julia
julia> saved_files
-2-element Vector{String}:
- "simulation.pvtu"
- "simulation/simulation_1.vtu"
4-element Vector{Vector{String}}:
["fields.pvti", "fields/fields_1.vti"]
["fields/fields_2.vti"]
["fields/fields_3.vti"]
["fields/fields_4.vti"]
```

-Note that the files containing the actual data (in this case `simulation_1.vtu`) are stored in a separate `simulation` directory.

## Acknowledgements

Thanks to [Francesc Verdugo](https://www.francescverdugo.com/) and [Alberto
F. Martin](https://research.monash.edu/en/persons/alberto-f-martin) for
-the initial parallel file format implementation.
the initial parallel file format implementation, and to [Corentin Lothode](https://lmrs.univ-rouen.fr/fr/persopage/corentin-lothode) for the initial work on structured grids.
46 changes: 30 additions & 16 deletions src/gridtypes/common.jl
@@ -14,31 +14,45 @@ function vtk_xml_write_header(vtk::DatasetFile)
    xroot::XMLElement
end

# Specifies the "extent" of a structured grid, e.g. (1:4, 1:3, 1:42)
const Extent{N} = Tuple{Vararg{AbstractUnitRange{<:Integer}, N}} where {N}

# Switch to zero-based extent used by VTK.
to_vtk_extent(r::AbstractUnitRange) = r .- 1

function to_vtk_extent(extent::Tuple)
    ext = to_extent3(extent)
    map(to_vtk_extent, ext)
end

# Convert different extent specifications to Extent{3}.
to_extent3(extent::Extent{3}) = extent

function to_extent3(extent::Extent{N}) where {N}
    if N > 3
        throw(ArgumentError("dimensionalities N > 3 not supported (got N = $N)"))
    elseif N < 3
        M = 3 - N
        (extent..., ntuple(d -> Base.OneTo(1), Val(M))...) :: Extent{3}
    end
end

to_extent3(Ns::Dims) = to_extent3(map(Base.OneTo, Ns))
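
# For example: to_extent3((4, 3)) == (Base.OneTo(4), Base.OneTo(3), Base.OneTo(1)).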

# Returns the "extent" attribute required for structured (including rectilinear)
# grids.
-extent_attribute(Ni, Nj, Nk, ::Nothing=nothing) =
-    "0 $(Ni - 1) 0 $(Nj - 1) 0 $(Nk - 1)"
-
-function extent_attribute(Ni, Nj, Nk, extent::Array{T}) where T <: Integer
-    length(extent) == 6 || throw(ArgumentError("extent must have length 6."))
-    (extent[2] - extent[1] + 1 == Ni) &&
-        (extent[4] - extent[3] + 1 == Nj) &&
-        (extent[6] - extent[5] + 1 == Nk) ||
-        throw(ArgumentError("extent is not consistent with dataset dimensions."))
-    join(extent, " ")
function extent_attribute(extent_in)
    ext = to_vtk_extent(extent_in)
    iter = Iterators.map(e -> string(first(e), " ", last(e)), ext)
    join(iter, " ")
end
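
# For example: extent_attribute((1:10, 1:5, 1:4)) == "0 9 0 4 0 3".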

# Number of cells in structured grids.
# In 3D, all cells are hexahedrons (i.e. VTK_HEXAHEDRON), and the number of
# cells is (Ni-1)*(Nj-1)*(Nk-1). In 2D, they are quadrilaterals (VTK_QUAD), and in
# 1D they are line segments (VTK_LINE).
-function num_cells_structured(Ni, Nj, Nk)
-    Ncls = one(Ni)
-    for N in (Ni, Nj, Nk)
-        Ncls *= max(1, N - 1)
-    end
-    Ncls
function num_cells_structured(Ns::Dims)
    prod(N -> max(1, N - 1), Ns)
end
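
# For example: num_cells_structured((15, 12, 4)) == 14 * 11 * 3 == 462.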

# Accept passing coordinates as a tuple.