Specifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.
transforms must be a tuple of length N specifying the transforms to be applied along each dimension. Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.
The element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.
Note that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.
Example
To perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:
julia> size_global = (64, 32, 128);  # size of real input data

julia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());

julia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)
Transforms: (RFFT, FFT, FFT)
Input type: Float64
Global dimensions: (64, 32, 128) -> (33, 32, 128)
Plan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.
PencilFFTPlan(p::Pencil, transforms; kwargs...)
Create a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.
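For instance, following the package tutorial, a plan for a 3D real-to-complex transform can be constructed from a Pencil decomposition roughly as follows (a minimal sketch; the global dimensions are illustrative):

using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD

# Domain decomposition for a 3D dataset (a 2D decomposition is chosen automatically)
pen = Pencil((16, 32, 64), comm)

# Real-to-complex FFT along the first dimension, complex FFTs along the others
plan = PencilFFTPlan(pen, Transforms.RFFT())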
Allocate uninitialised PencilArray that can hold input data for the given plan.
The second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.
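For illustration, the three forms could be used as follows (a sketch; the exact call signatures are assumed from the description above, with plan an out-of-place PencilFFTPlan):

u = allocate_input(plan)            # single PencilArray
us = allocate_input(plan, 3)        # Array of 3 PencilArrays
uvw = allocate_input(plan, Val(3))  # tuple of 3 PencilArrays (e.g. a vector field)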
In-place plans
If p is an in-place real-to-real or complex-to-complex plan, a ManyPencilArray is allocated. If p is an in-place real-to-complex plan, a ManyPencilArrayRFFT! is allocated.
These types hold PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).
Example
Suppose p is an in-place PencilFFTPlan. Then,
@assert is_inplace(p)
A = allocate_input(p) :: ManyPencilArray
v_in = first(A) :: PencilArray # input data view
v_out = last(A) :: PencilArray # output data view
Also note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:
p * A # perform forward transform in-place
p \ A # perform backward transform in-place
# p * v_in  # not allowed!!
Container holding M different PencilArray views to the same underlying data buffer. All views share the same dimensionality N. The element type T of the first view is real, while that of the subsequent views is Complex{T}.
This can be used to perform in-place real-to-complex transforms; see also Transforms.RFFT!. It is used internally for such transforms by allocate_input and should not be constructed directly.
It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.
Minimal example:
using MPI
using PencilFFTs
using TimerOutputs
# Enable time measurements (see above)
TimerOutputs.enable_debug_timings(PencilFFTs)

MPI.Init()

plan = PencilFFTPlan(#= ... =#)  # create a plan as usual (setup elided)
# [do stuff with `plan`...]
# Retrieve the timer attached to the plan and print the collected timings
# (the `timer(plan)` accessor is assumed here)
to = timer(plan)
print_timer(to)
Like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
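For instance, if the real data produced by the backward transform has length 64 along the last (complex-to-real) dimension, the transform may be constructed as follows (illustrative sizes):

bw = Transforms.BRFFT(64)            # scalar form: length of the real output
bw = Transforms.BRFFT((64, 32, 64))  # tuple form: only the last entry (dN) is used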
plan(transform::AbstractTransform, A, [dims];
     flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
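As a rough illustration of the interface (a sketch; the array size, dimension and flags are arbitrary, and keyword handling is assumed to follow the signature above):

using FFTW
using PencilFFTs

A = zeros(ComplexF64, 16, 16)

# Plan a complex-to-complex FFT along the first dimension only
p = Transforms.plan(Transforms.FFT(), A, 1; flags = FFTW.ESTIMATE)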
Returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
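For example, the backward transform associated to a forward FFT is an unnormalised BFFT, while that of a BRFFT is an RFFT:

julia> binv(Transforms.FFT(), 42)
BFFT

julia> binv(Transforms.BRFFT(9), 42)
RFFT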
Returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
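For example, a real-to-complex transform (RFFT) of real input length N produces complex output of length N ÷ 2 + 1 (a minimal sketch, calling the function directly from the Transforms module):

julia> Transforms.length_output(Transforms.RFFT(), 64)
33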
The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).
The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D ("pencil") decomposition. The benchmarks were run for input arrays of dimensions $N_x × N_y × N_z = 512^3$, $1024^3$ and $2048^3$. Each timing is averaged over 100 repetitions.
","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default, the domain is distributed on a 2D MPI topology of dimensions N_1 N_2. As an example, the above figure shows such a topology with N_1 = 4 and N_2 = 3, for a total of 12 MPI processes.","category":"page"},{"location":"tutorial/#tutorial:creating_plans","page":"Tutorial","title":"Creating plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The first thing to do is to create a domain decomposition configuration for the given dataset dimensions N_x N_y N_z. In the framework of PencilArrays, such a configuration is described by a Pencil object. As described in the PencilArrays docs, we can let the Pencil constructor automatically determine such a configuration. For this, only an MPI communicator and the dataset dimensions are needed:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (16, 32, 64)\npen = Pencil(dims, comm)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default this creates a 2D decomposition (for the case of a 3D dataset), but one can change this as detailed in the PencilArrays documentation linked above.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"We can now create a PencilFFTPlan, which requires information on decomposition configuration (the Pencil object) and on the transforms that will be applied:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Note that, for more control, one can instead separately specify the transforms along each dimension:\n# transform = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"See the PencilFFTPlan constructor for details on the accepted options, and the Transforms module for the possible transforms. It is also possible to enable fine-grained performance measurements via the TimerOutputs package, as described in Measuring performance.","category":"page"},{"location":"tutorial/#Allocating-data","page":"Tutorial","title":"Allocating data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Next, we want to apply the plan on some data. Transforms may only be applied on PencilArrays, which are array wrappers that include MPI decomposition information (in some sense, analogous to DistributedArrays in Julia's distributed computing approach). 
The helper function allocate_input can be used to allocate a PencilArray that is compatible with our plan:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of real data (Float64).\nu = allocate_input(plan)\n\n# Fill the array with some (random) data\nusing Random\nrandn!(u)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"PencilArrays are a subtype of AbstractArray, and thus they support all common array operations.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Similarly, to preallocate output data, one can use allocate_output:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of complex data (Complex{Float64}).\nv = allocate_output(plan)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"This is only required if one wants to apply the plans using a preallocated output (with mul!, see right below).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The data types returned by allocate_input and allocate_output are slightly different when working with in-place transforms. See the in-place example for details.","category":"page"},{"location":"tutorial/#Applying-plans","page":"Tutorial","title":"Applying plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The interface to apply plans is consistent with that of AbstractFFTs. Namely, * and mul! are respectively used for forward transforms without and with preallocated output data. Similarly, \\ and ldiv! are used for backward transforms.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using LinearAlgebra # for mul!, ldiv!\n\n# Apply plan on `u` with `v` as an output\nmul!(v, plan, u)\n\n# Apply backward plan on `v` with `w` as an output\nw = similar(u)\nldiv!(w, plan, v) # now w ≈ u","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Note that, consistently with AbstractFFTs, normalisation is performed at the end of a backward transform, so that the original data is recovered when applying a forward followed by a backward transform.","category":"page"},{"location":"tutorial/#Accessing-and-modifying-data","page":"Tutorial","title":"Accessing and modifying data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For any given MPI process, a PencilArray holds the data associated to its local partition in the global geometry. PencilArrays are accessed using local indices that start at 1, regardless of the location of the local process in the MPI topology. Note that PencilArrays, being based on regular Arrays, support both linear and Cartesian indexing (see the Julia docs for details).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For convenience, the global_view function can be used to generate an OffsetArray wrapper that takes global indices.","category":"page"},{"location":"tutorial/#tutorial:output_data_layout","page":"Tutorial","title":"Output data layout","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"In memory, the dimensions of the transform output are by default reversed with respect to the input. 
That is, if the order of indices in the input data is (x, y, z), then the output has order (z, y, x) in memory. This detail is hidden from the user, and output arrays are always accessed in the same order as the input data, regardless of the underlying output dimension permutation. This applies to PencilArrays and to OffsetArrays returned by global_view.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The reasoning behind dimension permutations, is that they allow to always perform FFTs along the fastest array dimension and to avoid a local data transposition, resulting in performance gains. A similar approach is followed by other parallel FFT libraries. FFTW itself, in its distributed-memory routines, includes a flag that enables a similar behaviour. In PencilFFTs, index permutation is the default, but it can be disabled via the permute_dims flag of PencilFFTPlan.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"A great deal of work has been spent in making generic index permutations as efficient as possible, both in intermediate and in the output state of the multidimensional transforms. This has been achieved, in part, by making sure that permutations such as (3, 2, 1) are compile-time constants.","category":"page"},{"location":"tutorial/#Further-reading","page":"Tutorial","title":"Further reading","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For details on working with PencilArrays see the PencilArrays docs.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The examples on the sidebar further illustrate the use of transforms and provide an introduction to working with MPI-distributed data in the form of PencilArrays. In particular, the gradient example illustrates different ways of computing things using Fourier-transformed distributed arrays. Then, the incompressible Navier–Stokes example is a more advanced and complete example of a possible application of the PencilFFTs package.","category":"page"},{"location":"benchmarks/#Benchmarks","page":"Benchmarks","title":"Benchmarks","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D (\"pencil\") decomposition. The benchmarks were run for input arrays of dimensions N_x N_y N_z = 512^3, 1024^3 and 2048^3. Each timing is averaged over 100 repetitions.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"
\n \n \n
","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.","category":"page"},{"location":"benchmarks/#Benchmark-details","page":"Benchmarks","title":"Benchmark details","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The number of MPI processes along each decomposed dimension, P_1 and P_2, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with P_1 P_2. For instance, a total of 1024 processes is divided into P_1 = P_2 = 32. Different results may be obtained with other combinations, but this was not benchmarked.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.","category":"page"},{"location":"GlobalFFTParams/#Global-FFT-parameters","page":"Global FFT parameters","title":"Global FFT parameters","text":"","category":"section"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"GlobalFFTParams","category":"page"},{"location":"GlobalFFTParams/#PencilFFTs.GlobalFFTParams","page":"Global FFT parameters","title":"PencilFFTs.GlobalFFTParams","text":"GlobalFFTParams{T, N, inplace}\n\nSpecifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.\n\n\n\nGlobalFFTParams(size_global, transforms, [real_type=Float64])\n\nDefine parameters for N-dimensional transform.\n\ntransforms must be a tuple of length N specifying the transforms to be applied along each dimension. 
Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.\n\nThe element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.\n\nNote that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.\n\nExample\n\nTo perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:\n\njulia> size_global = (64, 32, 128); # size of real input data\n\njulia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());\n\njulia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)\nTransforms: (RFFT, FFT, FFT)\nInput type: Float64\nGlobal dimensions: (64, 32, 128) -> (33, 32, 128)\n\n\n\n\n\n","category":"type"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/in-place.jl\"","category":"page"},{"location":"generated/in-place/#In-place-transforms","page":"In-place transforms","title":"In-place transforms","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.","category":"page"},{"location":"generated/in-place/#Creating-a-domain-partition","page":"In-place transforms","title":"Creating a domain partition","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We start by partitioning a domain of dimensions 163264 along all available MPI processes.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using PencilFFTs\nusing MPI\nMPI.Init()\n\ndims_global = (16, 32, 64) # global dimensions","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Such a partitioning is described by a Pencil object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"decomp_dims = (2, 3)\ncomm = MPI.COMM_WORLD\npen = Pencil(dims_global, decomp_dims, comm)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"warning: Allowed decompositions\nDistributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims above must not contain 1.","category":"page"},{"location":"generated/in-place/#Creating-in-place-plans","page":"In-place transforms","title":"Creating in-place plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"To create an in-place plan, pass an in-place transform such as Transforms.FFT! or Transforms.R2R! to PencilFFTPlan. 
For instance:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Perform a 3D in-place complex-to-complex FFT.\ntransform = Transforms.FFT!()\n\n# Note that one can also combine different types of in-place transforms.\n# For instance:\n# transform = (\n# Transforms.R2R!(FFTW.REDFT01),\n# Transforms.FFT!(),\n# Transforms.R2R!(FFTW.DHT),\n# )","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We can now create a distributed plan from the previously-created domain partition and the chosen transform.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that in-place real-to-complex transforms are not currently supported. (In other words, the RFFT! transform type is not defined.)","category":"page"},{"location":"generated/in-place/#Allocating-data","page":"In-place transforms","title":"Allocating data","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Allocate data for the plan.\n# Since `plan` is in-place, this returns a `ManyPencilArray` container.\nA = allocate_input(plan)\nsummary(A)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that allocate_output also works for in-place plans. In this case, it returns exactly the same thing as allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, we can initialise the input array with some data before transforming:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using Random\nu_in = first(A) # input data view\nrandn!(u_in)\nsummary(u_in)","category":"page"},{"location":"generated/in-place/#Applying-plans","page":"In-place transforms","title":"Applying plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Like in FFTW.jl, one can perform in-place transforms using the * and \\ operators. 
As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan * A; # performs in-place forward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"After performing an in-place transform, data contained in u_in has been overwritten and has no \"physical\" meaning. In other words, u_in should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, to compute the global sum of the transformed data:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"u_out = last(A) # output data view\nsum(u_out) # sum of transformed data (note that `sum` reduces over all processes)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Finally, we can perform a backward transform and do stuff with the input view:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan \\ A; # perform in-place backward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"At this point, the data can be once again found in the input view u_in, while u_out should not be accessed.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Transforms/#Available-transforms","page":"Available transforms","title":"Available transforms","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"CurrentModule = PencilFFTs.Transforms","category":"page"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"Transforms","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms","page":"Available transforms","title":"PencilFFTs.Transforms","text":"Defines different one-dimensional FFT-based transforms.\n\nThe transforms are all subtypes of an AbstractTransform type.\n\nWhen possible, the names of the transforms are kept consistent with the functions exported by AbstractFFTs.jl and FFTW.jl.\n\n\n\n\n\n","category":"module"},{"location":"Transforms/#Transform-types","page":"Available transforms","title":"Transform types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"FFT\nFFT!\nBFFT\nBFFT!\n\nRFFT\nBRFFT\n\nR2R\nR2R!\n\nNoTransform\nNoTransform!","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.FFT","page":"Available transforms","title":"PencilFFTs.Transforms.FFT","text":"FFT()\n\nComplex-to-complex FFT.\n\nSee also AbstractFFTs.fft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.FFT!","page":"Available transforms","title":"PencilFFTs.Transforms.FFT!","text":"FFT!()\n\nIn-place version of 
FFT.\n\nSee also AbstractFFTs.fft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT","text":"BFFT()\n\nUnnormalised backward complex-to-complex FFT.\n\nLike AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.\n\nSee also AbstractFFTs.bfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT!","text":"BFFT()\n\nIn-place version of BFFT.\n\nSee also AbstractFFTs.bfft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.RFFT","page":"Available transforms","title":"PencilFFTs.Transforms.RFFT","text":"RFFT()\n\nReal-to-complex FFT.\n\nSee also AbstractFFTs.rfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BRFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BRFFT","text":"BRFFT(d::Integer)\nBRFFT((d1, d2, ..., dN))\n\nUnnormalised inverse of RFFT.\n\nTo obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).\n\nAs described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.\n\nFor multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.\n\nSee also AbstractFFTs.brfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R","page":"Available transforms","title":"PencilFFTs.Transforms.R2R","text":"R2R(kind)\n\nReal-to-real transform of type kind.\n\nThe possible values of kind are those described in the FFTW.r2r docs and the FFTW manual:\n\ndiscrete cosine transforms: FFTW.REDFT00, FFTW.REDFT01, FFTW.REDFFT10, FFTW.REDFFT11\ndiscrete sine transforms: FFTW.RODFT00, FFTW.RODFT01, FFTW.RODFFT10, FFTW.RODFFT11\ndiscrete Hartley transform: FFTW.DHT\n\nNote: half-complex format DFTs (FFTW.R2HC, FFTW.HC2R) are not currently supported.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R!","page":"Available transforms","title":"PencilFFTs.Transforms.R2R!","text":"R2R!(kind)\n\nIn-place version of R2R.\n\nSee also FFTW.r2r!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform","text":"NoTransform()\n\nIdentity transform.\n\nSpecifies that no transformation should be applied.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform!","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform!","text":"NoTransform!()\n\nIn-place version of NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Internals","page":"Available transforms","title":"Internals","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"What follows is used internally in 
PencilFFTs.","category":"page"},{"location":"Transforms/#Types","page":"Available transforms","title":"Types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"AbstractCustomPlan\nAbstractTransform\nIdentityPlan\nIdentityPlan!\nPlan","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractCustomPlan","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractCustomPlan","text":"AbstractCustomPlan\n\nAbstract type defining a custom plan, to be used as an alternative to FFTW plans (FFTW.FFTWPlan).\n\nThe only custom plan defined in this module is IdentityPlan. The user can define other custom plans that are also subtypes of AbstractCustomPlan.\n\nNote that plan returns a subtype of either AbstractFFTs.Plan or AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractTransform","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractTransform","text":"AbstractTransform\n\nSpecifies a one-dimensional FFT-based transform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan","text":"IdentityPlan\n\nType of plan associated to NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan!","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan!","text":"IdentityPlan!\n\nType of plan associated to NoTransform!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.Plan","page":"Available transforms","title":"PencilFFTs.Transforms.Plan","text":"Plan = Union{AbstractFFTs.Plan, AbstractCustomPlan}\n\nUnion type representing any plan returned by plan.\n\nSee also AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Functions","page":"Available transforms","title":"Functions","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"plan\n\nbinv\nscale_factor\n\neltype_input\neltype_output\nexpand_dims\nis_inplace\nkind\nlength_output","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.plan","page":"Available transforms","title":"PencilFFTs.Transforms.plan","text":"plan(transform::AbstractTransform, A, [dims];\n flags=FFTW.ESTIMATE, timelimit=Inf)\n\nCreate plan to transform array A along dimensions dims.\n\nIf dims is not specified, all dimensions of A are transformed.\n\nFor FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.binv","page":"Available transforms","title":"PencilFFTs.Transforms.binv","text":"binv(transform::AbstractTransform, d::Integer)\n\nReturns the backwards transform associated to the given transform.\n\nThe second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.\n\nThe backwards transform returned by this function is not normalised. 
The normalisation factor for a given array can be obtained by calling scale_factor.\n\nExample\n\njulia> binv(Transforms.FFT(), 42)\nBFFT\n\njulia> binv(Transforms.BRFFT(9), 42)\nRFFT\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.scale_factor","page":"Available transforms","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(transform::AbstractTransform, A, [dims = 1:ndims(A)])\n\nGet factor required to normalise the given array after a transformation along dimensions dims (all dimensions by default).\n\nThe array A must have the dimensions of the transform input.\n\nImportant: the dimensions dims must be the same that were passed to plan.\n\nExamples\n\njulia> C = zeros(ComplexF32, 3, 4, 5);\n\njulia> scale_factor(Transforms.FFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C, 2:3)\n20\n\njulia> R = zeros(Float64, 3, 4, 5);\n\njulia> scale_factor(Transforms.RFFT(), R, 2)\n4\n\njulia> scale_factor(Transforms.RFFT(), R, 2:3)\n20\n\njulia> scale_factor(Transforms.BRFFT(8), C)\n96\n\njulia> scale_factor(Transforms.BRFFT(9), C)\n108\n\nThis will fail because the input of RFFT is real, and R is a complex array:\n\njulia> scale_factor(Transforms.RFFT(), C, 2:3)\nERROR: MethodError: no method matching scale_factor(::PencilFFTs.Transforms.RFFT, ::Array{ComplexF32, 3}, ::UnitRange{Int64})\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_input","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_input","text":"eltype_input(transform::AbstractTransform, real_type<:AbstractFloat)\n\nDetermine input data type for a given transform given the floating point precision of the input data.\n\nSome transforms, such as R2R and NoTransform, can take both real and complex data. 
For those kinds of transforms, nothing is returned.\n\nExample\n\njulia> eltype_input(Transforms.FFT(), Float32)\nComplexF32 (alias for Complex{Float32})\n\njulia> eltype_input(Transforms.RFFT(), Float64)\nFloat64\n\njulia> eltype_input(Transforms.R2R(FFTW.REDFT01), Float64) # nothing\n\njulia> eltype_input(Transforms.NoTransform(), Float64) # nothing\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_output","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_output","text":"eltype_output(transform::AbstractTransform, eltype_input)\n\nReturns the output data type for a given transform given the input type.\n\nThrows ArgumentError if the input data type is incompatible with the transform type.\n\nExample\n\njulia> eltype_output(Transforms.NoTransform(), Float32)\nFloat32\n\njulia> eltype_output(Transforms.RFFT(), Float64)\nComplexF64 (alias for Complex{Float64})\n\njulia> eltype_output(Transforms.BRFFT(4), ComplexF32)\nFloat32\n\njulia> eltype_output(Transforms.FFT(), Float64)\nERROR: ArgumentError: invalid input data type for PencilFFTs.Transforms.FFT: Float64\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.expand_dims","page":"Available transforms","title":"PencilFFTs.Transforms.expand_dims","text":"expand_dims(transform::AbstractTransform, Val(N))\n\nExpand a single multidimensional transform into one transform per dimension.\n\nExample\n\n# Expand a real-to-complex transform in 3 dimensions.\njulia> expand_dims(Transforms.RFFT(), Val(3))\n(RFFT, FFT, FFT)\n\njulia> expand_dims(Transforms.BRFFT(4), Val(3))\n(BFFT, BFFT, BRFFT{even})\n\njulia> expand_dims(Transforms.NoTransform(), Val(2))\n(NoTransform, NoTransform)\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.is_inplace","page":"Available transforms","title":"PencilFFTs.Transforms.is_inplace","text":"is_inplace(transform::AbstractTransform) -> Bool\nis_inplace(transforms::Vararg{AbtractTransform}) -> Union{Bool, Nothing}\n\nCheck whether a transform or a list of transforms is performed in-place.\n\nIf the list of transforms has a combination of in-place and out-of-place transforms, nothing is returned.\n\nExample\n\njulia> is_inplace(Transforms.RFFT())\nfalse\n\njulia> is_inplace(Transforms.NoTransform!())\ntrue\n\njulia> is_inplace(Transforms.FFT!(), Transforms.R2R!(FFTW.REDFT01))\ntrue\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R(FFTW.REDFT01))\nfalse\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R!(FFTW.REDFT01)) === nothing\ntrue\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.kind","page":"Available transforms","title":"PencilFFTs.Transforms.kind","text":"kind(transform::R2R)\n\nGet kind of real-to-real transform.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.length_output","page":"Available transforms","title":"PencilFFTs.Transforms.length_output","text":"length_output(transform::AbstractTransform, length_in::Integer)\n\nReturns the length of the transform output, given the length of its input.\n\nThe input and output lengths are specified in terms of the respective input and output datatypes. 
For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.\n\n\n\n\n\n","category":"function"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/navier_stokes.jl\"","category":"page"},{"location":"generated/navier_stokes/#Navier–Stokes-equations","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In this example, we numerically solve the incompressible Navier–Stokes equations","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"_t bmv + (bmv bm) bmv = -frac1ρ bm p + ν ^2 bmv\nquad bm bmv = 0","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmv(bmx t) and p(bmx t) are respectively the velocity and pressure fields, ν is the fluid kinematic viscosity and ρ is the fluid density.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We solve the above equations a 3D periodic domain using a standard Fourier pseudo-spectral method.","category":"page"},{"location":"generated/navier_stokes/#First-steps","page":"Navier–Stokes equations","title":"First steps","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We start by loading the required packages, initialising MPI and setting the simulation parameters.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\nprocid = MPI.Comm_rank(comm) + 1\n\n# Simulation parameters\nNs = (64, 64, 64) # = (Nx, Ny, Nz)\nLs = (2π, 2π, 2π) # = (Lx, Ly, Lz)\n\n# Collocation points (\"global\" = over all processes).\n# We include the endpoint (length = N + 1) for convenience.\nxs_global = map((N, L) -> range(0, L; length = N + 1), Ns, Ls) # = (x, y, z)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's check the number of MPI processes over which we're running our simulation:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"MPI.Comm_size(comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. 
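For instance, one may build the decomposition by hand by first choosing a Cartesian process grid. The following is only a sketch, not used in the rest of this example; it assumes the MPI session was launched with exactly 4 processes, and that MPITopology is accessible from PencilArrays as assumed below:

using PencilArrays: MPITopology   # assumed accessible; re-exported names are used elsewhere in this example

topo_manual = MPITopology(comm, (2, 2))       # 2 × 2 process grid, requires 4 MPI processes
pen_manual = Pencil(topo_manual, Ns, (2, 3))  # decompose dimensions (y, z) over the grid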
For simplicity, here we do it automatically following the PencilArrays.jl docs:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pen = Pencil(Ns, comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The subdomain associated to the local MPI process can be obtained using range_local:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"range_local(pen)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now construct a distributed vector field that follows the decomposition configuration we just created:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v⃗₀ = (\n PencilArray{Float64}(undef, pen), # vx\n PencilArray{Float64}(undef, pen), # vy\n PencilArray{Float64}(undef, pen), # vz\n)\nsummary(v⃗₀[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We still need to fill this array with interesting values that represent a physical velocity field.","category":"page"},{"location":"generated/navier_stokes/#Initial-condition","page":"Navier–Stokes equations","title":"Initial condition","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's set the initial condition in physical space. In this example, we choose the Taylor–Green vortex configuration as an initial condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"beginaligned\nv_x(x y z) = u₀ sin(k₀ x) cos(k₀ y) cos(k₀ z) \nv_y(x y z) = -u₀ cos(k₀ x) sin(k₀ y) cos(k₀ z) \nv_z(x y z) = 0\nendaligned","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where u₀ and k₀ are two parameters setting the amplitude and the period of the velocity field.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid = localgrid(pen, xs_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can use this to initialise the velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"u₀ = 1.0\nk₀ = 2π / Ls[1] # should be integer if L = 2π (to preserve periodicity)\n\n@. v⃗₀[1] = u₀ * sin(k₀ * grid.x) * cos(k₀ * grid.y) * cos(k₀ * grid.z)\n@. v⃗₀[2] = -u₀ * cos(k₀ * grid.x) * sin(k₀ * grid.y) * cos(k₀ * grid.z)\n@. 
v⃗₀[3] = 0\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's plot a 2D slice of the velocity field managed by the local MPI process:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using GLMakie\n\n# Compute the norm of a vector field represented by a tuple of arrays.\nfunction vecnorm(v⃗::NTuple)\n vnorm = similar(v⃗[1])\n for n ∈ eachindex(v⃗[1])\n w = zero(eltype(vnorm))\n for v ∈ v⃗\n w += v[n]^2\n end\n vnorm[n] = sqrt(w)\n end\n vnorm\nend\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n vnorm = vecnorm(v⃗₀)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, vnorm;\n alpha = 0.2, levels = 4,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Velocity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Velocity-in-Fourier-space","page":"Navier–Stokes equations","title":"Velocity in Fourier space","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In the Fourier pseudo-spectral method, the periodic velocity field is discretised in space as a truncated Fourier series","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"bmv(bmx t) =\n_bmk hatbmv_bmk(t) e^i bmk bmx","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmk = (k_x k_y k_z) are the discrete wave numbers.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The wave numbers can be obtained using the fftfreq function. 
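As a reminder of the AbstractFFTs conventions, here is what these functions return for a small even length (a reference sketch; the comments show the standard ordering, non-negative frequencies first, then negative ones):

using AbstractFFTs: fftfreq, rfftfreq

fftfreq(8, 8)   # 0.0, 1.0, 2.0, 3.0, -4.0, -3.0, -2.0, -1.0
rfftfreq(8, 8)  # 0.0, 1.0, 2.0, 3.0, 4.0  (redundant negative frequencies dropped)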
Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for k_x:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nks_global = (\n rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]), # kx | real-to-complex\n fftfreq(Ns[2], 2π * Ns[2] / Ls[2]), # ky | complex-to-complex\n fftfreq(Ns[3], 2π * Ns[3] / Ls[3]), # kz | complex-to-complex\n)\n\nks_global[1]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[2]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[3]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To transform the velocity field to Fourier space, we first create a real-to-complex FFT plan to be applied to one of the velocity components:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"plan = PencilFFTPlan(v⃗₀[1], Transforms.RFFT())","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"See PencilFFTPlan for details on creating plans and on optional keyword arguments.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now apply this plan to the three velocity components to obtain the respective Fourier coefficients hatbmv_bmk:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s = plan .* v⃗₀\nsummary(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Note that, in Fourier space, the domain decomposition is performed along the directions x and y:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pencil(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This is because the 3D FFTs are performed one dimension at a time, with the x direction first and the z direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers bmk associated to the local MPI process. 
Similarly to the local grid points, these are obtained using the localgrid function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid_fourier = localgrid(v̂s[1], ks_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, bmω = bm bmv. In Fourier space, this becomes hatbmω = i bmk hatbmv.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using StaticArrays: SVector\nusing LinearAlgebra: ×\n\nfunction curl_fourier!(\n ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,\n ) where {N}\n @inbounds for I ∈ eachindex(grid_fourier)\n # We use StaticArrays for the cross product between small vectors.\n ik⃗ = im * SVector(grid_fourier[I])\n v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)\n ω⃗ = ik⃗ × v⃗\n for n ∈ eachindex(ω⃗)\n ω̂s[n][I] = ω⃗[n]\n end\n end\n ω̂s\nend\n\nω̂s = similar.(v̂s)\ncurl_fourier!(ω̂s, v̂s, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally transform back to physical space and plot the result:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ωs = plan .\\ ω̂s\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n ω_norm = vecnorm(ωs)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_norm;\n alpha = 0.1, levels = 0.8:0.2:2.0,\n colormap = :viridis, colorrange = (0.8, 2.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Vorticity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Computing-the-non-linear-term","page":"Navier–Stokes equations","title":"Computing the non-linear term","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"One can show that, in Fourier space, the incompressible Navier–Stokes equations can be written as","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"_t hatbmv_bmk =\n- mathcalP_bmk left widehat(bmv bm) bmv right\n- ν bmk^2 hatbmv_bmk\nquad text with quad\nmathcalP_bmk(hatbmF_bmk) = left( I - fracbmk \nbmkbmk^2 right) hatbmF_bmk","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where mathcalP_bmk is a projection operator allowing to preserve the incompressibility condition bm bmv = 0. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient dissapears from the equations.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Now that we have the wave numbers bmk, computing the linear viscous term in Fourier space is straighforward once we have the Fourier coefficients hatbmv_bmk of the velocity field. 
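For illustration, the viscous contribution alone could be accumulated as in the following sketch (the function name and its arguments are ours; the same operation appears later as part of the full right-hand side):

# Accumulate the viscous term -ν k² v̂ into dv̂ for each velocity component.
function add_viscous_term!(dv̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray},
                           grid_fourier, ν) where {N}
    @inbounds for I ∈ eachindex(grid_fourier)
        k² = sum(abs2, grid_fourier[I])  # |k⃗|² at this Fourier grid point
        for n ∈ 1:N
            dv̂s[n][I] -= ν * k² * v̂s[n][I]
        end
    end
    dv̂s
end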
What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, hatbmF_bmk = left widehat(bmv bm) bmv right_bmk. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Below we implement a function that computes the non-linear term in Fourier space based on its convective form (bmv bm) bmv = bm (bmv bmv). Note that this equivalence uses the incompressibility condition bm bmv = 0.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place\n\n# Compute non-linear term in Fourier space from velocity field in physical\n# space. Optional keyword arguments may be passed to avoid allocations.\nfunction ns_nonlinear!(\n F̂s, vs, plan, grid_fourier;\n vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),\n )\n # Compute F_i = ∂_j (v_i v_j) for each i.\n # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)\n w, ŵ = vbuf, v̂buf\n @inbounds for (i, F̂i) ∈ enumerate(F̂s)\n F̂i .= 0\n vi = vs[i]\n for (j, vj) ∈ enumerate(vs)\n w .= vi .* vj # w = v_i * v_j in physical space\n mul!(ŵ, plan, w) # same in Fourier space\n # Add derivative in Fourier space\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n kj = k⃗[j]\n F̂i[I] += im * kj * ŵ[I]\n end\n end\n end\n F̂s\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's use this function on our initial velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"F̂s = similar.(v̂s)\nns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. 
We define a function that applies this procedure below.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)\n ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)\n ks_lim = (2 / 3) .* ks_max\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n if any(abs.(k⃗) .> ks_lim)\n for ŵ ∈ ŵs\n ŵ[I] = 0\n end\n end\n end\n ŵs\nend\n\n# We can apply this on the previously computed non-linear term:\ndealias_twothirds!(F̂s, grid_fourier, ks_global);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Finally, we implement the projection associated to the incompressibility condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function project_divergence_free!(ûs, grid_fourier)\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n k² = sum(abs2, k⃗)\n iszero(k²) && continue # avoid division by zero\n û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)\n for i ∈ eachindex(û)\n ŵ = û[i]\n for j ∈ eachindex(û)\n ŵ -= k⃗[i] * k⃗[j] * û[j] / k²\n end\n ûs[i][I] = ŵ\n end\n end\n ûs\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)\nv̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially","category":"page"},{"location":"generated/navier_stokes/#Putting-it-all-together","page":"Navier–Stokes equations","title":"Putting it all together","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function ns_rhs!(\n dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,\n ) where {N}\n # 1. Compute non-linear term and dealias it\n (; plan, cache, ks_global, grid_fourier) = p\n F̂s = cache.F̂s\n ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])\n dealias_twothirds!(F̂s, grid_fourier, ks_global)\n\n # 2. Project onto divergence-free space\n project_divergence_free!(F̂s, grid_fourier)\n\n # 3. Transform velocity to Fourier space\n v̂s = cache.v̂s\n map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)\n\n # 4. 
Add viscous term (and multiply projected non-linear term by -1)\n ν = p.ν\n for n ∈ eachindex(v̂s)\n v̂ = v̂s[n]\n F̂ = F̂s[n]\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n k² = sum(abs2, k⃗)\n F̂[I] = -F̂[I] - ν * k² * v̂[I]\n end\n end\n\n # 5. Transform RHS back to physical space\n map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)\n\n nothing\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using OrdinaryDiffEq\nusing RecursiveArrayTools: ArrayPartition\n\nns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)\nvs_init_ode = ArrayPartition(v⃗₀)\nsummary(vs_init_ode)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now define solver parameters and temporary variables, and initialise the problem:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"params = (;\n ν = 5e-3, # kinematic viscosity\n plan, grid_fourier, ks_global,\n cache = (\n v̂s = similar.(v̂s),\n F̂s = similar.(v̂s),\n )\n)\n\ntspan = (0.0, 10.0)\nprob = ODEProblem(ns_rhs!, vs_init_ode, tspan, params)\nintegrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum E(k), to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity ν must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large k), while high enough to allow the small-scale motions to be correctly resolved.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Nk = length(Ek)\n @assert Nk == length(ks)\n Ek .= 0\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n knorm = sqrt(sum(abs2, k⃗))\n i = searchsortedfirst(ks, knorm)\n i > Nk && continue\n v⃗ = getindex.(v̂s, Ref(I)) # = (v̂s[1][I], v̂s[2][I], ...)\n factor = k⃗[1] == 0 ? 
1 : 2 # account for Hermitian symmetry and r2c transform\n Ek[i] += factor * sum(abs2, v⃗) / 2\n end\n MPI.Allreduce!(Ek, +, get_comm(v̂s[1])) # sum across all processes\n Ek\nend\n\nks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])\nEk = similar(ks)\nv̂s = plan .* integrator.u.x\nenergy_spectrum!(Ek, ks, v̂s, grid_fourier)\nEk ./= scale_factor(plan)^2 # rescale energy\n\ncurl_fourier!(ω̂s, v̂s, grid_fourier)\nldiv!.(ωs, plan, ω̂s)\nω⃗_plot = Observable(ωs)\nk_plot = @view ks[2:end]\nE_plot = Observable(@view Ek[2:end])\nt_plot = Observable(integrator.t)\n\nfig = let\n fig = Figure(resolution = (1200, 600))\n ax = Axis3(\n fig[1, 1][1, 1]; title = @lift(\"t = $(round($t_plot, digits = 3))\"),\n aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\",\n )\n ω_mag = @lift vecnorm($ω⃗_plot)\n ω_mag_norm = @lift $ω_mag ./ maximum($ω_mag)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_mag_norm;\n alpha = 0.3, levels = 3,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 1][1, 2], ct; label = \"Normalised vorticity magnitude\")\n ax_sp = Axis(\n fig[1, 2];\n xlabel = \"k\", ylabel = \"E(k)\", xscale = log2, yscale = log10,\n title = \"Kinetic energy spectrum\",\n )\n ylims!(ax_sp, 1e-8, 1e0)\n scatterlines!(ax_sp, k_plot, E_plot)\n ks_slope = exp.(range(log(2.5), log(25.0), length = 3))\n E_fivethirds = @. 0.3 * ks_slope^(-5/3)\n @views lines!(ax_sp, ks_slope, E_fivethirds; color = :black, linestyle = :dot)\n text!(ax_sp, L\"k^{-5/3}\"; position = (ks_slope[2], E_fivethirds[2]), align = (:left, :bottom))\n fig\nend\n\nusing Printf # hide\nwith_xvfb = ENV[\"DISPLAY\"] == \":99\" # hide\nnstep = 0 # hide\nconst tmpdir = mktempdir() # hide\nfilename_frame(procid, nstep) = joinpath(tmpdir, @sprintf(\"proc%d_%04d.png\", procid, nstep)) # hide\nrecord(fig, \"vorticity_proc$procid.mp4\"; framerate = 10) do io\n with_xvfb && recordframe!(io) # hide\n while integrator.t < 20\n dt = 0.001\n step!(integrator, dt)\n t_plot[] = integrator.t\n mul!.(v̂s, plan, integrator.u.x) # current velocity in Fourier space\n curl_fourier!(ω̂s, v̂s, grid_fourier)\n ldiv!.(ω⃗_plot[], plan, ω̂s)\n ω⃗_plot[] = ω⃗_plot[] # to force updating the plot\n energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Ek ./= scale_factor(plan)^2 # rescale energy\n E_plot[] = E_plot[]\n global nstep += 1 # hide\n with_xvfb ? 
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)","category":"page"},{"location":"generated/gradient/#Fourier-wave-numbers","page":"Gradient of a scalar field","title":"Fourier wave numbers","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
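As a quick consistency check, the lengths of these global wave number vectors should match the global dimensions of the transformed array (a sketch; it assumes size_global reports dimensions in logical, i.e. unpermuted, order):

length.(kvec)       # (33, 32, 64) for the (64, 32, 64) real input used here
size_global(θ_hat)  # expected to agree with the line above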
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
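For instance, a dimension-generic variant of the same loop could be written as follows (a sketch; the function name is ours):

function gradient_global_generic!(∇θ_glob::NTuple{N}, θ_glob, kvec::NTuple{N}) where {N}
    @inbounds for I in CartesianIndices(θ_glob)
        k⃗ = map(getindex, kvec, Tuple(I))  # wave number vector at this global index
        u = im * θ_glob[I]
        for d in 1:N
            ∇θ_glob[d][I] = k⃗[d] * u
        end
    end
    ∇θ_glob
end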
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#gradient_method_broadcast","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 
92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. 
Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
(Figure: 2D pencil decomposition of a 3D domain among MPI processes.)
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = timer(pencil(A)),\n)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n dims_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"}]
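The following is a quick illustrative sketch (not taken from the package docs) of how the accessor methods above can be queried on a plan. It assumes a single MPI process, so that a 1 × 1 process grid is valid; the global dimensions are arbitrary, and the qualified names Transforms.is_inplace and Transforms.scale_factor are used in case the unqualified forms are not exported:

using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD

# Small 3D real-to-complex plan distributed over a 1 × 1 process grid.
plan = PencilFFTPlan((16, 16, 16), Transforms.RFFT(), (1, 1), comm)

get_comm(plan) === comm        # MPI communicator attached to the plan
Transforms.is_inplace(plan)    # false: RFFT() is an out-of-place transform
Transforms.scale_factor(plan)  # normalisation factor; expected to be 16^3 here
timer(plan)                    # TimerOutput recording the plan's internal timings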
+[{"location":"tutorial/#Tutorial","page":"Tutorial","title":"Tutorial","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The following tutorial shows how to perform a 3D FFT of real periodic data defined on a grid of N_x N_y N_z points.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"
[Figure: pencil decomposition of a 3D domain over a 2D MPI topology of dimensions 4 × 3]
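For reference, a decomposition like the one pictured can be requested explicitly by passing the process grid to the PencilFFTPlan constructor described in Distributed FFT plans. A minimal sketch, assuming the program is launched on exactly 12 MPI processes and using illustrative dataset dimensions:

using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD  # must contain 12 processes for a 4 × 3 process grid

dims = (16, 32, 64)             # global dataset dimensions (Nx, Ny, Nz)
proc_dims = (4, 3)              # 2D MPI topology: N1 = 4, N2 = 3
transforms = Transforms.RFFT()  # 3D real-to-complex transform

plan = PencilFFTPlan(dims, transforms, proc_dims, comm)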
","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default, the domain is distributed on a 2D MPI topology of dimensions N_1 N_2. As an example, the above figure shows such a topology with N_1 = 4 and N_2 = 3, for a total of 12 MPI processes.","category":"page"},{"location":"tutorial/#tutorial:creating_plans","page":"Tutorial","title":"Creating plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The first thing to do is to create a domain decomposition configuration for the given dataset dimensions N_x N_y N_z. In the framework of PencilArrays, such a configuration is described by a Pencil object. As described in the PencilArrays docs, we can let the Pencil constructor automatically determine such a configuration. For this, only an MPI communicator and the dataset dimensions are needed:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (16, 32, 64)\npen = Pencil(dims, comm)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default this creates a 2D decomposition (for the case of a 3D dataset), but one can change this as detailed in the PencilArrays documentation linked above.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"We can now create a PencilFFTPlan, which requires information on decomposition configuration (the Pencil object) and on the transforms that will be applied:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Note that, for more control, one can instead separately specify the transforms along each dimension:\n# transform = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"See the PencilFFTPlan constructor for details on the accepted options, and the Transforms module for the possible transforms. It is also possible to enable fine-grained performance measurements via the TimerOutputs package, as described in Measuring performance.","category":"page"},{"location":"tutorial/#Allocating-data","page":"Tutorial","title":"Allocating data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Next, we want to apply the plan on some data. Transforms may only be applied on PencilArrays, which are array wrappers that include MPI decomposition information (in some sense, analogous to DistributedArrays in Julia's distributed computing approach). 
The helper function allocate_input can be used to allocate a PencilArray that is compatible with our plan:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of real data (Float64).\nu = allocate_input(plan)\n\n# Fill the array with some (random) data\nusing Random\nrandn!(u)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"PencilArrays are a subtype of AbstractArray, and thus they support all common array operations.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Similarly, to preallocate output data, one can use allocate_output:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of complex data (Complex{Float64}).\nv = allocate_output(plan)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"This is only required if one wants to apply the plans using a preallocated output (with mul!, see right below).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The data types returned by allocate_input and allocate_output are slightly different when working with in-place transforms. See the in-place example for details.","category":"page"},{"location":"tutorial/#Applying-plans","page":"Tutorial","title":"Applying plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The interface to apply plans is consistent with that of AbstractFFTs. Namely, * and mul! are respectively used for forward transforms without and with preallocated output data. Similarly, \\ and ldiv! are used for backward transforms.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using LinearAlgebra # for mul!, ldiv!\n\n# Apply plan on `u` with `v` as an output\nmul!(v, plan, u)\n\n# Apply backward plan on `v` with `w` as an output\nw = similar(u)\nldiv!(w, plan, v) # now w ≈ u","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Note that, consistently with AbstractFFTs, normalisation is performed at the end of a backward transform, so that the original data is recovered when applying a forward followed by a backward transform.","category":"page"},{"location":"tutorial/#Accessing-and-modifying-data","page":"Tutorial","title":"Accessing and modifying data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For any given MPI process, a PencilArray holds the data associated to its local partition in the global geometry. PencilArrays are accessed using local indices that start at 1, regardless of the location of the local process in the MPI topology. Note that PencilArrays, being based on regular Arrays, support both linear and Cartesian indexing (see the Julia docs for details).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For convenience, the global_view function can be used to generate an OffsetArray wrapper that takes global indices.","category":"page"},{"location":"tutorial/#tutorial:output_data_layout","page":"Tutorial","title":"Output data layout","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"In memory, the dimensions of the transform output are by default reversed with respect to the input. 
That is, if the order of indices in the input data is (x, y, z), then the output has order (z, y, x) in memory. This detail is hidden from the user, and output arrays are always accessed in the same order as the input data, regardless of the underlying output dimension permutation. This applies to PencilArrays and to OffsetArrays returned by global_view.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The reasoning behind dimension permutations is that they make it possible to always perform FFTs along the fastest array dimension and to avoid a local data transposition, resulting in performance gains. A similar approach is followed by other parallel FFT libraries. FFTW itself, in its distributed-memory routines, includes a flag that enables a similar behaviour. In PencilFFTs, index permutation is the default, but it can be disabled via the permute_dims flag of PencilFFTPlan.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"A great deal of work has been spent in making generic index permutations as efficient as possible, both in intermediate and in the output state of the multidimensional transforms. This has been achieved, in part, by making sure that permutations such as (3, 2, 1) are compile-time constants.","category":"page"},{"location":"tutorial/#Further-reading","page":"Tutorial","title":"Further reading","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For details on working with PencilArrays see the PencilArrays docs.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The examples on the sidebar further illustrate the use of transforms and provide an introduction to working with MPI-distributed data in the form of PencilArrays. In particular, the gradient example illustrates different ways of computing the gradient of a scalar field using Fourier-transformed distributed arrays. Then, the incompressible Navier–Stokes example is a more advanced and complete example of a possible application of the PencilFFTs package.","category":"page"},{"location":"benchmarks/#Benchmarks","page":"Benchmarks","title":"Benchmarks","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D (\"pencil\") decomposition. The benchmarks were run for input arrays of dimensions N_x × N_y × N_z = 512^3, 1024^3 and 2048^3. Each timing is averaged over 100 repetitions.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"
[Figure: strong scaling of PencilFFTs and P3DFFT for 3D real-to-complex FFTs on 512^3, 1024^3 and 2048^3 grids]
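The two communication strategies compared in these benchmarks (discussed below) are selected through the transpose_method keyword of PencilFFTPlan. A minimal sketch, assuming a Pencil named pen as constructed in the tutorial and that the Transpositions submodule of PencilArrays is available in scope:

# Default: non-blocking point-to-point transpositions (MPI_Isend / MPI_Irecv).
plan_p2p = PencilFFTPlan(pen, Transforms.RFFT();
                         transpose_method = Transpositions.PointToPoint())

# Alternative: collective transpositions based on MPI_Alltoallv.
plan_a2a = PencilFFTPlan(pen, Transforms.RFFT();
                         transpose_method = Transpositions.Alltoallv())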
","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.","category":"page"},{"location":"benchmarks/#Benchmark-details","page":"Benchmarks","title":"Benchmark details","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The number of MPI processes along each decomposed dimension, P_1 and P_2, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with P_1 P_2. For instance, a total of 1024 processes is divided into P_1 = P_2 = 32. Different results may be obtained with other combinations, but this was not benchmarked.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.","category":"page"},{"location":"GlobalFFTParams/#Global-FFT-parameters","page":"Global FFT parameters","title":"Global FFT parameters","text":"","category":"section"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"GlobalFFTParams","category":"page"},{"location":"GlobalFFTParams/#PencilFFTs.GlobalFFTParams","page":"Global FFT parameters","title":"PencilFFTs.GlobalFFTParams","text":"GlobalFFTParams{T, N, inplace}\n\nSpecifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.\n\n\n\nGlobalFFTParams(size_global, transforms, [real_type=Float64])\n\nDefine parameters for N-dimensional transform.\n\ntransforms must be a tuple of length N specifying the transforms to be applied along each dimension. 
Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.\n\nThe element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.\n\nNote that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.\n\nExample\n\nTo perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:\n\njulia> size_global = (64, 32, 128); # size of real input data\n\njulia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());\n\njulia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)\nTransforms: (RFFT, FFT, FFT)\nInput type: Float64\nGlobal dimensions: (64, 32, 128) -> (33, 32, 128)\n\n\n\n\n\n","category":"type"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"EditURL = \"../../examples/in-place.jl\"","category":"page"},{"location":"generated/in-place/#In-place-transforms","page":"In-place transforms","title":"In-place transforms","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.","category":"page"},{"location":"generated/in-place/#Creating-a-domain-partition","page":"In-place transforms","title":"Creating a domain partition","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We start by partitioning a domain of dimensions 163264 along all available MPI processes.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using PencilFFTs\nusing MPI\nMPI.Init()\n\ndims_global = (16, 32, 64) # global dimensions","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Such a partitioning is described by a Pencil object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"decomp_dims = (2, 3)\ncomm = MPI.COMM_WORLD\npen = Pencil(dims_global, decomp_dims, comm)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"warning: Allowed decompositions\nDistributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims above must not contain 1.","category":"page"},{"location":"generated/in-place/#Creating-in-place-plans","page":"In-place transforms","title":"Creating in-place plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"To create an in-place plan, pass an in-place transform such as Transforms.FFT! or Transforms.R2R! to PencilFFTPlan. 
For instance:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Perform a 3D in-place complex-to-complex FFT.\ntransform = Transforms.FFT!()\n\n# Note that one can also combine different types of in-place transforms.\n# For instance:\n# transform = (\n# Transforms.R2R!(FFTW.REDFT01),\n# Transforms.FFT!(),\n# Transforms.R2R!(FFTW.DHT),\n# )","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We can now create a distributed plan from the previously-created domain partition and the chosen transform.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that in-place real-to-complex transforms are not currently supported. (In other words, the RFFT! transform type is not defined.)","category":"page"},{"location":"generated/in-place/#Allocating-data","page":"In-place transforms","title":"Allocating data","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Allocate data for the plan.\n# Since `plan` is in-place, this returns a `ManyPencilArray` container.\nA = allocate_input(plan)\nsummary(A)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that allocate_output also works for in-place plans. In this case, it returns exactly the same thing as allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, we can initialise the input array with some data before transforming:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using Random\nu_in = first(A) # input data view\nrandn!(u_in)\nsummary(u_in)","category":"page"},{"location":"generated/in-place/#Applying-plans","page":"In-place transforms","title":"Applying plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Like in FFTW.jl, one can perform in-place transforms using the * and \\ operators. 
As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan * A; # performs in-place forward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"After performing an in-place transform, data contained in u_in has been overwritten and has no \"physical\" meaning. In other words, u_in should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, to compute the global sum of the transformed data:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"u_out = last(A) # output data view\nsum(u_out) # sum of transformed data (note that `sum` reduces over all processes)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Finally, we can perform a backward transform and do stuff with the input view:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan \\ A; # perform in-place backward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"At this point, the data can be once again found in the input view u_in, while u_out should not be accessed.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Transforms/#Available-transforms","page":"Available transforms","title":"Available transforms","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"CurrentModule = PencilFFTs.Transforms","category":"page"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"Transforms","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms","page":"Available transforms","title":"PencilFFTs.Transforms","text":"Defines different one-dimensional FFT-based transforms.\n\nThe transforms are all subtypes of an AbstractTransform type.\n\nWhen possible, the names of the transforms are kept consistent with the functions exported by AbstractFFTs.jl and FFTW.jl.\n\n\n\n\n\n","category":"module"},{"location":"Transforms/#Transform-types","page":"Available transforms","title":"Transform types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"FFT\nFFT!\nBFFT\nBFFT!\n\nRFFT\nRFFT!\nBRFFT\nBRFFT!\n\nR2R\nR2R!\n\nNoTransform\nNoTransform!","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.FFT","page":"Available transforms","title":"PencilFFTs.Transforms.FFT","text":"FFT()\n\nComplex-to-complex FFT.\n\nSee also AbstractFFTs.fft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.FFT!","page":"Available transforms","title":"PencilFFTs.Transforms.FFT!","text":"FFT!()\n\nIn-place 
version of FFT.\n\nSee also AbstractFFTs.fft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT","text":"BFFT()\n\nUnnormalised backward complex-to-complex FFT.\n\nLike AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.\n\nSee also AbstractFFTs.bfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT!","text":"BFFT()\n\nIn-place version of BFFT.\n\nSee also AbstractFFTs.bfft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.RFFT","page":"Available transforms","title":"PencilFFTs.Transforms.RFFT","text":"RFFT()\n\nReal-to-complex FFT.\n\nSee also AbstractFFTs.rfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.RFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.RFFT!","text":"RFFT!()\n\nIn-place version of RFFT.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BRFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BRFFT","text":"BRFFT(d::Integer)\nBRFFT((d1, d2, ..., dN))\n\nUnnormalised inverse of RFFT.\n\nTo obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).\n\nAs described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.\n\nFor multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. 
The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.\n\nSee also AbstractFFTs.brfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BRFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.BRFFT!","text":"BRFFT!(d::Integer)\nBRFFT!((d1, d2, ..., dN))\n\nIn-place version of BRFFT.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R","page":"Available transforms","title":"PencilFFTs.Transforms.R2R","text":"R2R(kind)\n\nReal-to-real transform of type kind.\n\nThe possible values of kind are those described in the FFTW.r2r docs and the FFTW manual:\n\ndiscrete cosine transforms: FFTW.REDFT00, FFTW.REDFT01, FFTW.REDFFT10, FFTW.REDFFT11\ndiscrete sine transforms: FFTW.RODFT00, FFTW.RODFT01, FFTW.RODFFT10, FFTW.RODFFT11\ndiscrete Hartley transform: FFTW.DHT\n\nNote: half-complex format DFTs (FFTW.R2HC, FFTW.HC2R) are not currently supported.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R!","page":"Available transforms","title":"PencilFFTs.Transforms.R2R!","text":"R2R!(kind)\n\nIn-place version of R2R.\n\nSee also FFTW.r2r!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform","text":"NoTransform()\n\nIdentity transform.\n\nSpecifies that no transformation should be applied.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform!","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform!","text":"NoTransform!()\n\nIn-place version of NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Internals","page":"Available transforms","title":"Internals","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"What follows is used internally in PencilFFTs.","category":"page"},{"location":"Transforms/#Types","page":"Available transforms","title":"Types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"AbstractCustomPlan\nAbstractTransform\nIdentityPlan\nIdentityPlan!\nPlan","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractCustomPlan","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractCustomPlan","text":"AbstractCustomPlan\n\nAbstract type defining a custom plan, to be used as an alternative to FFTW plans (FFTW.FFTWPlan).\n\nThe only custom plan defined in this module is IdentityPlan. 
The user can define other custom plans that are also subtypes of AbstractCustomPlan.\n\nNote that plan returns a subtype of either AbstractFFTs.Plan or AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractTransform","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractTransform","text":"AbstractTransform\n\nSpecifies a one-dimensional FFT-based transform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan","text":"IdentityPlan\n\nType of plan associated to NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan!","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan!","text":"IdentityPlan!\n\nType of plan associated to NoTransform!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.Plan","page":"Available transforms","title":"PencilFFTs.Transforms.Plan","text":"Plan = Union{AbstractFFTs.Plan, AbstractCustomPlan}\n\nUnion type representing any plan returned by plan.\n\nSee also AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Functions","page":"Available transforms","title":"Functions","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"plan\n\nbinv\nscale_factor\n\neltype_input\neltype_output\nexpand_dims\nis_inplace\nkind\nlength_output","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.plan","page":"Available transforms","title":"PencilFFTs.Transforms.plan","text":"plan(transform::AbstractTransform, A, [dims];\n flags=FFTW.ESTIMATE, timelimit=Inf)\n\nCreate plan to transform array A along dimensions dims.\n\nIf dims is not specified, all dimensions of A are transformed.\n\nFor FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.binv","page":"Available transforms","title":"PencilFFTs.Transforms.binv","text":"binv(transform::AbstractTransform, d::Integer)\n\nReturns the backwards transform associated to the given transform.\n\nThe second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.\n\nThe backwards transform returned by this function is not normalised. 
The normalisation factor for a given array can be obtained by calling scale_factor.\n\nExample\n\njulia> binv(Transforms.FFT(), 42)\nBFFT\n\njulia> binv(Transforms.BRFFT(9), 42)\nRFFT\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.scale_factor","page":"Available transforms","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(transform::AbstractTransform, A, [dims = 1:ndims(A)])\n\nGet factor required to normalise the given array after a transformation along dimensions dims (all dimensions by default).\n\nThe array A must have the dimensions of the transform input.\n\nImportant: the dimensions dims must be the same that were passed to plan.\n\nExamples\n\njulia> C = zeros(ComplexF32, 3, 4, 5);\n\njulia> scale_factor(Transforms.FFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C, 2:3)\n20\n\njulia> R = zeros(Float64, 3, 4, 5);\n\njulia> scale_factor(Transforms.RFFT(), R, 2)\n4\n\njulia> scale_factor(Transforms.RFFT(), R, 2:3)\n20\n\njulia> scale_factor(Transforms.BRFFT(8), C)\n96\n\njulia> scale_factor(Transforms.BRFFT(9), C)\n108\n\nThis will fail because the input of RFFT is real, and R is a complex array:\n\njulia> scale_factor(Transforms.RFFT(), C, 2:3)\nERROR: MethodError: no method matching scale_factor(::PencilFFTs.Transforms.RFFT, ::Array{ComplexF32, 3}, ::UnitRange{Int64})\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_input","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_input","text":"eltype_input(transform::AbstractTransform, real_type<:AbstractFloat)\n\nDetermine input data type for a given transform given the floating point precision of the input data.\n\nSome transforms, such as R2R and NoTransform, can take both real and complex data. 
For those kinds of transforms, nothing is returned.\n\nExample\n\njulia> eltype_input(Transforms.FFT(), Float32)\nComplexF32 (alias for Complex{Float32})\n\njulia> eltype_input(Transforms.RFFT(), Float64)\nFloat64\n\njulia> eltype_input(Transforms.R2R(FFTW.REDFT01), Float64) # nothing\n\njulia> eltype_input(Transforms.NoTransform(), Float64) # nothing\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_output","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_output","text":"eltype_output(transform::AbstractTransform, eltype_input)\n\nReturns the output data type for a given transform given the input type.\n\nThrows ArgumentError if the input data type is incompatible with the transform type.\n\nExample\n\njulia> eltype_output(Transforms.NoTransform(), Float32)\nFloat32\n\njulia> eltype_output(Transforms.RFFT(), Float64)\nComplexF64 (alias for Complex{Float64})\n\njulia> eltype_output(Transforms.BRFFT(4), ComplexF32)\nFloat32\n\njulia> eltype_output(Transforms.FFT(), Float64)\nERROR: ArgumentError: invalid input data type for PencilFFTs.Transforms.FFT: Float64\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.expand_dims","page":"Available transforms","title":"PencilFFTs.Transforms.expand_dims","text":"expand_dims(transform::AbstractTransform, Val(N))\n\nExpand a single multidimensional transform into one transform per dimension.\n\nExample\n\n# Expand a real-to-complex transform in 3 dimensions.\njulia> expand_dims(Transforms.RFFT(), Val(3))\n(RFFT, FFT, FFT)\n\njulia> expand_dims(Transforms.BRFFT(4), Val(3))\n(BFFT, BFFT, BRFFT{even})\n\njulia> expand_dims(Transforms.NoTransform(), Val(2))\n(NoTransform, NoTransform)\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.is_inplace","page":"Available transforms","title":"PencilFFTs.Transforms.is_inplace","text":"is_inplace(transform::AbstractTransform) -> Bool\nis_inplace(transforms::Vararg{AbtractTransform}) -> Union{Bool, Nothing}\n\nCheck whether a transform or a list of transforms is performed in-place.\n\nIf the list of transforms has a combination of in-place and out-of-place transforms, nothing is returned.\n\nExample\n\njulia> is_inplace(Transforms.RFFT())\nfalse\n\njulia> is_inplace(Transforms.NoTransform!())\ntrue\n\njulia> is_inplace(Transforms.FFT!(), Transforms.R2R!(FFTW.REDFT01))\ntrue\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R(FFTW.REDFT01))\nfalse\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R!(FFTW.REDFT01)) === nothing\ntrue\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.kind","page":"Available transforms","title":"PencilFFTs.Transforms.kind","text":"kind(transform::R2R)\n\nGet kind of real-to-real transform.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.length_output","page":"Available transforms","title":"PencilFFTs.Transforms.length_output","text":"length_output(transform::AbstractTransform, length_in::Integer)\n\nReturns the length of the transform output, given the length of its input.\n\nThe input and output lengths are specified in terms of the respective input and output datatypes. 
For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.\n\n\n\n\n\n","category":"function"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"EditURL = \"../../examples/navier_stokes.jl\"","category":"page"},{"location":"generated/navier_stokes/#Navier–Stokes-equations","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In this example, we numerically solve the incompressible Navier–Stokes equations","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"\partial_t \bm{v} + (\bm{v} \cdot \bm{\nabla}) \bm{v} = -\frac{1}{ρ} \bm{\nabla} p + ν \nabla^2 \bm{v},\n\quad \bm{\nabla} \cdot \bm{v} = 0","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where \bm{v}(\bm{x}, t) and p(\bm{x}, t) are respectively the velocity and pressure fields, ν is the fluid kinematic viscosity and ρ is the fluid density.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.","category":"page"},{"location":"generated/navier_stokes/#First-steps","page":"Navier–Stokes equations","title":"First steps","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We start by loading the required packages, initialising MPI and setting the simulation parameters.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\nprocid = MPI.Comm_rank(comm) + 1\n\n# Simulation parameters\nNs = (64, 64, 64) # = (Nx, Ny, Nz)\nLs = (2π, 2π, 2π) # = (Lx, Ly, Lz)\n\n# Collocation points (\"global\" = over all processes).\n# We include the endpoint (length = N + 1) for convenience.\nxs_global = map((N, L) -> range(0, L; length = N + 1), Ns, Ls) # = (x, y, z)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's check the number of MPI processes over which we're running our simulation:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"MPI.Comm_size(comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. 
For simplicity, here we do it automatically following the PencilArrays.jl docs:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pen = Pencil(Ns, comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The subdomain associated to the local MPI process can be obtained using range_local:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"range_local(pen)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now construct a distributed vector field that follows the decomposition configuration we just created:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v⃗₀ = (\n PencilArray{Float64}(undef, pen), # vx\n PencilArray{Float64}(undef, pen), # vy\n PencilArray{Float64}(undef, pen), # vz\n)\nsummary(v⃗₀[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We still need to fill this array with interesting values that represent a physical velocity field.","category":"page"},{"location":"generated/navier_stokes/#Initial-condition","page":"Navier–Stokes equations","title":"Initial condition","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's set the initial condition in physical space. In this example, we choose the Taylor–Green vortex configuration as an initial condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"beginaligned\nv_x(x y z) = u₀ sin(k₀ x) cos(k₀ y) cos(k₀ z) \nv_y(x y z) = -u₀ cos(k₀ x) sin(k₀ y) cos(k₀ z) \nv_z(x y z) = 0\nendaligned","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where u₀ and k₀ are two parameters setting the amplitude and the period of the velocity field.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid = localgrid(pen, xs_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can use this to initialise the velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"u₀ = 1.0\nk₀ = 2π / Ls[1] # should be integer if L = 2π (to preserve periodicity)\n\n@. v⃗₀[1] = u₀ * sin(k₀ * grid.x) * cos(k₀ * grid.y) * cos(k₀ * grid.z)\n@. v⃗₀[2] = -u₀ * cos(k₀ * grid.x) * sin(k₀ * grid.y) * cos(k₀ * grid.z)\n@. 
v⃗₀[3] = 0\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's plot a 2D slice of the velocity field managed by the local MPI process:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using GLMakie\n\n# Compute the norm of a vector field represented by a tuple of arrays.\nfunction vecnorm(v⃗::NTuple)\n vnorm = similar(v⃗[1])\n for n ∈ eachindex(v⃗[1])\n w = zero(eltype(vnorm))\n for v ∈ v⃗\n w += v[n]^2\n end\n vnorm[n] = sqrt(w)\n end\n vnorm\nend\n\n# This is useful for passing coordinates to Makie.contour!\nto_intervals(grid) = map(xs -> xs[begin]..xs[end], grid.coords)\n\nlet fig = Figure(size = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n vnorm = parent(vecnorm(v⃗₀)) # use `parent` because Makie doesn't like custom array types...\n ct = contour!(\n ax, to_intervals(grid)..., vnorm;\n alpha = 0.2, levels = 4,\n colormap = :viridis,\n colorrange = (0.0, 1.0),\n highclip = (:red, 0.2), lowclip = (:green, 0.2),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Velocity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Velocity-in-Fourier-space","page":"Navier–Stokes equations","title":"Velocity in Fourier space","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In the Fourier pseudo-spectral method, the periodic velocity field is discretised in space as a truncated Fourier series","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"bmv(bmx t) =\n_bmk hatbmv_bmk(t) e^i bmk bmx","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmk = (k_x k_y k_z) are the discrete wave numbers.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The wave numbers can be obtained using the fftfreq function. 
Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for k_x:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nks_global = (\n rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]), # kx | real-to-complex\n fftfreq(Ns[2], 2π * Ns[2] / Ls[2]), # ky | complex-to-complex\n fftfreq(Ns[3], 2π * Ns[3] / Ls[3]), # kz | complex-to-complex\n)\n\nks_global[1]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[2]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[3]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To transform the velocity field to Fourier space, we first create a real-to-complex FFT plan to be applied to one of the velocity components:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"plan = PencilFFTPlan(v⃗₀[1], Transforms.RFFT())","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"See PencilFFTPlan for details on creating plans and on optional keyword arguments.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now apply this plan to the three velocity components to obtain the respective Fourier coefficients hatbmv_bmk:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s = plan .* v⃗₀\nsummary(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Note that, in Fourier space, the domain decomposition is performed along the directions x and y:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pencil(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This is because the 3D FFTs are performed one dimension at a time, with the x direction first and the z direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers bmk associated to the local MPI process. 
Similarly to the local grid points, these are obtained using the localgrid function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid_fourier = localgrid(v̂s[1], ks_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, \bm{ω} = \bm{\nabla} \times \bm{v}. In Fourier space, this becomes \hat{\bm{ω}} = i \bm{k} \times \hat{\bm{v}}.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using StaticArrays: SVector\nusing LinearAlgebra: ×\n\nfunction curl_fourier!(\n ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,\n ) where {N}\n @inbounds for I ∈ eachindex(grid_fourier)\n # We use StaticArrays for the cross product between small vectors.\n ik⃗ = im * SVector(grid_fourier[I])\n v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)\n ω⃗ = ik⃗ × v⃗\n for n ∈ eachindex(ω⃗)\n ω̂s[n][I] = ω⃗[n]\n end\n end\n ω̂s\nend\n\nω̂s = similar.(v̂s)\ncurl_fourier!(ω̂s, v̂s, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally transform back to physical space and plot the result:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ωs = plan .\\ ω̂s\n\nlet fig = Figure(size = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n ω_norm = parent(vecnorm(ωs))\n ct = contour!(\n ax, to_intervals(grid)..., ω_norm;\n alpha = 0.1, levels = 0.8:0.2:2.0,\n colormap = :viridis, colorrange = (0.8, 2.0),\n highclip = (:red, 0.2), lowclip = (:green, 0.2),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Vorticity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Computing-the-non-linear-term","page":"Navier–Stokes equations","title":"Computing the non-linear term","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"One can show that, in Fourier space, the incompressible Navier–Stokes equations can be written as","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"\partial_t \hat{\bm{v}}_{\bm{k}} =\n- \mathcal{P}_{\bm{k}} \left[ \widehat{(\bm{v} \cdot \bm{\nabla}) \bm{v}} \right]\n- ν \bm{k}^2 \hat{\bm{v}}_{\bm{k}},\n\quad \text{with} \quad\n\mathcal{P}_{\bm{k}}(\hat{\bm{F}}_{\bm{k}}) = \left( I - \frac{\bm{k} \otimes \bm{k}}{\bm{k}^2} \right) \hat{\bm{F}}_{\bm{k}}","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where \mathcal{P}_{\bm{k}} is a projection operator that preserves the incompressibility condition \bm{\nabla} \cdot \bm{v} = 0. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. 
Note that, because of this, the pressure gradient disappears from the equations.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Now that we have the wave numbers \bm{k}, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients \hat{\bm{v}}_{\bm{k}} of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, \hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} \cdot \bm{\nabla}) \bm{v}} \right]_{\bm{k}}. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Below we implement a function that computes the non-linear term in Fourier space based on its convective form (\bm{v} \cdot \bm{\nabla}) \bm{v} = \bm{\nabla} \cdot (\bm{v} \otimes \bm{v}). Note that this equivalence uses the incompressibility condition \bm{\nabla} \cdot \bm{v} = 0.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place\n\n# Compute non-linear term in Fourier space from velocity field in physical\n# space. Optional keyword arguments may be passed to avoid allocations.\nfunction ns_nonlinear!(\n F̂s, vs, plan, grid_fourier;\n vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),\n )\n # Compute F_i = ∂_j (v_i v_j) for each i.\n # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)\n w, ŵ = vbuf, v̂buf\n @inbounds for (i, F̂i) ∈ enumerate(F̂s)\n F̂i .= 0\n vi = vs[i]\n for (j, vj) ∈ enumerate(vs)\n w .= vi .* vj # w = v_i * v_j in physical space\n mul!(ŵ, plan, w) # same in Fourier space\n # Add derivative in Fourier space\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n kj = k⃗[j]\n F̂i[I] += im * kj * ŵ[I]\n end\n end\n end\n F̂s\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's use this function on our initial velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"F̂s = similar.(v̂s)\nns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orszag's 2/3 rule to zero out the Fourier coefficients associated to the highest wave numbers. 
We define a function that applies this procedure below.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)\n ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)\n ks_lim = (2 / 3) .* ks_max\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n if any(abs.(k⃗) .> ks_lim)\n for ŵ ∈ ŵs\n ŵ[I] = 0\n end\n end\n end\n ŵs\nend\n\n# We can apply this on the previously computed non-linear term:\ndealias_twothirds!(F̂s, grid_fourier, ks_global);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Finally, we implement the projection associated to the incompressibility condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function project_divergence_free!(ûs, grid_fourier)\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n k² = sum(abs2, k⃗)\n iszero(k²) && continue # avoid division by zero\n û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)\n for i ∈ eachindex(û)\n ŵ = û[i]\n for j ∈ eachindex(û)\n ŵ -= k⃗[i] * k⃗[j] * û[j] / k²\n end\n ûs[i][I] = ŵ\n end\n end\n ûs\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)\nv̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially","category":"page"},{"location":"generated/navier_stokes/#Putting-it-all-together","page":"Navier–Stokes equations","title":"Putting it all together","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function ns_rhs!(\n dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,\n ) where {N}\n # 1. Compute non-linear term and dealias it\n (; plan, cache, ks_global, grid_fourier) = p\n F̂s = cache.F̂s\n ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])\n dealias_twothirds!(F̂s, grid_fourier, ks_global)\n\n # 2. Project onto divergence-free space\n project_divergence_free!(F̂s, grid_fourier)\n\n # 3. Transform velocity to Fourier space\n v̂s = cache.v̂s\n map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)\n\n # 4. 
Add viscous term (and multiply projected non-linear term by -1)\n ν = p.ν\n for n ∈ eachindex(v̂s)\n v̂ = v̂s[n]\n F̂ = F̂s[n]\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n k² = sum(abs2, k⃗)\n F̂[I] = -F̂[I] - ν * k² * v̂[I]\n end\n end\n\n # 5. Transform RHS back to physical space\n map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)\n\n nothing\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using OrdinaryDiffEqLowOrderRK # includes RK4\nusing RecursiveArrayTools: ArrayPartition\n\nns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)\nvs_init_ode = ArrayPartition(v⃗₀)\nsummary(vs_init_ode)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now define solver parameters and temporary variables, and initialise the problem:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"params = (;\n ν = 5e-3, # kinematic viscosity\n plan, grid_fourier, ks_global,\n cache = (\n v̂s = similar.(v̂s),\n F̂s = similar.(v̂s),\n )\n)\n\ntspan = (0.0, 10.0)\nprob = ODEProblem{true}(ns_rhs!, vs_init_ode, tspan, params)\nintegrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum E(k), to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity ν must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large k), while high enough to allow the small-scale motions to be correctly resolved.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Nk = length(Ek)\n @assert Nk == length(ks)\n Ek .= 0\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n knorm = sqrt(sum(abs2, k⃗))\n i = searchsortedfirst(ks, knorm)\n i > Nk && continue\n v⃗ = getindex.(v̂s, Ref(I)) # = (v̂s[1][I], v̂s[2][I], ...)\n factor = k⃗[1] == 0 ? 
1 : 2 # account for Hermitian symmetry and r2c transform\n Ek[i] += factor * sum(abs2, v⃗) / 2\n end\n MPI.Allreduce!(Ek, +, get_comm(v̂s[1])) # sum across all processes\n Ek\nend\n\nks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])\nEk = similar(ks)\nv̂s = plan .* integrator.u.x\nenergy_spectrum!(Ek, ks, v̂s, grid_fourier)\nEk ./= scale_factor(plan)^2 # rescale energy\n\ncurl_fourier!(ω̂s, v̂s, grid_fourier)\nldiv!.(ωs, plan, ω̂s)\nω⃗_plot = Observable(ωs)\nk_plot = @view ks[2:end]\nE_plot = Observable(@view Ek[2:end])\nt_plot = Observable(integrator.t)\n\nfig = let\n fig = Figure(size = (1200, 600))\n ax = Axis3(\n fig[1, 1][1, 1]; title = @lift(\"t = $(round($t_plot, digits = 3))\"),\n aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\",\n )\n ω_mag = @lift parent(vecnorm($ω⃗_plot))\n ω_mag_norm = @lift $ω_mag ./ maximum($ω_mag)\n ct = contour!(\n ax, to_intervals(grid)..., ω_mag_norm;\n alpha = 0.3, levels = 3,\n colormap = :viridis, colorrange = (0.0, 1.0),\n highclip = (:red, 0.2), lowclip = (:green, 0.2),\n )\n cb = Colorbar(fig[1, 1][1, 2], ct; label = \"Normalised vorticity magnitude\")\n ax_sp = Axis(\n fig[1, 2];\n xlabel = \"k\", ylabel = \"E(k)\", xscale = log2, yscale = log10,\n title = \"Kinetic energy spectrum\",\n )\n ylims!(ax_sp, 1e-8, 1e0)\n scatterlines!(ax_sp, k_plot, E_plot)\n ks_slope = exp.(range(log(2.5), log(25.0), length = 3))\n E_fivethirds = @. 0.3 * ks_slope^(-5/3)\n @views lines!(ax_sp, ks_slope, E_fivethirds; color = :black, linestyle = :dot)\n text!(ax_sp, L\"k^{-5/3}\"; position = (ks_slope[2], E_fivethirds[2]), align = (:left, :bottom))\n fig\nend\n\nusing Printf # hide\nwith_xvfb::Bool = ENV[\"DISPLAY\"] == \":99\" # hide\nnstep::Int = 0 # hide\ntmpdir::String = mktempdir() # hide\nfilename_frame(procid, nstep) = joinpath(tmpdir, @sprintf(\"proc%d_%04d.png\", procid, nstep)) # hide\nrecord(fig, \"vorticity_proc$procid.mp4\"; framerate = 10) do io\n with_xvfb && recordframe!(io) # hide\n while integrator.t < 20\n dt = 0.001\n step!(integrator, dt)\n t_plot[] = integrator.t\n mul!.(v̂s, plan, integrator.u.x) # current velocity in Fourier space\n curl_fourier!(ω̂s, v̂s, grid_fourier)\n ldiv!.(ω⃗_plot[], plan, ω̂s)\n ω⃗_plot[] = ω⃗_plot[] # to force updating the plot\n energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Ek ./= scale_factor(plan)^2 # rescale energy\n E_plot[] = E_plot[]\n global nstep += 1 # hide\n with_xvfb ? 
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"../../examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)","category":"page"},{"location":"generated/gradient/#Fourier-wave-numbers","page":"Gradient of a scalar field","title":"Fourier wave numbers","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#gradient_method_broadcast","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 
92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. 
Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
[figure: pencil decomposition of a 3D domain, with each coloured block managed by a different MPI process]
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = timer(pencil(A)),\n)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n dims_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place real-to-real or complex-to-complex plan, a ManyPencilArray is allocated. If p is an in-place real-to-complex plan, a ManyPencilArrayRFFT! is allocated. These types hold PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#Internals","page":"Distributed FFT plans","title":"Internals","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"ManyPencilArrayRFFT!","category":"page"},{"location":"PencilFFTs/#PencilFFTs.ManyPencilArrayRFFT!","page":"Distributed FFT plans","title":"PencilFFTs.ManyPencilArrayRFFT!","text":"ManyPencilArrayRFFT!{T,N,M} <: AbstractManyPencilArray{N,M}\n\nContainer holding M different PencilArray views to the same underlying data buffer. All views share the same dimensionality N. The element type T of the first view is real, that of subsequent views is Complex{T}. \n\nThis can be used to perform in-place real-to-complex plan, see alsoTransforms.RFFT!. It is used internally for such transforms by allocate_input and should not be constructed directly.\n\n\n\nManyPencilArrayRFFT!{T}(undef, pencils...; extra_dims=())\n\nCreate a ManyPencilArrayRFFT! container that can hold data of type T and Complex{T} associated to all the given Pencils.\n\nThe optional extra_dims argument is the same as for PencilArray.\n\nSee also ManyPencilArray\n\n\n\n\n\n","category":"type"}]
}
diff --git a/dev/tutorial/index.html b/dev/tutorial/index.html
index 723c885b..216e9723 100644
--- a/dev/tutorial/index.html
+++ b/dev/tutorial/index.html
@@ -1,5 +1,5 @@
-Tutorial · PencilFFTs.jl
The following tutorial shows how to perform a 3D FFT of real periodic data defined on a grid of $N_x × N_y × N_z$ points.
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), whereas P3DFFT uses collective MPI_Alltoallv calls. This allows PencilFFTs to reorder the parts of the data that have already been received while waiting for the rest to arrive, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs result from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.
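As an illustration, the communication backend is selected at plan creation time through the transpose_method keyword argument described in the PencilFFTPlan docstring. The sketch below assumes the Transpositions.PointToPoint and Transpositions.Alltoallv types provided by PencilArrays (the former being the default):
using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD
pen = Pencil((64, 32, 128), comm)

# Default: non-blocking point-to-point transpositions (MPI_Isend / MPI_Irecv).
plan_p2p = PencilFFTPlan(pen, Transforms.RFFT();
                         transpose_method = Transpositions.PointToPoint())

# Alternative: collective transpositions based on MPI_Alltoallv.
plan_coll = PencilFFTPlan(pen, Transforms.RFFT();
                          transpose_method = Transpositions.Alltoallv())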
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
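For reference, a specific decomposition can also be imposed explicitly through the proc_dims argument of the global-size PencilFFTPlan constructor documented above; the global dimensions below are arbitrary example values:
using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD

size_global = (512, 512, 512)   # example global dimensions of the real input data
transforms = Transforms.RFFT()  # r2c FFT along the first dimension, then c2c FFTs along the others
proc_dims = (32, 32)            # explicit 2D decomposition for 1024 MPI processes

plan = PencilFFTPlan(size_global, transforms, proc_dims, comm)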
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
This example shows different methods to compute the gradient of a real-valued 3D scalar field $θ(\bm{x})$ in Fourier space, where $\bm{x} = (x, y, z)$. It is assumed that the field is periodic with period $L = 2π$ along all dimensions.
where $\bm{k} = (k_x, k_y, k_z)$ are the Fourier wave numbers and $\hat{θ}$ is the discrete Fourier transform of $θ$. Then, the spatial derivatives of $θ$ are given by
In this section, we initialise a random real-valued scalar field $θ$ and compute its FFT. For more details see the Tutorial.
using MPI
-using PencilFFTs
-using Random
-
-MPI.Init()
-
-# Input data dimensions (Nx × Ny × Nz)
-dims = (64, 32, 64)
-
-# Apply a 3D real-to-complex (r2c) FFT.
-transform = Transforms.RFFT()
-
-# Automatically create decomposition configuration
-comm = MPI.COMM_WORLD
-pen = Pencil(dims, comm)
-
-# Create plan
-plan = PencilFFTPlan(pen, transform)
-
-# Allocate data and initialise field
-θ = allocate_input(plan)
-randn!(θ)
-
-# Perform distributed FFT
-θ_hat = plan * θ
Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.
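Concretely, the allocation reads:
∇θ_hat = allocate_output(plan, Val(3))

# This is equivalent:
# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))

summary(∇θ_hat)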
In general, the Fourier wave numbers are of the form $k_i = 0, ±\frac{2π}{L_i}, ±\frac{4π}{L_i}, ±\frac{6π}{L_i}, …$, where $L_i$ is the period along dimension $i$. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension $x$ (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. $k_x = 0, \frac{2π}{L_x}, \frac{4π}{L_x}, …$.
The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a "grid" of wave numbers associated to our 3D real-to-complex transform:
using AbstractFFTs: fftfreq, rfftfreq
-
-box_size = (2π, 2π, 2π) # Lx, Ly, Lz
-sample_rate = 2π .* dims ./ box_size
-
-# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].
-kx = rfftfreq(dims[1], sample_rate[1])
-
-# In our case (Ly = 2π and Ny even), this gives
-# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).
-ky = fftfreq(dims[2], sample_rate[2])
-kz = fftfreq(dims[3], sample_rate[3])
-
-kvec = (kx, ky, kz)
Note that kvec now contains the wave numbers associated to the global domain. In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.
PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.
One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).
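The global views themselves are obtained by calling the global_view function on the transformed arrays:
θ_glob = global_view(θ_hat)
∇θ_glob = global_view.(∇θ_hat)
summary(θ_glob)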
"16×32×64 OffsetArray(::PencilArray{ComplexF64, 3}, 1:16, 1:32, 1:64) with eltype ComplexF64 with indices 1:16×1:32×1:64"
Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.
for I in CartesianIndices(θ_glob)
- i, j, k = Tuple(I) # unpack indices
-
- # Wave number vector associated to current Cartesian index.
- kx = kvec[1][i]
- ky = kvec[2][j]
- kz = kvec[3][k]
-
- # Compute gradient in Fourier space.
- # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.
- ∇θ_glob[1][I] = im * kx * θ_glob[I]
- ∇θ_glob[2][I] = im * ky * θ_glob[I]
- ∇θ_glob[3][I] = im * kz * θ_glob[I]
-end
The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:
@inbounds for I in CartesianIndices(θ_glob)
- i, j, k = Tuple(I)
-
- kx = kvec[1][i]
- ky = kvec[2][j]
- kz = kvec[3][k]
-
- u = im * θ_glob[I]
-
- ∇θ_glob[1][I] = kx * u
- ∇θ_glob[2][I] = ky * u
- ∇θ_glob[3][I] = kz * u
-end
Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.
Finally, we can perform a backwards transform to obtain $\bm{∇} θ$ in physical space:
∇θ = plan \ ∇θ_hat;
Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.
Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).
Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in $(z, y, x)$ order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.
# Get local data range in the global grid.
-rng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)
For the loop below, we assume that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as is usually the case in Julia. If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.
@assert permutation(θ_hat) === Permutation(3, 2, 1)
-
-@inbounds for i in rng[1], j in rng[2], k in rng[3]
- kx = kvec[1][i]
- ky = kvec[2][j]
- kz = kvec[3][k]
-
- # Note that we still access the arrays in (i, j, k) order.
- # (The permutation happens behind the scenes!)
- u = im * θ_glob[i, j, k]
-
- ∇θ_glob[1][i, j, k] = kx * u
- ∇θ_glob[2][i, j, k] = ky * u
- ∇θ_glob[3][i, j, k] = kz * u
-end
Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a "local" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):
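grid_fourier = localgrid(θ_hat, kvec)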
Note that one can directly iterate on the returned grid object:
@inbounds for I in CartesianIndices(grid_fourier)
- # Wave number vector associated to current Cartesian index.
- k⃗ = grid_fourier[I]
- u = im * θ_hat[I]
- ∇θ_hat[1][I] = k⃗[1] * u
- ∇θ_hat[2][I] = k⃗[2] * u
- ∇θ_hat[3][I] = k⃗[3] * u
-end
This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.
Finally, note that the local grid object returned by localgrid makes it possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:
@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat
@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat
@. ∇θ_hat[3] = im * grid_fourier[3] * θ_hat
Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.
The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples) or with local indices (third example).
If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1 and 3 should be preferred. These use CartesianIndices and make no assumptions on the permutations (actually, permutations are completely invisible in the implementations).
The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1), and loops with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, but the performance difference with respect to the more generic variants is negligible.
The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions $64 × 32 × 64$. The different methods detailed above are marked on the right. The "lazy" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.
In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.
Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.
As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.
# Allocate data for the plan.
# Since `plan` is in-place, this returns a `ManyPencilArray` container.
A = allocate_input(plan)
summary(A)
Note that allocate_output also works for in-place plans, but it returns exactly the same thing as allocate_input.
As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).
For instance, we can initialise the input array with some data before transforming:
using Random
u_in = first(A)  # input data view
randn!(u_in)
summary(u_in)
Like in FFTW.jl, one can perform in-place transforms using the * and \ operators. As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.
plan * A; # performs in-place forward transform
After performing an in-place transform, we usually want to do operations on the output data. For instance, let's compute the global sum of the transformed data:
u_out = last(A) # output data view
sum(u_out)  # sum of transformed data (note that `sum` reduces over all processes)
16704.57297999713 - 26075.211129702257im
Finally, we can perform a backward transform and do stuff with the input view:
plan \ A; # perform in-place backward transform
# Now we can again do stuff with the input view `u_in`...
In this example, we numerically solve the incompressible Navier–Stokes equations

$∂_t \bm{v} + (\bm{v} ⋅ \bm{∇}) \bm{v} = -\frac{1}{ρ} \bm{∇} p + ν \, ∇^2 \bm{v}, \quad \bm{∇} ⋅ \bm{v} = 0,$

where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
    Data dimensions: (64, 64, 64)
    Decomposed dimensions: (2, 3)
    Data permutation: NoPermutation()
    Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
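Following the example script, this simply amounts to allocating one PencilArray per velocity component:
v⃗₀ = (
    PencilArray{Float64}(undef, pen),  # vx
    PencilArray{Float64}(undef, pen),  # vy
    PencilArray{Float64}(undef, pen),  # vz
)
We still need to fill these arrays with values representing a physical velocity field.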
Let's now set the initial condition in physical space. In this example, we choose the Taylor–Green vortex configuration:

$\begin{aligned} v_x(x, y, z) &= u₀ \sin(k₀ x) \cos(k₀ y) \cos(k₀ z), \\ v_y(x, y, z) &= -u₀ \cos(k₀ x) \sin(k₀ y) \cos(k₀ z), \\ v_z(x, y, z) &= 0, \end{aligned}$

where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:
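In the example this reads (xs_global being the tuple of global coordinate ranges $(x, y, z)$ defined at the start of the script):
grid = localgrid(pen, xs_global)
The returned grid object can then be used to evaluate the Taylor–Green expressions above by broadcasting over grid.x, grid.y and grid.z.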
In the Fourier pseudo-spectral method, the periodic velocity field is discretised in space as a truncated Fourier series,

$\bm{v}(\bm{x}, t) = \sum_{\bm{k}} \hat{\bm{v}}_{\bm{k}}(t) \, e^{i \bm{k} ⋅ \bm{x}},$

where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
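In the example, the global wave numbers are constructed as follows (Ns and Ls are the global grid resolution and domain size defined at the start of the example):
using AbstractFFTs: fftfreq, rfftfreq

ks_global = (
    rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]),  # kx | real-to-complex
    fftfreq(Ns[2], 2π * Ns[2] / Ls[2]),   # ky | complex-to-complex
    fftfreq(Ns[3], 2π * Ns[3] / Ls[3]),   # kz | complex-to-complex
)
The velocity field is then transformed to Fourier space by creating a real-to-complex plan for one component and broadcasting it over the three components:
plan = PencilFFTPlan(v⃗₀[1], Transforms.RFFT())
v̂s = plan .* v⃗₀  # Fourier coefficients of the velocity field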
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
    Data dimensions: (33, 64, 64)
    Decomposed dimensions: (1, 2)
    Data permutation: Permutation(3, 2, 1)
    Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming part of massively-parallel simulations based on this kind of method.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
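As for the physical grid, this simply reads:
grid_fourier = localgrid(v̂s[1], ks_global)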
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
using LinearAlgebra: ×

function curl_fourier!(
        ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
    ) where {N}
    @inbounds for I ∈ eachindex(grid_fourier)
        # We use StaticArrays for the cross product between small vectors.
        ik⃗ = im * SVector(grid_fourier[I])
        v⃗ = SVector(getindex.(v̂s, Ref(I)))  # = (v̂s[1][I], v̂s[2][I], ...)
        ω⃗ = ik⃗ × v⃗
        for n ∈ eachindex(ω⃗)
            ω̂s[n][I] = ω⃗[n]
        end
    end
    ω̂s
end

ω̂s = similar.(v̂s)
curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
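In the example, the backward transform is again obtained by broadcasting the plan (the plotting code itself is omitted here):
ωs = plan .\ ω̂s  # vorticity field in physical space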
One can show that, in Fourier space, the incompressible Navier–Stokes equations can be written as

$∂_t \hat{\bm{v}}_{\bm{k}} = - \mathcal{P}_{\bm{k}} \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}} - ν |\bm{k}|^2 \hat{\bm{v}}_{\bm{k}}, \quad \text{with} \quad \mathcal{P}_{\bm{k}}(\hat{\bm{F}}_{\bm{k}}) = \left( I - \frac{\bm{k} ⊗ \bm{k}}{|\bm{k}|^2} \right) \hat{\bm{F}}_{\bm{k}},$

where $\mathcal{P}_{\bm{k}}$ is a projection operator that preserves the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient disappears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its divergence (conservative) form, $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence relies on the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place
# Compute non-linear term in Fourier space from velocity field in physical
# space. Optional keyword arguments may be passed to avoid allocations.
function ns_nonlinear!(
        F̂s, vs, plan, grid_fourier;
        vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
    )
    # Compute F_i = ∂_j (v_i v_j) for each i.
    # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
    w, ŵ = vbuf, v̂buf
    @inbounds for (i, F̂i) ∈ enumerate(F̂s)
        F̂i .= 0
        vi = vs[i]
        for (j, vj) ∈ enumerate(vs)
            w .= vi .* vj     # w = v_i * v_j in physical space
            mul!(ŵ, plan, w)  # same in Fourier space
            # Add derivative in Fourier space
            for I ∈ eachindex(grid_fourier)
                k⃗ = grid_fourier[I]  # = (kx, ky, kz)
                kj = k⃗[j]
                F̂i[I] += im * kj * ŵ[I]
            end
        end
    end
    F̂s
end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
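A minimal usage sketch, assuming the non-linear term F̂s is allocated with the same layout as the Fourier-space velocity v̂s (these are the names used further below):
F̂s = similar.(v̂s)  # allocate non-linear term in Fourier space
ns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);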
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orszag's 2/3 rule to zero out the Fourier coefficients associated to the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
    ks_max = maximum.(abs, ks_global)  # maximum stored wave numbers (kx_max, ky_max, kz_max)
    ks_lim = (2 / 3) .* ks_max
    @inbounds for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]
        if any(abs.(k⃗) .> ks_lim)
            for ŵ ∈ ŵs
                ŵ[I] = 0
            end
        end
    end
    ŵs
end
# We can apply this on the previously computed non-linear term:
dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
    @inbounds for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]
        k² = sum(abs2, k⃗)
        iszero(k²) && continue  # avoid division by zero
        û = getindex.(ûs, Ref(I))  # (ûs[1][I], ûs[2][I], ...)
        for i ∈ eachindex(û)
            ŵ = û[i]
            for j ∈ eachindex(û)
                ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
            end
            ûs[i][I] = ŵ
        end
    end
    ûs
end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
v̂s_proj .≈ v̂s  # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
        dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
    ) where {N}
    # 1. Compute non-linear term and dealias it
    (; plan, cache, ks_global, grid_fourier) = p
    F̂s = cache.F̂s
    ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
    dealias_twothirds!(F̂s, grid_fourier, ks_global)

    # 2. Project onto divergence-free space
    project_divergence_free!(F̂s, grid_fourier)

    # 3. Transform velocity to Fourier space
    v̂s = cache.v̂s
    map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)

    # 4. Add viscous term (and multiply projected non-linear term by -1)
    ν = p.ν
    for n ∈ eachindex(v̂s)
        v̂ = v̂s[n]
        F̂ = F̂s[n]
        @inbounds for I ∈ eachindex(grid_fourier)
            k⃗ = grid_fourier[I]  # = (kx, ky, kz)
            k² = sum(abs2, k⃗)
            F̂[I] = -F̂[I] - ν * k² * v̂[I]
        end
    end

    # 5. Transform RHS back to physical space
    map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)

    nothing
end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write a small interface function to make things work with the functions defined above.
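The exact set-up is defined in the example script; below is a minimal sketch of the idea. The wrapper ns_rhs_partition!, the viscosity value, the time span and the cache layout are illustrative assumptions rather than the script's exact definitions.
using OrdinaryDiffEq
using RecursiveArrayTools: ArrayPartition  # groups the three velocity components

# Preallocated Fourier-space buffers reused by ns_rhs! at every evaluation.
cache = (v̂s = similar.(v̂s), F̂s = similar.(v̂s))
params = (; ν = 5e-3, plan, cache, ks_global, grid_fourier)

# An ArrayPartition stores the wrapped arrays in its `x` field (a tuple),
# which matches the tuple-of-PencilArrays interface of ns_rhs!.
ns_rhs_partition!(dv, v, p, t) = ns_rhs!(dv.x, v.x, p, t)

v_init = ArrayPartition(v⃗₀...)
prob = ODEProblem(ns_rhs_partition!, v_init, (0.0, 1.0), params)
sol = solve(prob, RK4(); dt = 1e-3, adaptive = false)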
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$, to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), while high enough to allow the small-scale motions to be correctly resolved.
This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.
The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.
More generally, PencilFFTs allows one to decompose and perform FFTs on geometries of arbitrary dimension $N$. The decompositions can be performed along an arbitrary number $M < N$ of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.
The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.
FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. Also, this functionality is not currently included in the FFTW.jl wrappers.
PFFT is a very general parallel FFT library written in C.
P3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.
2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.
What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, hatbmF_bmk = left widehat(bmv bm) bmv right_bmk. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Below we implement a function that computes the non-linear term in Fourier space based on its convective form (bmv bm) bmv = bm (bmv bmv). Note that this equivalence uses the incompressibility condition bm bmv = 0.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place\n\n# Compute non-linear term in Fourier space from velocity field in physical\n# space. Optional keyword arguments may be passed to avoid allocations.\nfunction ns_nonlinear!(\n F̂s, vs, plan, grid_fourier;\n vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),\n )\n # Compute F_i = ∂_j (v_i v_j) for each i.\n # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)\n w, ŵ = vbuf, v̂buf\n @inbounds for (i, F̂i) ∈ enumerate(F̂s)\n F̂i .= 0\n vi = vs[i]\n for (j, vj) ∈ enumerate(vs)\n w .= vi .* vj # w = v_i * v_j in physical space\n mul!(ŵ, plan, w) # same in Fourier space\n # Add derivative in Fourier space\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n kj = k⃗[j]\n F̂i[I] += im * kj * ŵ[I]\n end\n end\n end\n F̂s\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's use this function on our initial velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"F̂s = similar.(v̂s)\nns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. 
We define a function that applies this procedure below.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)\n ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)\n ks_lim = (2 / 3) .* ks_max\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n if any(abs.(k⃗) .> ks_lim)\n for ŵ ∈ ŵs\n ŵ[I] = 0\n end\n end\n end\n ŵs\nend\n\n# We can apply this on the previously computed non-linear term:\ndealias_twothirds!(F̂s, grid_fourier, ks_global);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Finally, we implement the projection associated to the incompressibility condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function project_divergence_free!(ûs, grid_fourier)\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n k² = sum(abs2, k⃗)\n iszero(k²) && continue # avoid division by zero\n û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)\n for i ∈ eachindex(û)\n ŵ = û[i]\n for j ∈ eachindex(û)\n ŵ -= k⃗[i] * k⃗[j] * û[j] / k²\n end\n ûs[i][I] = ŵ\n end\n end\n ûs\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)\nv̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially","category":"page"},{"location":"generated/navier_stokes/#Putting-it-all-together","page":"Navier–Stokes equations","title":"Putting it all together","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function ns_rhs!(\n dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,\n ) where {N}\n # 1. Compute non-linear term and dealias it\n (; plan, cache, ks_global, grid_fourier) = p\n F̂s = cache.F̂s\n ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])\n dealias_twothirds!(F̂s, grid_fourier, ks_global)\n\n # 2. Project onto divergence-free space\n project_divergence_free!(F̂s, grid_fourier)\n\n # 3. Transform velocity to Fourier space\n v̂s = cache.v̂s\n map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)\n\n # 4. 
Add viscous term (and multiply projected non-linear term by -1)\n ν = p.ν\n for n ∈ eachindex(v̂s)\n v̂ = v̂s[n]\n F̂ = F̂s[n]\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n k² = sum(abs2, k⃗)\n F̂[I] = -F̂[I] - ν * k² * v̂[I]\n end\n end\n\n # 5. Transform RHS back to physical space\n map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)\n\n nothing\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using OrdinaryDiffEq\nusing RecursiveArrayTools: ArrayPartition\n\nns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)\nvs_init_ode = ArrayPartition(v⃗₀)\nsummary(vs_init_ode)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now define solver parameters and temporary variables, and initialise the problem:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"params = (;\n ν = 5e-3, # kinematic viscosity\n plan, grid_fourier, ks_global,\n cache = (\n v̂s = similar.(v̂s),\n F̂s = similar.(v̂s),\n )\n)\n\ntspan = (0.0, 10.0)\nprob = ODEProblem(ns_rhs!, vs_init_ode, tspan, params)\nintegrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum E(k), to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity ν must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large k), while high enough to allow the small-scale motions to be correctly resolved.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Nk = length(Ek)\n @assert Nk == length(ks)\n Ek .= 0\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n knorm = sqrt(sum(abs2, k⃗))\n i = searchsortedfirst(ks, knorm)\n i > Nk && continue\n v⃗ = getindex.(v̂s, Ref(I)) # = (v̂s[1][I], v̂s[2][I], ...)\n factor = k⃗[1] == 0 ? 
1 : 2 # account for Hermitian symmetry and r2c transform\n Ek[i] += factor * sum(abs2, v⃗) / 2\n end\n MPI.Allreduce!(Ek, +, get_comm(v̂s[1])) # sum across all processes\n Ek\nend\n\nks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])\nEk = similar(ks)\nv̂s = plan .* integrator.u.x\nenergy_spectrum!(Ek, ks, v̂s, grid_fourier)\nEk ./= scale_factor(plan)^2 # rescale energy\n\ncurl_fourier!(ω̂s, v̂s, grid_fourier)\nldiv!.(ωs, plan, ω̂s)\nω⃗_plot = Observable(ωs)\nk_plot = @view ks[2:end]\nE_plot = Observable(@view Ek[2:end])\nt_plot = Observable(integrator.t)\n\nfig = let\n fig = Figure(resolution = (1200, 600))\n ax = Axis3(\n fig[1, 1][1, 1]; title = @lift(\"t = $(round($t_plot, digits = 3))\"),\n aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\",\n )\n ω_mag = @lift vecnorm($ω⃗_plot)\n ω_mag_norm = @lift $ω_mag ./ maximum($ω_mag)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_mag_norm;\n alpha = 0.3, levels = 3,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 1][1, 2], ct; label = \"Normalised vorticity magnitude\")\n ax_sp = Axis(\n fig[1, 2];\n xlabel = \"k\", ylabel = \"E(k)\", xscale = log2, yscale = log10,\n title = \"Kinetic energy spectrum\",\n )\n ylims!(ax_sp, 1e-8, 1e0)\n scatterlines!(ax_sp, k_plot, E_plot)\n ks_slope = exp.(range(log(2.5), log(25.0), length = 3))\n E_fivethirds = @. 0.3 * ks_slope^(-5/3)\n @views lines!(ax_sp, ks_slope, E_fivethirds; color = :black, linestyle = :dot)\n text!(ax_sp, L\"k^{-5/3}\"; position = (ks_slope[2], E_fivethirds[2]), align = (:left, :bottom))\n fig\nend\n\nusing Printf # hide\nwith_xvfb = ENV[\"DISPLAY\"] == \":99\" # hide\nnstep = 0 # hide\nconst tmpdir = mktempdir() # hide\nfilename_frame(procid, nstep) = joinpath(tmpdir, @sprintf(\"proc%d_%04d.png\", procid, nstep)) # hide\nrecord(fig, \"vorticity_proc$procid.mp4\"; framerate = 10) do io\n with_xvfb && recordframe!(io) # hide\n while integrator.t < 20\n dt = 0.001\n step!(integrator, dt)\n t_plot[] = integrator.t\n mul!.(v̂s, plan, integrator.u.x) # current velocity in Fourier space\n curl_fourier!(ω̂s, v̂s, grid_fourier)\n ldiv!.(ω⃗_plot[], plan, ω̂s)\n ω⃗_plot[] = ω⃗_plot[] # to force updating the plot\n energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Ek ./= scale_factor(plan)^2 # rescale energy\n E_plot[] = E_plot[]\n global nstep += 1 # hide\n with_xvfb ? 
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)\n\n# Fourier wave numbers","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#Method-4:-using-broadcasting","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples) or with local indices (third example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1 and 3 should be preferred. These use CartesianIndices and make no assumptions on the permutations (actually, permutations are completely invisible in the implementations).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 
92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. 
In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
(Figure: 2D pencil decomposition of a 3D domain.)
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = TimerOutput(),\n)\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nAlternatively, the second form creates a PencilFFTPlan for distributed arrays following a given Pencil configuration.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n size_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"}]
-}
diff --git a/previews/PR39/siteinfo.js b/previews/PR39/siteinfo.js
deleted file mode 100644
index c08dc8b2..00000000
--- a/previews/PR39/siteinfo.js
+++ /dev/null
@@ -1 +0,0 @@
-var DOCUMENTER_CURRENT_VERSION = "previews/PR39";
diff --git a/previews/PR40/GlobalFFTParams/index.html b/previews/PR40/GlobalFFTParams/index.html
deleted file mode 100644
index 6cb2e710..00000000
--- a/previews/PR40/GlobalFFTParams/index.html
+++ /dev/null
@@ -1,9 +0,0 @@
-
-Global FFT parameters · PencilFFTs.jl
Specifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.
transforms must be a tuple of length N specifying the transforms to be applied along each dimension. Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.
The element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.
Note that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.
Example
To perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:
Plan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.
PencilFFTPlan(p::Pencil, transforms; kwargs...)
Create a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.
Create plan for N-dimensional transform on MPI-distributed PencilArrays.
Extended help
This creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.
Transforms
The transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:
a tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());
a single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).
Note that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.
Input data layout
The input PencilArray must satisfy the following constraints:
array dimensions must not be permuted. This is the default when constructing PencilArrays.
for an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.
In the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil (see the sketch just after this list).
the element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.
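For instance, a decomposition configuration satisfying these constraints could be created as follows (a minimal sketch, reusing the 3D sizes of the example further below):
comm = MPI.COMM_WORLD
pen = Pencil((64, 32, 128), (2, 3), comm)  # decompose only the last two dimensions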
Keyword arguments
The keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).
permute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).
transpose_method allows selecting between implementations of the global data transpositions. See the PencilArrays docs for details.
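For instance, a hypothetical plan combining some of these options could look like (a sketch; pen and transform are assumed to be defined as in the examples elsewhere in these docs):
plan = PencilFFTPlan(pen, transform; permute_dims = Val(false), fftw_flags = FFTW.MEASURE)  # requires `using FFTW` for the flag constant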
Instead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.
The data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.
PencilArrays that may be transformed with the returned plan can be created using allocate_input.
Optional arguments
The floating point precision can be selected by setting the real_type parameter, which is Float64 by default.
extra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.
See the other constructor for more keyword arguments.
Extra dimensions
One possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.
Another more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.
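For instance (a sketch reusing the parameters of the example below), a single plan for a three-component vector field could be created by appending a non-transformed dimension of size 3:
plan_vec = PencilFFTPlan(size_global, transforms, proc_dims, comm; extra_dims = (3,))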
Example
Suppose we want to perform a 3D FFT of real data. The data is to be decomposed along two dimensions, over 8 MPI processes:
size_global = (64, 32, 128) # size of real input data

# Perform real-to-complex transform along the first dimension, then
# complex-to-complex transforms along the other dimensions.
transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())
# transforms = Transforms.RFFT()  # this is equivalent to the above line

proc_dims = (4, 2)  # 2D decomposition
comm = MPI.COMM_WORLD

plan = PencilFFTPlan(size_global, transforms, proc_dims, comm)
Allocate uninitialised PencilArray that can hold input data for the given plan.
The second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.
In-place plans
If p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).
Example
Suppose p is an in-place PencilFFTPlan. Then,
@assert is_inplace(p)
A = allocate_input(p) :: ManyPencilArray
v_in = first(A) :: PencilArray   # input data view
v_out = last(A) :: PencilArray   # output data view
Also note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:
p * A # perform forward transform in-place
p \ A  # perform backward transform in-place
# p * v_in  # not allowed!!
It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.
Minimal example:
using MPI
using PencilFFTs
using TimerOutputs

# Enable timing of `PencilFFTs` functions
TimerOutputs.enable_debug_timings(PencilFFTs)
TimerOutputs.enable_debug_timings(PencilArrays)
TimerOutputs.enable_debug_timings(Transpositions)

MPI.Init()

plan = PencilFFTPlan(#= args... =#)

# [do stuff with `plan`...]

# Retrieve and print timing data associated to `plan`
to = timer(plan)
print_timer(to)
By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:
to = TimerOutput()
plan = PencilFFTPlan(..., timer=to)

# [do stuff with `plan`...]

print_timer(to)
BFFT: like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
BRFFT is the unnormalised inverse of RFFT. To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
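For instance (a brief sketch restating the rule above):
Transforms.BRFFT(128)            # the real output has length 128 along the transformed dimension
Transforms.BRFFT((64, 32, 128))  # for a 3D dataset; equivalent to BRFFT(128)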
plan(transform::AbstractTransform, A, [dims];
     flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
binv(transform::AbstractTransform, d::Integer) returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
length_output returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).
The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D ("pencil") decomposition. The benchmarks were run for input arrays of dimensions $N_x × N_y × N_z = 512^3$, $1024^3$ and $2048^3$. Each timing is averaged over 100 repetitions.
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
This example shows different methods to compute the gradient of a real-valued 3D scalar field $θ(\bm{x})$ in Fourier space, where $\bm{x} = (x, y, z)$. It is assumed that the field is periodic with period $L = 2π$ along all dimensions.
In Fourier space, the scalar field is written as
$$θ(\bm{x}) = ∑_{\bm{k}} \hat{θ}(\bm{k}) \, e^{i \bm{k} ⋅ \bm{x}},$$
where $\bm{k} = (k_x, k_y, k_z)$ are the Fourier wave numbers and $\hat{θ}$ is the discrete Fourier transform of $θ$. Then, the spatial derivatives of $θ$ are given by
$$\frac{∂ θ}{∂ x_i}(\bm{x}) = ∑_{\bm{k}} i k_i \, \hat{θ}(\bm{k}) \, e^{i \bm{k} ⋅ \bm{x}},$$
that is, differentiating along direction $i$ amounts to multiplying each Fourier coefficient by $i k_i$.
In this section, we initialise a random real-valued scalar field $θ$ and compute its FFT. For more details see the Tutorial.
using MPI
using PencilFFTs
using Random

MPI.Init()

# Input data dimensions (Nx × Ny × Nz)
dims = (64, 32, 64)

# Apply a 3D real-to-complex (r2c) FFT.
transform = Transforms.RFFT()

# Automatically create decomposition configuration
comm = MPI.COMM_WORLD
pen = Pencil(dims, comm)

# Create plan
plan = PencilFFTPlan(pen, transform)

# Allocate data and initialise field
θ = allocate_input(plan)
randn!(θ)

# Perform distributed FFT
θ_hat = plan * θ
Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.
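One way to do this (a minimal sketch using the tuple variant of allocate_output described earlier):
∇θ_hat = allocate_output(plan, Val(3))  # tuple of 3 complex PencilArrays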
In general, the Fourier wave numbers are of the form $k_i = 0, ±\frac{2π}{L_i}, ±\frac{4π}{L_i}, ±\frac{6π}{L_i}, …$, where $L_i$ is the period along dimension $i$. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension $x$ (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. $k_x = 0, \frac{2π}{L_x}, \frac{4π}{L_x}, …$.
The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a "grid" of wave numbers associated to our 3D real-to-complex transform:
using AbstractFFTs: fftfreq, rfftfreq

box_size = (2π, 2π, 2π)  # Lx, Ly, Lz
sample_rate = 2π .* dims ./ box_size

# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].
kx = rfftfreq(dims[1], sample_rate[1])

# In our case (Ly = 2π and Ny even), this gives
# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).
ky = fftfreq(dims[2], sample_rate[2])
kz = fftfreq(dims[3], sample_rate[3])

kvec = (kx, ky, kz)
Note that kvec now contains the wave numbers associated to the global domain. In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.
PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.
One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).
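For instance, a minimal sketch (assuming ∇θ_hat is the tuple of output arrays allocated above):
θ_glob = global_view(θ_hat)          # OffsetArray wrapper taking global indices
∇θ_glob = map(global_view, ∇θ_hat)   # same for each gradient component
summary(θ_glob)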
"16×32×64 OffsetArray(::PencilArray{ComplexF64, 3}, 1:16, 1:32, 1:64) with eltype ComplexF64 with indices 1:16×1:32×1:64"
Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.
for I in CartesianIndices(θ_glob)
    i, j, k = Tuple(I)  # unpack indices

    # Wave number vector associated to current Cartesian index.
    kx = kvec[1][i]
    ky = kvec[2][j]
    kz = kvec[3][k]

    # Compute gradient in Fourier space.
    # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.
    ∇θ_glob[1][I] = im * kx * θ_glob[I]
    ∇θ_glob[2][I] = im * ky * θ_glob[I]
    ∇θ_glob[3][I] = im * kz * θ_glob[I]
end
The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:
@inbounds for I in CartesianIndices(θ_glob)
    i, j, k = Tuple(I)

    kx = kvec[1][i]
    ky = kvec[2][j]
    kz = kvec[3][k]

    u = im * θ_glob[I]

    ∇θ_glob[1][I] = kx * u
    ∇θ_glob[2][I] = ky * u
    ∇θ_glob[3][I] = kz * u
end
Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.
Finally, we can perform a backwards transform to obtain $\bm{∇} θ$ in physical space:
∇θ = plan \ ∇θ_hat;
Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.
Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).
Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in $(z, y, x)$ order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.
# Get local data range in the global grid.
rng = axes(θ_glob)  # = (i1:i2, j1:j2, k1:k2)
For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.
@assert permutation(θ_hat) === Permutation(3, 2, 1)

@inbounds for i in rng[1], j in rng[2], k in rng[3]
    kx = kvec[1][i]
    ky = kvec[2][j]
    kz = kvec[3][k]

    # Note that we still access the arrays in (i, j, k) order.
    # (The permutation happens behind the scenes!)
    u = im * θ_glob[i, j, k]

    ∇θ_glob[1][i, j, k] = kx * u
    ∇θ_glob[2][i, j, k] = ky * u
    ∇θ_glob[3][i, j, k] = kz * u
end
Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a "local" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):
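For instance (a sketch; grid_fourier is the name used in the loops below):
grid_fourier = localgrid(θ_hat, kvec)  # local grid of wave numbers, indexed with local indices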
Note that one can directly iterate on the returned grid object:
@inbounds for I in CartesianIndices(grid_fourier)
    # Wave number vector associated to current Cartesian index.
    k⃗ = grid_fourier[I]
    u = im * θ_hat[I]
    ∇θ_hat[1][I] = k⃗[1] * u
    ∇θ_hat[2][I] = k⃗[2] * u
    ∇θ_hat[3][I] = k⃗[3] * u
end
This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.
Finally, note that the local grid object returned by localgrid makes it possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:
@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat
@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat
@. ∇θ_hat[3] = im * grid_fourier[3] * θ_hat
Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.
The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).
If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.
The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.
The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions $64 × 32 × 64$. The different methods detailed above are marked on the right. The "lazy" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.
In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.
We consider the incompressible Navier–Stokes equations,
$$∂_t \bm{v} + (\bm{v} ⋅ \bm{∇}) \bm{v} = -\frac{1}{ρ} \bm{∇} p + ν ∇^2 \bm{v}, \qquad \bm{∇} ⋅ \bm{v} = 0,$$
where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.
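A minimal setup sketch (the grid resolution matches the decomposition shown further below; the viscosity value is an illustrative assumption):
using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD

Ns = (64, 64, 64)  # number of grid points along each direction
ν = 5e-3           # kinematic viscosity (hypothetical value)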
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
  Data dimensions: (64, 64, 64)
  Decomposed dimensions: (2, 3)
  Data permutation: NoPermutation()
  Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
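For instance, a minimal sketch allocating one PencilArray per velocity component:
vs = ntuple(_ -> PencilArray{Float64}(undef, pen), 3)  # (vx, vy, vz)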
As initial condition, we take a Taylor–Green vortex of the form
$$v_x = u₀ \sin(k₀ x) \cos(k₀ y) \cos(k₀ z), \quad v_y = -u₀ \cos(k₀ x) \sin(k₀ y) \cos(k₀ z), \quad v_z = 0,$$
where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:
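A sketch of how this could look, assuming a periodic domain of period $2π$ and the parameters $u₀$ and $k₀$ introduced above, and assuming the grid object supports the same component indexing and broadcasting as grid_fourier in the gradient example:
xs_global = map(N -> range(0, 2π; length = N + 1)[1:N], Ns)  # exclude the right endpoint (periodic)
grid = localgrid(pen, xs_global)

u₀ = 1.0  # velocity amplitude (assumed value)
k₀ = 1    # wave number of the initial vortex (assumed value)

@. vs[1] =  u₀ * sin(k₀ * grid[1]) * cos(k₀ * grid[2]) * cos(k₀ * grid[3])
@. vs[2] = -u₀ * cos(k₀ * grid[1]) * sin(k₀ * grid[2]) * cos(k₀ * grid[3])
vs[3] .= 0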
In Fourier space, the velocity field is represented by its complex Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$, where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
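A sketch of the corresponding transform: we apply a real-to-complex FFT along $x$ followed by complex FFTs along $y$ and $z$, and rely on the plan broadcasting over the tuple of components (as in the gradient example):
plan = PencilFFTPlan(pen, Transforms.RFFT())
v̂s = plan * vs  # Fourier coefficients of each velocity component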
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
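Following the same pattern as in the gradient example (the domain period $2π$ is an assumption consistent with the initial condition above):
using AbstractFFTs: fftfreq, rfftfreq

Ls = (2π, 2π, 2π)  # domain period along each direction (assumed)
ks_global = (
    rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]),  # kx: only non-negative wave numbers (r2c direction)
    fftfreq(Ns[2], 2π * Ns[2] / Ls[2]),   # ky
    fftfreq(Ns[3], 2π * Ns[3] / Ls[3]),   # kz
)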
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
  Data dimensions: (33, 64, 64)
  Decomposed dimensions: (1, 2)
  Data permutation: Permutation(3, 2, 1)
  Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming part of massively-parallel simulations based on this kind of method.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
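For instance (a sketch, mirroring the gradient example):
grid_fourier = localgrid(v̂s[1], ks_global)  # local wave number grid in Fourier space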
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
using LinearAlgebra: ×

function curl_fourier!(
        ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
    ) where {N}
    @inbounds for I ∈ eachindex(grid_fourier)
        # We use StaticArrays for the cross product between small vectors.
        ik⃗ = im * SVector(grid_fourier[I])
        v⃗ = SVector(getindex.(v̂s, Ref(I)))  # = (v̂s[1][I], v̂s[2][I], ...)
        ω⃗ = ik⃗ × v⃗
        for n ∈ eachindex(ω⃗)
            ω̂s[n][I] = ω⃗[n]
        end
    end
    ω̂s
end

ω̂s = similar.(v̂s)
curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
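A minimal sketch of the backward transform (the plotting itself is omitted here):
ωs = plan \ ω̂s  # vorticity field in physical space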
In Fourier space, the Navier–Stokes equations become
$$∂_t \hat{\bm{v}}_{\bm{k}} = - \mathcal{P}_{\bm{k}} \hat{\bm{F}}_{\bm{k}} - ν |\bm{k}|^2 \hat{\bm{v}}_{\bm{k}},$$
where $\hat{\bm{F}}_{\bm{k}}$ are the Fourier coefficients of the non-linear convective term and $\mathcal{P}_{\bm{k}} \hat{\bm{F}} = \hat{\bm{F}} - (\bm{k} ⋅ \hat{\bm{F}}) \, \bm{k} / |\bm{k}|^2$ is a projection operator that preserves the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient disappears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its convective form $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence uses the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place

# Compute non-linear term in Fourier space from velocity field in physical
# space. Optional keyword arguments may be passed to avoid allocations.
function ns_nonlinear!(
        F̂s, vs, plan, grid_fourier;
        vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
    )
    # Compute F_i = ∂_j (v_i v_j) for each i.
    # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
    w, ŵ = vbuf, v̂buf
    @inbounds for (i, F̂i) ∈ enumerate(F̂s)
        F̂i .= 0
        vi = vs[i]
        for (j, vj) ∈ enumerate(vs)
            w .= vi .* vj     # w = v_i * v_j in physical space
            mul!(ŵ, plan, w)  # same in Fourier space
            # Add derivative in Fourier space
            for I ∈ eachindex(grid_fourier)
                k⃗ = grid_fourier[I]  # = (kx, ky, kz)
                kj = k⃗[j]
                F̂i[I] += im * kj * ŵ[I]
            end
        end
    end
    F̂s
end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
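For instance (a sketch reusing the fields defined above):
F̂s = similar.(v̂s)                        # storage for the non-linear term in Fourier space
ns_nonlinear!(F̂s, vs, plan, grid_fourier);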
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orszag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
    ks_max = maximum.(abs, ks_global)  # maximum stored wave numbers (kx_max, ky_max, kz_max)
    ks_lim = (2 / 3) .* ks_max
    @inbounds for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]
        if any(abs.(k⃗) .> ks_lim)
            for ŵ ∈ ŵs
                ŵ[I] = 0
            end
        end
    end
    ŵs
end

# We can apply this on the previously computed non-linear term:
dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
    @inbounds for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]
        k² = sum(abs2, k⃗)
        iszero(k²) && continue  # avoid division by zero
        û = getindex.(ûs, Ref(I))  # (ûs[1][I], ûs[2][I], ...)
        for i ∈ eachindex(û)
            ŵ = û[i]
            for j ∈ eachindex(û)
                ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
            end
            ûs[i][I] = ŵ
        end
    end
    ûs
end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
v̂s_proj .≈ v̂s  # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
        dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
    ) where {N}
    # 1. Compute non-linear term and dealias it
    (; plan, cache, ks_global, grid_fourier) = p
    F̂s = cache.F̂s
    ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
    dealias_twothirds!(F̂s, grid_fourier, ks_global)

    # 2. Project onto divergence-free space
    project_divergence_free!(F̂s, grid_fourier)

    # 3. Transform velocity to Fourier space
    v̂s = cache.v̂s
    map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)

    # 4. Add viscous term (and multiply projected non-linear term by -1)
    ν = p.ν
    for n ∈ eachindex(v̂s)
        v̂ = v̂s[n]
        F̂ = F̂s[n]
        @inbounds for I ∈ eachindex(grid_fourier)
            k⃗ = grid_fourier[I]  # = (kx, ky, kz)
            k² = sum(abs2, k⃗)
            F̂[I] = -F̂[I] - ν * k² * v̂[I]
        end
    end

    # 5. Transform RHS back to physical space
    map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)

    nothing
end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.
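A rough sketch of what this interface could look like (the solver, time step, final time and parameter layout below are illustrative assumptions, not necessarily those of the original example):
using OrdinaryDiffEq
using RecursiveArrayTools: ArrayPartition

# Pack the three velocity components into a single state object for the ODE solver.
v0 = ArrayPartition(vs...)

# Interface method: unpack the ArrayPartition into tuples and call the RHS defined above.
ns_rhs!(dv::ArrayPartition, v::ArrayPartition, p, t) = ns_rhs!(dv.x, v.x, p, t)

params = (;
    ν, plan, grid_fourier, ks_global,
    cache = (; v̂s = similar.(v̂s), F̂s = similar.(v̂s)),
)

prob = ODEProblem(ns_rhs!, v0, (0.0, 10.0), params)
sol = solve(prob, RK4(); dt = 1e-3, adaptive = false, save_everystep = false)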
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$, to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), while high enough to allow the small-scale motions to be correctly resolved.
This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.
The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.
(Figure: pencil decomposition of a 3D domain, with each coloured block managed by a different MPI process.)
More generally, PencilFFTs makes it possible to decompose and perform FFTs on geometries of arbitrary dimension $N$. The decompositions can be performed along an arbitrary number $M < N$ of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.
The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.
FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. Also, this functionality is not currently included in the FFTW.jl wrappers.
PFFT is a very general parallel FFT library written in C.
P3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.
2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.
-var documenterSearchIndex = {"docs":
-[{"location":"tutorial/#Tutorial","page":"Tutorial","title":"Tutorial","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The following tutorial shows how to perform a 3D FFT of real periodic data defined on a grid of N_x N_y N_z points.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"
\n \n \n
","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default, the domain is distributed on a 2D MPI topology of dimensions N_1 N_2. As an example, the above figure shows such a topology with N_1 = 4 and N_2 = 3, for a total of 12 MPI processes.","category":"page"},{"location":"tutorial/#tutorial:creating_plans","page":"Tutorial","title":"Creating plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The first thing to do is to create a domain decomposition configuration for the given dataset dimensions N_x N_y N_z. In the framework of PencilArrays, such a configuration is described by a Pencil object. As described in the PencilArrays docs, we can let the Pencil constructor automatically determine such a configuration. For this, only an MPI communicator and the dataset dimensions are needed:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (16, 32, 64)\npen = Pencil(dims, comm)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default this creates a 2D decomposition (for the case of a 3D dataset), but one can change this as detailed in the PencilArrays documentation linked above.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"We can now create a PencilFFTPlan, which requires information on decomposition configuration (the Pencil object) and on the transforms that will be applied:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Note that, for more control, one can instead separately specify the transforms along each dimension:\n# transform = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"See the PencilFFTPlan constructor for details on the accepted options, and the Transforms module for the possible transforms. It is also possible to enable fine-grained performance measurements via the TimerOutputs package, as described in Measuring performance.","category":"page"},{"location":"tutorial/#Allocating-data","page":"Tutorial","title":"Allocating data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Next, we want to apply the plan on some data. Transforms may only be applied on PencilArrays, which are array wrappers that include MPI decomposition information (in some sense, analogous to DistributedArrays in Julia's distributed computing approach). 
The helper function allocate_input can be used to allocate a PencilArray that is compatible with our plan:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of real data (Float64).\nu = allocate_input(plan)\n\n# Fill the array with some (random) data\nusing Random\nrandn!(u)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"PencilArrays are a subtype of AbstractArray, and thus they support all common array operations.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Similarly, to preallocate output data, one can use allocate_output:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of complex data (Complex{Float64}).\nv = allocate_output(plan)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"This is only required if one wants to apply the plans using a preallocated output (with mul!, see right below).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The data types returned by allocate_input and allocate_output are slightly different when working with in-place transforms. See the in-place example for details.","category":"page"},{"location":"tutorial/#Applying-plans","page":"Tutorial","title":"Applying plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The interface to apply plans is consistent with that of AbstractFFTs. Namely, * and mul! are respectively used for forward transforms without and with preallocated output data. Similarly, \\ and ldiv! are used for backward transforms.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using LinearAlgebra # for mul!, ldiv!\n\n# Apply plan on `u` with `v` as an output\nmul!(v, plan, u)\n\n# Apply backward plan on `v` with `w` as an output\nw = similar(u)\nldiv!(w, plan, v) # now w ≈ u","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Note that, consistently with AbstractFFTs, normalisation is performed at the end of a backward transform, so that the original data is recovered when applying a forward followed by a backward transform.","category":"page"},{"location":"tutorial/#Accessing-and-modifying-data","page":"Tutorial","title":"Accessing and modifying data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For any given MPI process, a PencilArray holds the data associated to its local partition in the global geometry. PencilArrays are accessed using local indices that start at 1, regardless of the location of the local process in the MPI topology. Note that PencilArrays, being based on regular Arrays, support both linear and Cartesian indexing (see the Julia docs for details).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For convenience, the global_view function can be used to generate an OffsetArray wrapper that takes global indices.","category":"page"},{"location":"tutorial/#tutorial:output_data_layout","page":"Tutorial","title":"Output data layout","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"In memory, the dimensions of the transform output are by default reversed with respect to the input. 
That is, if the order of indices in the input data is (x, y, z), then the output has order (z, y, x) in memory. This detail is hidden from the user, and output arrays are always accessed in the same order as the input data, regardless of the underlying output dimension permutation. This applies to PencilArrays and to OffsetArrays returned by global_view.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The reasoning behind dimension permutations, is that they allow to always perform FFTs along the fastest array dimension and to avoid a local data transposition, resulting in performance gains. A similar approach is followed by other parallel FFT libraries. FFTW itself, in its distributed-memory routines, includes a flag that enables a similar behaviour. In PencilFFTs, index permutation is the default, but it can be disabled via the permute_dims flag of PencilFFTPlan.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"A great deal of work has been spent in making generic index permutations as efficient as possible, both in intermediate and in the output state of the multidimensional transforms. This has been achieved, in part, by making sure that permutations such as (3, 2, 1) are compile-time constants.","category":"page"},{"location":"tutorial/#Further-reading","page":"Tutorial","title":"Further reading","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For details on working with PencilArrays see the PencilArrays docs.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The examples on the sidebar further illustrate the use of transforms and provide an introduction to working with MPI-distributed data in the form of PencilArrays. In particular, the gradient example illustrates different ways of computing things using Fourier-transformed distributed arrays. Then, the incompressible Navier–Stokes example is a more advanced and complete example of a possible application of the PencilFFTs package.","category":"page"},{"location":"benchmarks/#Benchmarks","page":"Benchmarks","title":"Benchmarks","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D (\"pencil\") decomposition. The benchmarks were run for input arrays of dimensions N_x N_y N_z = 512^3, 1024^3 and 2048^3. Each timing is averaged over 100 repetitions.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"
\n \n \n
","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.","category":"page"},{"location":"benchmarks/#Benchmark-details","page":"Benchmarks","title":"Benchmark details","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The number of MPI processes along each decomposed dimension, P_1 and P_2, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with P_1 P_2. For instance, a total of 1024 processes is divided into P_1 = P_2 = 32. Different results may be obtained with other combinations, but this was not benchmarked.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.","category":"page"},{"location":"GlobalFFTParams/#Global-FFT-parameters","page":"Global FFT parameters","title":"Global FFT parameters","text":"","category":"section"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"GlobalFFTParams","category":"page"},{"location":"GlobalFFTParams/#PencilFFTs.GlobalFFTParams","page":"Global FFT parameters","title":"PencilFFTs.GlobalFFTParams","text":"GlobalFFTParams{T, N, inplace}\n\nSpecifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.\n\n\n\nGlobalFFTParams(size_global, transforms, [real_type=Float64])\n\nDefine parameters for N-dimensional transform.\n\ntransforms must be a tuple of length N specifying the transforms to be applied along each dimension. 
Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.\n\nThe element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.\n\nNote that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.\n\nExample\n\nTo perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:\n\njulia> size_global = (64, 32, 128); # size of real input data\n\njulia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());\n\njulia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)\nTransforms: (RFFT, FFT, FFT)\nInput type: Float64\nGlobal dimensions: (64, 32, 128) -> (33, 32, 128)\n\n\n\n\n\n","category":"type"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/in-place.jl\"","category":"page"},{"location":"generated/in-place/#In-place-transforms","page":"In-place transforms","title":"In-place transforms","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.","category":"page"},{"location":"generated/in-place/#Creating-a-domain-partition","page":"In-place transforms","title":"Creating a domain partition","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We start by partitioning a domain of dimensions 163264 along all available MPI processes.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using PencilFFTs\nusing MPI\nMPI.Init()\n\ndims_global = (16, 32, 64) # global dimensions","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Such a partitioning is described by a Pencil object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"decomp_dims = (2, 3)\ncomm = MPI.COMM_WORLD\npen = Pencil(dims_global, decomp_dims, comm)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"warning: Allowed decompositions\nDistributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims above must not contain 1.","category":"page"},{"location":"generated/in-place/#Creating-in-place-plans","page":"In-place transforms","title":"Creating in-place plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"To create an in-place plan, pass an in-place transform such as Transforms.FFT! or Transforms.R2R! to PencilFFTPlan. 
For instance:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Perform a 3D in-place complex-to-complex FFT.\ntransform = Transforms.FFT!()\n\n# Note that one can also combine different types of in-place transforms.\n# For instance:\n# transform = (\n# Transforms.R2R!(FFTW.REDFT01),\n# Transforms.FFT!(),\n# Transforms.R2R!(FFTW.DHT),\n# )","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We can now create a distributed plan from the previously-created domain partition and the chosen transform.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that in-place real-to-complex transforms are not currently supported. (In other words, the RFFT! transform type is not defined.)","category":"page"},{"location":"generated/in-place/#Allocating-data","page":"In-place transforms","title":"Allocating data","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Allocate data for the plan.\n# Since `plan` is in-place, this returns a `ManyPencilArray` container.\nA = allocate_input(plan)\nsummary(A)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that allocate_output also works for in-place plans. In this case, it returns exactly the same thing as allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, we can initialise the input array with some data before transforming:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using Random\nu_in = first(A) # input data view\nrandn!(u_in)\nsummary(u_in)","category":"page"},{"location":"generated/in-place/#Applying-plans","page":"In-place transforms","title":"Applying plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Like in FFTW.jl, one can perform in-place transforms using the * and \\ operators. 
As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan * A; # performs in-place forward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"After performing an in-place transform, data contained in u_in has been overwritten and has no \"physical\" meaning. In other words, u_in should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, to compute the global sum of the transformed data:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"u_out = last(A) # output data view\nsum(u_out) # sum of transformed data (note that `sum` reduces over all processes)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Finally, we can perform a backward transform and do stuff with the input view:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan \\ A; # perform in-place backward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"At this point, the data can be once again found in the input view u_in, while u_out should not be accessed.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Transforms/#Available-transforms","page":"Available transforms","title":"Available transforms","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"CurrentModule = PencilFFTs.Transforms","category":"page"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"Transforms","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms","page":"Available transforms","title":"PencilFFTs.Transforms","text":"Defines different one-dimensional FFT-based transforms.\n\nThe transforms are all subtypes of an AbstractTransform type.\n\nWhen possible, the names of the transforms are kept consistent with the functions exported by AbstractFFTs.jl and FFTW.jl.\n\n\n\n\n\n","category":"module"},{"location":"Transforms/#Transform-types","page":"Available transforms","title":"Transform types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"FFT\nFFT!\nBFFT\nBFFT!\n\nRFFT\nBRFFT\n\nR2R\nR2R!\n\nNoTransform\nNoTransform!","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.FFT","page":"Available transforms","title":"PencilFFTs.Transforms.FFT","text":"FFT()\n\nComplex-to-complex FFT.\n\nSee also AbstractFFTs.fft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.FFT!","page":"Available transforms","title":"PencilFFTs.Transforms.FFT!","text":"FFT!()\n\nIn-place version of 
FFT.\n\nSee also AbstractFFTs.fft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT","text":"BFFT()\n\nUnnormalised backward complex-to-complex FFT.\n\nLike AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.\n\nSee also AbstractFFTs.bfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT!","text":"BFFT!()\n\nIn-place version of BFFT.\n\nSee also AbstractFFTs.bfft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.RFFT","page":"Available transforms","title":"PencilFFTs.Transforms.RFFT","text":"RFFT()\n\nReal-to-complex FFT.\n\nSee also AbstractFFTs.rfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BRFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BRFFT","text":"BRFFT(d::Integer)\nBRFFT((d1, d2, ..., dN))\n\nUnnormalised inverse of RFFT.\n\nTo obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).\n\nAs described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.\n\nFor multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.\n\nSee also AbstractFFTs.brfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R","page":"Available transforms","title":"PencilFFTs.Transforms.R2R","text":"R2R(kind)\n\nReal-to-real transform of type kind.\n\nThe possible values of kind are those described in the FFTW.r2r docs and the FFTW manual:\n\ndiscrete cosine transforms: FFTW.REDFT00, FFTW.REDFT01, FFTW.REDFT10, FFTW.REDFT11\ndiscrete sine transforms: FFTW.RODFT00, FFTW.RODFT01, FFTW.RODFT10, FFTW.RODFT11\ndiscrete Hartley transform: FFTW.DHT\n\nNote: half-complex format DFTs (FFTW.R2HC, FFTW.HC2R) are not currently supported.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R!","page":"Available transforms","title":"PencilFFTs.Transforms.R2R!","text":"R2R!(kind)\n\nIn-place version of R2R.\n\nSee also FFTW.r2r!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform","text":"NoTransform()\n\nIdentity transform.\n\nSpecifies that no transformation should be applied.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform!","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform!","text":"NoTransform!()\n\nIn-place version of NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Internals","page":"Available transforms","title":"Internals","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"What follows is used internally in 
PencilFFTs.","category":"page"},{"location":"Transforms/#Types","page":"Available transforms","title":"Types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"AbstractCustomPlan\nAbstractTransform\nIdentityPlan\nIdentityPlan!\nPlan","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractCustomPlan","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractCustomPlan","text":"AbstractCustomPlan\n\nAbstract type defining a custom plan, to be used as an alternative to FFTW plans (FFTW.FFTWPlan).\n\nThe only custom plan defined in this module is IdentityPlan. The user can define other custom plans that are also subtypes of AbstractCustomPlan.\n\nNote that plan returns a subtype of either FFTW.FFTWPlan or AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractTransform","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractTransform","text":"AbstractTransform\n\nSpecifies a one-dimensional FFT-based transform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan","text":"IdentityPlan\n\nType of plan associated to NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan!","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan!","text":"IdentityPlan!\n\nType of plan associated to NoTransform!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.Plan","page":"Available transforms","title":"PencilFFTs.Transforms.Plan","text":"Plan = Union{FFTW.FFTWPlan, AbstractCustomPlan}\n\nUnion type representing any plan returned by plan.\n\nSee also AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Functions","page":"Available transforms","title":"Functions","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"plan\n\nbinv\nscale_factor\n\neltype_input\neltype_output\nexpand_dims\nis_inplace\nkind\nlength_output","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.plan","page":"Available transforms","title":"PencilFFTs.Transforms.plan","text":"plan(transform::AbstractTransform, A, [dims];\n flags=FFTW.ESTIMATE, timelimit=Inf)\n\nCreate plan to transform array A along dimensions dims.\n\nIf dims is not specified, all dimensions of A are transformed.\n\nFor FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.binv","page":"Available transforms","title":"PencilFFTs.Transforms.binv","text":"binv(transform::AbstractTransform, d::Integer)\n\nReturns the backwards transform associated to the given transform.\n\nThe second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.\n\nThe backwards transform returned by this function is not normalised. 
The normalisation factor for a given array can be obtained by calling scale_factor.\n\nExample\n\njulia> binv(Transforms.FFT(), 42)\nBFFT\n\njulia> binv(Transforms.BRFFT(9), 42)\nRFFT\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.scale_factor","page":"Available transforms","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(transform::AbstractTransform, A, [dims = 1:ndims(A)])\n\nGet factor required to normalise the given array after a transformation along dimensions dims (all dimensions by default).\n\nThe array A must have the dimensions of the transform input.\n\nImportant: the dimensions dims must be the same that were passed to plan.\n\nExamples\n\njulia> C = zeros(ComplexF32, 3, 4, 5);\n\njulia> scale_factor(Transforms.FFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C, 2:3)\n20\n\njulia> R = zeros(Float64, 3, 4, 5);\n\njulia> scale_factor(Transforms.RFFT(), R, 2)\n4\n\njulia> scale_factor(Transforms.RFFT(), R, 2:3)\n20\n\njulia> scale_factor(Transforms.BRFFT(8), C)\n96\n\njulia> scale_factor(Transforms.BRFFT(9), C)\n108\n\nThis will fail because the input of RFFT is real, and R is a complex array:\n\njulia> scale_factor(Transforms.RFFT(), C, 2:3)\nERROR: MethodError: no method matching scale_factor(::PencilFFTs.Transforms.RFFT, ::Array{ComplexF32, 3}, ::UnitRange{Int64})\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_input","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_input","text":"eltype_input(transform::AbstractTransform, real_type<:AbstractFloat)\n\nDetermine input data type for a given transform given the floating point precision of the input data.\n\nSome transforms, such as R2R and NoTransform, can take both real and complex data. 
For those kinds of transforms, nothing is returned.\n\nExample\n\njulia> eltype_input(Transforms.FFT(), Float32)\nComplexF32 (alias for Complex{Float32})\n\njulia> eltype_input(Transforms.RFFT(), Float64)\nFloat64\n\njulia> eltype_input(Transforms.R2R(FFTW.REDFT01), Float64) # nothing\n\njulia> eltype_input(Transforms.NoTransform(), Float64) # nothing\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_output","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_output","text":"eltype_output(transform::AbstractTransform, eltype_input)\n\nReturns the output data type for a given transform given the input type.\n\nThrows ArgumentError if the input data type is incompatible with the transform type.\n\nExample\n\njulia> eltype_output(Transforms.NoTransform(), Float32)\nFloat32\n\njulia> eltype_output(Transforms.RFFT(), Float64)\nComplexF64 (alias for Complex{Float64})\n\njulia> eltype_output(Transforms.BRFFT(4), ComplexF32)\nFloat32\n\njulia> eltype_output(Transforms.FFT(), Float64)\nERROR: ArgumentError: invalid input data type for PencilFFTs.Transforms.FFT: Float64\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.expand_dims","page":"Available transforms","title":"PencilFFTs.Transforms.expand_dims","text":"expand_dims(transform::AbstractTransform, Val(N))\n\nExpand a single multidimensional transform into one transform per dimension.\n\nExample\n\n# Expand a real-to-complex transform in 3 dimensions.\njulia> expand_dims(Transforms.RFFT(), Val(3))\n(RFFT, FFT, FFT)\n\njulia> expand_dims(Transforms.BRFFT(4), Val(3))\n(BFFT, BFFT, BRFFT{even})\n\njulia> expand_dims(Transforms.NoTransform(), Val(2))\n(NoTransform, NoTransform)\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.is_inplace","page":"Available transforms","title":"PencilFFTs.Transforms.is_inplace","text":"is_inplace(transform::AbstractTransform) -> Bool\nis_inplace(transforms::Vararg{AbstractTransform}) -> Union{Bool, Nothing}\n\nCheck whether a transform or a list of transforms is performed in-place.\n\nIf the list of transforms has a combination of in-place and out-of-place transforms, nothing is returned.\n\nExample\n\njulia> is_inplace(Transforms.RFFT())\nfalse\n\njulia> is_inplace(Transforms.NoTransform!())\ntrue\n\njulia> is_inplace(Transforms.FFT!(), Transforms.R2R!(FFTW.REDFT01))\ntrue\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R(FFTW.REDFT01))\nfalse\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R!(FFTW.REDFT01)) === nothing\ntrue\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.kind","page":"Available transforms","title":"PencilFFTs.Transforms.kind","text":"kind(transform::R2R)\n\nGet kind of real-to-real transform.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.length_output","page":"Available transforms","title":"PencilFFTs.Transforms.length_output","text":"length_output(transform::AbstractTransform, length_in::Integer)\n\nReturns the length of the transform output, given the length of its input.\n\nThe input and output lengths are specified in terms of the respective input and output datatypes. 
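Concretely (a hypothetical REPL session added here for illustration; the values simply follow the usual AbstractFFTs conventions, with a real-to-complex output length of N ÷ 2 + 1):

julia> length_output(Transforms.FFT(), 10)
10

julia> length_output(Transforms.RFFT(), 10)
6
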
For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.\n\n\n\n\n\n","category":"function"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/navier_stokes.jl\"","category":"page"},{"location":"generated/navier_stokes/#Navier–Stokes-equations","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In this example, we numerically solve the incompressible Navier–Stokes equations","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"_t bmv + (bmv bm) bmv = -frac1ρ bm p + ν ^2 bmv\nquad bm bmv = 0","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmv(bmx t) and p(bmx t) are respectively the velocity and pressure fields, ν is the fluid kinematic viscosity and ρ is the fluid density.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.","category":"page"},{"location":"generated/navier_stokes/#First-steps","page":"Navier–Stokes equations","title":"First steps","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We start by loading the required packages, initialising MPI and setting the simulation parameters.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\nprocid = MPI.Comm_rank(comm) + 1\n\n# Simulation parameters\nNs = (64, 64, 64) # = (Nx, Ny, Nz)\nLs = (2π, 2π, 2π) # = (Lx, Ly, Lz)\n\n# Collocation points (\"global\" = over all processes).\n# We include the endpoint (length = N + 1) for convenience.\nxs_global = map((N, L) -> range(0, L; length = N + 1), Ns, Ls) # = (x, y, z)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's check the number of MPI processes over which we're running our simulation:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"MPI.Comm_size(comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. 
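One alternative, for example, is to choose explicitly which dimensions are decomposed (a sketch based on the PencilArrays.jl Pencil constructors; the (2, 3) decomposition below is an assumption used only for illustration and is not what the rest of this example relies on):

# Hypothetical alternative: explicitly decompose dimensions (y, z),
# letting MPI choose the process grid along each of them.
# pen_alt = Pencil(Ns, (2, 3), comm)
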
For simplicity, here we do it automatically following the PencilArrays.jl docs:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pen = Pencil(Ns, comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The subdomain associated to the local MPI process can be obtained using range_local:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"range_local(pen)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now construct a distributed vector field that follows the decomposition configuration we just created:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v⃗₀ = (\n PencilArray{Float64}(undef, pen), # vx\n PencilArray{Float64}(undef, pen), # vy\n PencilArray{Float64}(undef, pen), # vz\n)\nsummary(v⃗₀[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We still need to fill this array with interesting values that represent a physical velocity field.","category":"page"},{"location":"generated/navier_stokes/#Initial-condition","page":"Navier–Stokes equations","title":"Initial condition","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's set the initial condition in physical space. In this example, we choose the Taylor–Green vortex configuration as an initial condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"beginaligned\nv_x(x y z) = u₀ sin(k₀ x) cos(k₀ y) cos(k₀ z) \nv_y(x y z) = -u₀ cos(k₀ x) sin(k₀ y) cos(k₀ z) \nv_z(x y z) = 0\nendaligned","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where u₀ and k₀ are two parameters setting the amplitude and the period of the velocity field.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid = localgrid(pen, xs_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can use this to initialise the velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"u₀ = 1.0\nk₀ = 2π / Ls[1] # should be integer if L = 2π (to preserve periodicity)\n\n@. v⃗₀[1] = u₀ * sin(k₀ * grid.x) * cos(k₀ * grid.y) * cos(k₀ * grid.z)\n@. v⃗₀[2] = -u₀ * cos(k₀ * grid.x) * sin(k₀ * grid.y) * cos(k₀ * grid.z)\n@. 
v⃗₀[3] = 0\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's plot a 2D slice of the velocity field managed by the local MPI process:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using GLMakie\n\n# Compute the norm of a vector field represented by a tuple of arrays.\nfunction vecnorm(v⃗::NTuple)\n vnorm = similar(v⃗[1])\n for n ∈ eachindex(v⃗[1])\n w = zero(eltype(vnorm))\n for v ∈ v⃗\n w += v[n]^2\n end\n vnorm[n] = sqrt(w)\n end\n vnorm\nend\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n vnorm = vecnorm(v⃗₀)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, vnorm;\n alpha = 0.2, levels = 4,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Velocity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Velocity-in-Fourier-space","page":"Navier–Stokes equations","title":"Velocity in Fourier space","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In the Fourier pseudo-spectral method, the periodic velocity field is discretised in space as a truncated Fourier series","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"bmv(bmx t) =\n_bmk hatbmv_bmk(t) e^i bmk bmx","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmk = (k_x k_y k_z) are the discrete wave numbers.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The wave numbers can be obtained using the fftfreq function. 
Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for k_x:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nks_global = (\n rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]), # kx | real-to-complex\n fftfreq(Ns[2], 2π * Ns[2] / Ls[2]), # ky | complex-to-complex\n fftfreq(Ns[3], 2π * Ns[3] / Ls[3]), # kz | complex-to-complex\n)\n\nks_global[1]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[2]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[3]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To transform the velocity field to Fourier space, we first create a real-to-complex FFT plan to be applied to one of the velocity components:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"plan = PencilFFTPlan(v⃗₀[1], Transforms.RFFT())","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"See PencilFFTPlan for details on creating plans and on optional keyword arguments.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now apply this plan to the three velocity components to obtain the respective Fourier coefficients hatbmv_bmk:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s = plan .* v⃗₀\nsummary(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Note that, in Fourier space, the domain decomposition is performed along the directions x and y:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pencil(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This is because the 3D FFTs are performed one dimension at a time, with the x direction first and the z direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers bmk associated to the local MPI process. 
Similarly to the local grid points, these are obtained using the localgrid function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid_fourier = localgrid(v̂s[1], ks_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, bmω = bm bmv. In Fourier space, this becomes hatbmω = i bmk hatbmv.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using StaticArrays: SVector\nusing LinearAlgebra: ×\n\nfunction curl_fourier!(\n ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,\n ) where {N}\n @inbounds for I ∈ eachindex(grid_fourier)\n # We use StaticArrays for the cross product between small vectors.\n ik⃗ = im * SVector(grid_fourier[I])\n v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)\n ω⃗ = ik⃗ × v⃗\n for n ∈ eachindex(ω⃗)\n ω̂s[n][I] = ω⃗[n]\n end\n end\n ω̂s\nend\n\nω̂s = similar.(v̂s)\ncurl_fourier!(ω̂s, v̂s, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally transform back to physical space and plot the result:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ωs = plan .\\ ω̂s\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n ω_norm = vecnorm(ωs)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_norm;\n alpha = 0.1, levels = 0.8:0.2:2.0,\n colormap = :viridis, colorrange = (0.8, 2.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Vorticity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Computing-the-non-linear-term","page":"Navier–Stokes equations","title":"Computing the non-linear term","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"One can show that, in Fourier space, the incompressible Navier–Stokes equations can be written as","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"_t hatbmv_bmk =\n- mathcalP_bmk left widehat(bmv bm) bmv right\n- ν bmk^2 hatbmv_bmk\nquad text with quad\nmathcalP_bmk(hatbmF_bmk) = left( I - fracbmk \nbmkbmk^2 right) hatbmF_bmk","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where mathcalP_bmk is a projection operator allowing to preserve the incompressibility condition bm bmv = 0. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient dissapears from the equations.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Now that we have the wave numbers bmk, computing the linear viscous term in Fourier space is straighforward once we have the Fourier coefficients hatbmv_bmk of the velocity field. 
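For instance, the viscous contribution -ν k² v̂ can be accumulated with a short loop like the following sketch (the function name, the tendency arrays dv̂s and the viscosity ν are placeholders introduced here for illustration; the actual right-hand side, including this term, is assembled in ns_rhs! further below):

# Sketch: add the viscous term -ν k² v̂ to preallocated tendency arrays dv̂s,
# e.g. dv̂s = similar.(v̂s). Assumes v̂s and grid_fourier as defined above.
function add_viscous_term!(dv̂s, v̂s, grid_fourier, ν)
    for (dv̂, v̂) ∈ zip(dv̂s, v̂s)
        @inbounds for I ∈ eachindex(grid_fourier)
            k² = sum(abs2, grid_fourier[I])  # = kx² + ky² + kz²
            dv̂[I] -= ν * k² * v̂[I]
        end
    end
    dv̂s
end
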
What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, hatbmF_bmk = left widehat(bmv bm) bmv right_bmk. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Below we implement a function that computes the non-linear term in Fourier space based on its convective form (bmv bm) bmv = bm (bmv bmv). Note that this equivalence uses the incompressibility condition bm bmv = 0.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place\n\n# Compute non-linear term in Fourier space from velocity field in physical\n# space. Optional keyword arguments may be passed to avoid allocations.\nfunction ns_nonlinear!(\n F̂s, vs, plan, grid_fourier;\n vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),\n )\n # Compute F_i = ∂_j (v_i v_j) for each i.\n # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)\n w, ŵ = vbuf, v̂buf\n @inbounds for (i, F̂i) ∈ enumerate(F̂s)\n F̂i .= 0\n vi = vs[i]\n for (j, vj) ∈ enumerate(vs)\n w .= vi .* vj # w = v_i * v_j in physical space\n mul!(ŵ, plan, w) # same in Fourier space\n # Add derivative in Fourier space\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n kj = k⃗[j]\n F̂i[I] += im * kj * ŵ[I]\n end\n end\n end\n F̂s\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's use this function on our initial velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"F̂s = similar.(v̂s)\nns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. 
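In the present setup (N = 64 grid points and L = 2π along each direction), the largest stored wave number magnitude is 32, so this amounts to zeroing all modes having a wave number component larger than (2/3) × 32 ≈ 21.3 in absolute value.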
We define a function that applies this procedure below.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)\n ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)\n ks_lim = (2 / 3) .* ks_max\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n if any(abs.(k⃗) .> ks_lim)\n for ŵ ∈ ŵs\n ŵ[I] = 0\n end\n end\n end\n ŵs\nend\n\n# We can apply this on the previously computed non-linear term:\ndealias_twothirds!(F̂s, grid_fourier, ks_global);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Finally, we implement the projection associated to the incompressibility condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function project_divergence_free!(ûs, grid_fourier)\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n k² = sum(abs2, k⃗)\n iszero(k²) && continue # avoid division by zero\n û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)\n for i ∈ eachindex(û)\n ŵ = û[i]\n for j ∈ eachindex(û)\n ŵ -= k⃗[i] * k⃗[j] * û[j] / k²\n end\n ûs[i][I] = ŵ\n end\n end\n ûs\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)\nv̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially","category":"page"},{"location":"generated/navier_stokes/#Putting-it-all-together","page":"Navier–Stokes equations","title":"Putting it all together","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function ns_rhs!(\n dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,\n ) where {N}\n # 1. Compute non-linear term and dealias it\n (; plan, cache, ks_global, grid_fourier) = p\n F̂s = cache.F̂s\n ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])\n dealias_twothirds!(F̂s, grid_fourier, ks_global)\n\n # 2. Project onto divergence-free space\n project_divergence_free!(F̂s, grid_fourier)\n\n # 3. Transform velocity to Fourier space\n v̂s = cache.v̂s\n map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)\n\n # 4. 
Add viscous term (and multiply projected non-linear term by -1)\n ν = p.ν\n for n ∈ eachindex(v̂s)\n v̂ = v̂s[n]\n F̂ = F̂s[n]\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n k² = sum(abs2, k⃗)\n F̂[I] = -F̂[I] - ν * k² * v̂[I]\n end\n end\n\n # 5. Transform RHS back to physical space\n map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)\n\n nothing\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using OrdinaryDiffEq\nusing RecursiveArrayTools: ArrayPartition\n\nns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)\nvs_init_ode = ArrayPartition(v⃗₀)\nsummary(vs_init_ode)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now define solver parameters and temporary variables, and initialise the problem:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"params = (;\n ν = 5e-3, # kinematic viscosity\n plan, grid_fourier, ks_global,\n cache = (\n v̂s = similar.(v̂s),\n F̂s = similar.(v̂s),\n )\n)\n\ntspan = (0.0, 10.0)\nprob = ODEProblem(ns_rhs!, vs_init_ode, tspan, params)\nintegrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum E(k), to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity ν must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large k), while high enough to allow the small-scale motions to be correctly resolved.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Nk = length(Ek)\n @assert Nk == length(ks)\n Ek .= 0\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n knorm = sqrt(sum(abs2, k⃗))\n i = searchsortedfirst(ks, knorm)\n i > Nk && continue\n v⃗ = getindex.(v̂s, Ref(I)) # = (v̂s[1][I], v̂s[2][I], ...)\n factor = k⃗[1] == 0 ? 
1 : 2 # account for Hermitian symmetry and r2c transform\n Ek[i] += factor * sum(abs2, v⃗) / 2\n end\n MPI.Allreduce!(Ek, +, get_comm(v̂s[1])) # sum across all processes\n Ek\nend\n\nks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])\nEk = similar(ks)\nv̂s = plan .* integrator.u.x\nenergy_spectrum!(Ek, ks, v̂s, grid_fourier)\nEk ./= scale_factor(plan)^2 # rescale energy\n\ncurl_fourier!(ω̂s, v̂s, grid_fourier)\nldiv!.(ωs, plan, ω̂s)\nω⃗_plot = Observable(ωs)\nk_plot = @view ks[2:end]\nE_plot = Observable(@view Ek[2:end])\nt_plot = Observable(integrator.t)\n\nfig = let\n fig = Figure(resolution = (1200, 600))\n ax = Axis3(\n fig[1, 1][1, 1]; title = @lift(\"t = $(round($t_plot, digits = 3))\"),\n aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\",\n )\n ω_mag = @lift vecnorm($ω⃗_plot)\n ω_mag_norm = @lift $ω_mag ./ maximum($ω_mag)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_mag_norm;\n alpha = 0.3, levels = 3,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 1][1, 2], ct; label = \"Normalised vorticity magnitude\")\n ax_sp = Axis(\n fig[1, 2];\n xlabel = \"k\", ylabel = \"E(k)\", xscale = log2, yscale = log10,\n title = \"Kinetic energy spectrum\",\n )\n ylims!(ax_sp, 1e-8, 1e0)\n scatterlines!(ax_sp, k_plot, E_plot)\n ks_slope = exp.(range(log(2.5), log(25.0), length = 3))\n E_fivethirds = @. 0.3 * ks_slope^(-5/3)\n @views lines!(ax_sp, ks_slope, E_fivethirds; color = :black, linestyle = :dot)\n text!(ax_sp, L\"k^{-5/3}\"; position = (ks_slope[2], E_fivethirds[2]), align = (:left, :bottom))\n fig\nend\n\nusing Printf # hide\nwith_xvfb = ENV[\"DISPLAY\"] == \":99\" # hide\nnstep = 0 # hide\nconst tmpdir = mktempdir() # hide\nfilename_frame(procid, nstep) = joinpath(tmpdir, @sprintf(\"proc%d_%04d.png\", procid, nstep)) # hide\nrecord(fig, \"vorticity_proc$procid.mp4\"; framerate = 10) do io\n with_xvfb && recordframe!(io) # hide\n while integrator.t < 20\n dt = 0.001\n step!(integrator, dt)\n t_plot[] = integrator.t\n mul!.(v̂s, plan, integrator.u.x) # current velocity in Fourier space\n curl_fourier!(ω̂s, v̂s, grid_fourier)\n ldiv!.(ω⃗_plot[], plan, ω̂s)\n ω⃗_plot[] = ω⃗_plot[] # to force updating the plot\n energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Ek ./= scale_factor(plan)^2 # rescale energy\n E_plot[] = E_plot[]\n global nstep += 1 # hide\n with_xvfb ? 
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)\n\n# Fourier wave numbers","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
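If the local portions are ever needed explicitly, they can be obtained by slicing the global wave numbers with the index ranges owned by the local process (a sketch using range_local from PencilArrays.jl, shown here only for illustration; the methods below combine kvec directly with global or local indices and do not require this step):

# Hypothetical: restrict the global wave numbers to the local subdomain.
# rng = range_local(θ_hat)            # e.g. (i1:i2, j1:j2, k1:k2)
# kvec_local = map(getindex, kvec, rng)
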
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
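For instance, a dimension-agnostic variant of the same loop could look like the following sketch (gradient_global_generic! is a hypothetical name introduced here; it assumes, as above, that ∇θ_glob and kvec have one entry per dimension):

# Generic gradient in Fourier space using global views, for any dimensionality.
function gradient_global_generic!(∇θ_glob, θ_glob, kvec)
    @inbounds for I in CartesianIndices(θ_glob)
        u = im * θ_glob[I]
        for (n, i) in enumerate(Tuple(I))  # n = dimension, i = global index along it
            ∇θ_glob[n][I] = kvec[n][i] * u
        end
    end
    ∇θ_glob
end
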
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#gradient_method_broadcast","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 
92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. 
Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = TimerOutput(),\n)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n dims_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"}]
-}
diff --git a/previews/PR40/siteinfo.js b/previews/PR40/siteinfo.js
deleted file mode 100644
index 46530e3e..00000000
--- a/previews/PR40/siteinfo.js
+++ /dev/null
@@ -1 +0,0 @@
-var DOCUMENTER_CURRENT_VERSION = "previews/PR40";
diff --git a/previews/PR48/GlobalFFTParams/index.html b/previews/PR48/GlobalFFTParams/index.html
deleted file mode 100644
index 0b9d70ac..00000000
--- a/previews/PR48/GlobalFFTParams/index.html
+++ /dev/null
@@ -1,9 +0,0 @@
-Global FFT parameters · PencilFFTs.jl
Specifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.
transforms must be a tuple of length N specifying the transforms to be applied along each dimension. Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.
The element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.
Note that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.
Example
To perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:
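For reference, the corresponding REPL example (reproduced from the GlobalFFTParams docstring) is:
julia> size_global = (64, 32, 128); # size of real input data

julia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());

julia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)
Transforms: (RFFT, FFT, FFT)
Input type: Float64
Global dimensions: (64, 32, 128) -> (33, 32, 128)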
It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.
Minimal example:
using MPI
-using PencilFFTs
-using TimerOutputs
-
-# Enable timing of `PencilFFTs` functions
-TimerOutputs.enable_debug_timings(PencilFFTs)
-TimerOutputs.enable_debug_timings(PencilArrays)
-TimerOutputs.enable_debug_timings(Transpositions)
-
-MPI.Init()
-
-plan = PencilFFTPlan(#= args... =#)
-
-# [do stuff with `plan`...]
-
-# Retrieve and print timing data associated to `plan`
-to = timer(plan)
-print_timer(to)
By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:
to = TimerOutput()
-plan = PencilFFTPlan(..., timer=to)
-
-# [do stuff with `plan`...]
-
-print_timer(to)
Like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
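As a short illustration of the two call forms (the sizes below are arbitrary and only meant as an example):
# Inverse of a real-to-complex transform whose real array has 128 points
# along the dimension that changes size:
backward = Transforms.BRFFT(128)

# Passing the full tuple of real-data dimensions is equivalent to passing
# just the last entry, i.e. this behaves like BRFFT(128):
backward = Transforms.BRFFT((64, 32, 128))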
plan(transform::AbstractTransform, A, [dims];
- flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
Returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
Returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).
The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D ("pencil") decomposition. The benchmarks were run for input arrays of dimensions $N_x × N_y × N_z = 512^3$, $1024^3$ and $2048^3$. Each timing is averaged over 100 repetitions.
-[Figure: strong scaling benchmarks of 3D real-to-complex FFTs with 2D decomposition, PencilFFTs vs P3DFFT]
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
Complex-to-complex and real-to-real transforms can be performed in-place, enabling significant memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.
We start by partitioning a domain of dimensions $16×32×64$ along all available MPI processes.
using PencilFFTs
-using MPI
-MPI.Init()
-
-dims_global = (16, 32, 64) # global dimensions
(16, 32, 64)
Such a partitioning is described by a Pencil object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.
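Concretely, the partition is created as follows (this mirrors the code of the in-place example):
decomp_dims = (2, 3)
comm = MPI.COMM_WORLD
pen = Pencil(dims_global, decomp_dims, comm)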
Decomposition of 3D data
- Data dimensions: (16, 32, 64)
- Decomposed dimensions: (2, 3)
- Data permutation: NoPermutation()
- Array type: Array
Allowed decompositions
Distributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims above must not contain 1.
# Perform a 3D in-place complex-to-complex FFT.
-transform = Transforms.FFT!()
-
-# Note that one can also combine different types of in-place transforms.
-# For instance:
-# transform = (
-# Transforms.R2R!(FFTW.REDFT01),
-# Transforms.FFT!(),
-# Transforms.R2R!(FFTW.DHT),
-# )
FFT!
We can now create a distributed plan from the previously-created domain partition and the chosen transform.
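As in the example source:
plan = PencilFFTPlan(pen, transform)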
As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.
# Allocate data for the plan.
-# Since `plan` is in-place, this returns a `ManyPencilArray` container.
-A = allocate_input(plan)
-summary(A)
Note that allocate_output also works for in-place plans. In this case, it returns exactly the same thing as allocate_input.
As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).
For instance, we can initialise the input array with some data before transforming:
using Random
-u_in = first(A) # input data view
-randn!(u_in)
-summary(u_in)
Like in FFTW.jl, one can perform in-place transforms using the * and \ operators. As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.
plan * A; # performs in-place forward transform
After performing an in-place transform, data contained in u_in has been overwritten and has no "physical" meaning. In other words, u_in should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A).
For instance, to compute the global sum of the transformed data:
u_out = last(A) # output data view
-sum(u_out) # sum of transformed data (note that `sum` reduces over all processes)
5342.662262046821 + 49991.35283143533im
Finally, we can perform a backward transform and do stuff with the input view:
plan \ A; # perform in-place backward transform
At this point, the data can be once again found in the input view u_in, while u_out should not be accessed.
We consider the incompressible Navier–Stokes equations
$∂_t \bm{v} + (\bm{v} ⋅ \bm{∇}) \bm{v} = -\frac{1}{ρ} \bm{∇} p + ν ∇² \bm{v}, \qquad \bm{∇} ⋅ \bm{v} = 0,$
where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
- Data dimensions: (64, 64, 64)
- Decomposed dimensions: (2, 3)
- Data permutation: NoPermutation()
- Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
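A minimal sketch of this step is shown below; the PencilArray{T}(undef, pen) constructor and the tuple-of-components layout are assumptions that match how vs is used later in this example.
# Allocate a 3-component velocity field as a tuple of PencilArrays following `pen`.
vs = ntuple(_ -> PencilArray{Float64}(undef, pen), 3)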
As the initial condition, we take a Taylor–Green-like vortex,
$v_x = u₀ \sin(k₀ x) \cos(k₀ y) \cos(k₀ z), \quad v_y = -u₀ \cos(k₀ x) \sin(k₀ y) \cos(k₀ z), \quad v_z = 0,$
where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:
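A sketch of this step, assuming a $2π$-periodic domain (the domain sizes Ls below are an assumption made for illustration):
# Global coordinates along each direction (endpoint excluded due to periodicity).
Ls = (2π, 2π, 2π)  # domain size: assumed for this sketch
xs_global = map((N, L) -> range(0, L; length = N + 1)[1:end-1], Ns, Ls)

# Local portion of the physical grid seen by this MPI process.
grid = localgrid(pen, xs_global)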
In Fourier space, the velocity field is represented by its Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$, such that $\bm{v}(\bm{x}) = ∑_{\bm{k}} \hat{\bm{v}}_{\bm{k}} \, e^{i \bm{k} ⋅ \bm{x}}$, where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
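For instance, assuming a $2π$-periodic domain (so that the wave numbers are integers), this step may look as follows; the name ks_global matches its later use in dealias_twothirds! and ns_rhs!:
using AbstractFFTs: fftfreq, rfftfreq

ks_global = (
    rfftfreq(Ns[1], Ns[1]),  # k_x: real-to-complex transform along x
    fftfreq(Ns[2], Ns[2]),   # k_y
    fftfreq(Ns[3], Ns[3]),   # k_z
)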
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
- Data dimensions: (33, 64, 64)
- Decomposed dimensions: (1, 2)
- Data permutation: Permutation(3, 2, 1)
- Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require substantial communication between MPI processes, and are usually the most time-consuming part of massively parallel simulations based on this kind of method.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
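A sketch of this step, reusing the Fourier-space velocity components v̂s introduced above:
# Wave numbers seen by the local MPI process (follows the permuted Fourier-space layout).
grid_fourier = localgrid(v̂s[1], ks_global)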
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
-using LinearAlgebra: ×
-
-function curl_fourier!(
- ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
- ) where {N}
- @inbounds for I ∈ eachindex(grid_fourier)
- # We use StaticArrays for the cross product between small vectors.
- ik⃗ = im * SVector(grid_fourier[I])
- v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)
- ω⃗ = ik⃗ × v⃗
- for n ∈ eachindex(ω⃗)
- ω̂s[n][I] = ω⃗[n]
- end
- end
- ω̂s
-end
-
-ω̂s = similar.(v̂s)
-curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
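A sketch of the backward transforms (the plotting itself is omitted here):
using LinearAlgebra: ldiv!  # for applying the inverse plan

ωs = similar.(vs)  # vorticity components in physical space
map((ω, ω̂) -> ldiv!(ω, plan, ω̂), ωs, ω̂s)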
In Fourier space, the equations for the velocity coefficients become
$∂_t \hat{\bm{v}}_{\bm{k}} = -ν |\bm{k}|^2 \hat{\bm{v}}_{\bm{k}} - \mathcal{P}_{\bm{k}} \hat{\bm{F}}_{\bm{k}},$
where $\mathcal{P}_{\bm{k}}$ is a projection operator which preserves the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$, and $\hat{\bm{F}}_{\bm{k}}$ is the non-linear term in Fourier space defined below. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient disappears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its convective form $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence uses the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place
-
-# Compute non-linear term in Fourier space from velocity field in physical
-# space. Optional keyword arguments may be passed to avoid allocations.
-function ns_nonlinear!(
- F̂s, vs, plan, grid_fourier;
- vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
- )
- # Compute F_i = ∂_j (v_i v_j) for each i.
- # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
- w, ŵ = vbuf, v̂buf
- @inbounds for (i, F̂i) ∈ enumerate(F̂s)
- F̂i .= 0
- vi = vs[i]
- for (j, vj) ∈ enumerate(vs)
- w .= vi .* vj # w = v_i * v_j in physical space
- mul!(ŵ, plan, w) # same in Fourier space
- # Add derivative in Fourier space
- for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- kj = k⃗[j]
- F̂i[I] += im * kj * ŵ[I]
- end
- end
- end
- F̂s
-end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
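A sketch of that call; F̂s is the buffer that is dealiased further below:
F̂s = similar.(v̂s)  # Fourier coefficients of the non-linear term
ns_nonlinear!(F̂s, vs, plan, grid_fourier)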
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orszag's 2/3 rule to zero out the Fourier coefficients associated with the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
- ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)
- ks_lim = (2 / 3) .* ks_max
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- if any(abs.(k⃗) .> ks_lim)
- for ŵ ∈ ŵs
- ŵ[I] = 0
- end
- end
- end
- ŵs
-end
-
-# We can apply this on the previously computed non-linear term:
-dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- k² = sum(abs2, k⃗)
- iszero(k²) && continue # avoid division by zero
- û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)
- for i ∈ eachindex(û)
- ŵ = û[i]
- for j ∈ eachindex(û)
- ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
- end
- ûs[i][I] = ŵ
- end
- end
- ûs
-end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
-v̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
- dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
- ) where {N}
- # 1. Compute non-linear term and dealias it
- (; plan, cache, ks_global, grid_fourier) = p
- F̂s = cache.F̂s
- ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
- dealias_twothirds!(F̂s, grid_fourier, ks_global)
-
- # 2. Project onto divergence-free space
- project_divergence_free!(F̂s, grid_fourier)
-
- # 3. Transform velocity to Fourier space
- v̂s = cache.v̂s
- map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)
-
- # 4. Add viscous term (and multiply projected non-linear term by -1)
- ν = p.ν
- for n ∈ eachindex(v̂s)
- v̂ = v̂s[n]
- F̂ = F̂s[n]
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- k² = sum(abs2, k⃗)
- F̂[I] = -F̂[I] - ν * k² * v̂[I]
- end
- end
-
- # 5. Transform RHS back to physical space
- map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)
-
- nothing
-end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set up the simulation. Since DifferentialEquations.jl cannot directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with the functions defined above, as sketched below.
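A sketch of that glue code, assuming ArrayPartition from RecursiveArrayTools.jl and illustrative values for the viscosity, time span and time step:
using OrdinaryDiffEq
using RecursiveArrayTools: ArrayPartition

# Allow the tuple-based RHS defined above to be called with ArrayPartition state vectors.
ns_rhs!(dv::ArrayPartition, v::ArrayPartition, p, t) = ns_rhs!(dv.x, v.x, p, t)

ν = 5e-3  # kinematic viscosity (illustrative value)
cache = (; v̂s = similar.(v̂s), F̂s = similar.(v̂s))
params = (; ν, plan, cache, ks_global, grid_fourier)

v0 = ArrayPartition(vs...)          # tuple of PencilArrays -> single state vector
prob = ODEProblem(ns_rhs!, v0, (0.0, 10.0), params)
sol = solve(prob, RK4(); dt = 1e-3)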
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$, to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), while high enough to allow the small-scale motions to be correctly resolved.
This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.
The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.
-[Figure: pencil decomposition of a 3D domain, each coloured block managed by a different MPI process]
More generally, PencilFFTs allows one to decompose and perform FFTs on geometries of arbitrary dimension $N$. The decompositions can be performed along an arbitrary number $M < N$ of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.
The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.
FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. Also, this functionality is not currently included in the FFTW.jl wrappers.
PFFT is a very general parallel FFT library written in C.
P3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.
2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.
diff --git a/previews/PR48/search_index.js b/previews/PR48/search_index.js
deleted file mode 100644
index ebd1df56..00000000
--- a/previews/PR48/search_index.js
+++ /dev/null
@@ -1,3 +0,0 @@
-var documenterSearchIndex = {"docs":
-[{"location":"tutorial/#Tutorial","page":"Tutorial","title":"Tutorial","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The following tutorial shows how to perform a 3D FFT of real periodic data defined on a grid of N_x N_y N_z points.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"
","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default, the domain is distributed on a 2D MPI topology of dimensions N_1 N_2. As an example, the above figure shows such a topology with N_1 = 4 and N_2 = 3, for a total of 12 MPI processes.","category":"page"},{"location":"tutorial/#tutorial:creating_plans","page":"Tutorial","title":"Creating plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The first thing to do is to create a domain decomposition configuration for the given dataset dimensions N_x N_y N_z. In the framework of PencilArrays, such a configuration is described by a Pencil object. As described in the PencilArrays docs, we can let the Pencil constructor automatically determine such a configuration. For this, only an MPI communicator and the dataset dimensions are needed:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (16, 32, 64)\npen = Pencil(dims, comm)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"By default this creates a 2D decomposition (for the case of a 3D dataset), but one can change this as detailed in the PencilArrays documentation linked above.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"We can now create a PencilFFTPlan, which requires information on decomposition configuration (the Pencil object) and on the transforms that will be applied:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Note that, for more control, one can instead separately specify the transforms along each dimension:\n# transform = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"See the PencilFFTPlan constructor for details on the accepted options, and the Transforms module for the possible transforms. It is also possible to enable fine-grained performance measurements via the TimerOutputs package, as described in Measuring performance.","category":"page"},{"location":"tutorial/#Allocating-data","page":"Tutorial","title":"Allocating data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Next, we want to apply the plan on some data. Transforms may only be applied on PencilArrays, which are array wrappers that include MPI decomposition information (in some sense, analogous to DistributedArrays in Julia's distributed computing approach). 
The helper function allocate_input can be used to allocate a PencilArray that is compatible with our plan:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of real data (Float64).\nu = allocate_input(plan)\n\n# Fill the array with some (random) data\nusing Random\nrandn!(u)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"PencilArrays are a subtype of AbstractArray, and thus they support all common array operations.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Similarly, to preallocate output data, one can use allocate_output:","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"# In our example, this returns a 3D PencilArray of complex data (Complex{Float64}).\nv = allocate_output(plan)","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"This is only required if one wants to apply the plans using a preallocated output (with mul!, see right below).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The data types returned by allocate_input and allocate_output are slightly different when working with in-place transforms. See the in-place example for details.","category":"page"},{"location":"tutorial/#Applying-plans","page":"Tutorial","title":"Applying plans","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The interface to apply plans is consistent with that of AbstractFFTs. Namely, * and mul! are respectively used for forward transforms without and with preallocated output data. Similarly, \\ and ldiv! are used for backward transforms.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"using LinearAlgebra # for mul!, ldiv!\n\n# Apply plan on `u` with `v` as an output\nmul!(v, plan, u)\n\n# Apply backward plan on `v` with `w` as an output\nw = similar(u)\nldiv!(w, plan, v) # now w ≈ u","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"Note that, consistently with AbstractFFTs, normalisation is performed at the end of a backward transform, so that the original data is recovered when applying a forward followed by a backward transform.","category":"page"},{"location":"tutorial/#Accessing-and-modifying-data","page":"Tutorial","title":"Accessing and modifying data","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For any given MPI process, a PencilArray holds the data associated to its local partition in the global geometry. PencilArrays are accessed using local indices that start at 1, regardless of the location of the local process in the MPI topology. Note that PencilArrays, being based on regular Arrays, support both linear and Cartesian indexing (see the Julia docs for details).","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For convenience, the global_view function can be used to generate an OffsetArray wrapper that takes global indices.","category":"page"},{"location":"tutorial/#tutorial:output_data_layout","page":"Tutorial","title":"Output data layout","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"In memory, the dimensions of the transform output are by default reversed with respect to the input. 
That is, if the order of indices in the input data is (x, y, z), then the output has order (z, y, x) in memory. This detail is hidden from the user, and output arrays are always accessed in the same order as the input data, regardless of the underlying output dimension permutation. This applies to PencilArrays and to OffsetArrays returned by global_view.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The reasoning behind dimension permutations, is that they allow to always perform FFTs along the fastest array dimension and to avoid a local data transposition, resulting in performance gains. A similar approach is followed by other parallel FFT libraries. FFTW itself, in its distributed-memory routines, includes a flag that enables a similar behaviour. In PencilFFTs, index permutation is the default, but it can be disabled via the permute_dims flag of PencilFFTPlan.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"A great deal of work has been spent in making generic index permutations as efficient as possible, both in intermediate and in the output state of the multidimensional transforms. This has been achieved, in part, by making sure that permutations such as (3, 2, 1) are compile-time constants.","category":"page"},{"location":"tutorial/#Further-reading","page":"Tutorial","title":"Further reading","text":"","category":"section"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"For details on working with PencilArrays see the PencilArrays docs.","category":"page"},{"location":"tutorial/","page":"Tutorial","title":"Tutorial","text":"The examples on the sidebar further illustrate the use of transforms and provide an introduction to working with MPI-distributed data in the form of PencilArrays. In particular, the gradient example illustrates different ways of computing things using Fourier-transformed distributed arrays. Then, the incompressible Navier–Stokes example is a more advanced and complete example of a possible application of the PencilFFTs package.","category":"page"},{"location":"benchmarks/#Benchmarks","page":"Benchmarks","title":"Benchmarks","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D (\"pencil\") decomposition. The benchmarks were run for input arrays of dimensions N_x N_y N_z = 512^3, 1024^3 and 2048^3. Each timing is averaged over 100 repetitions.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"
\n \n \n
","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.","category":"page"},{"location":"benchmarks/#Benchmark-details","page":"Benchmarks","title":"Benchmark details","text":"","category":"section"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The number of MPI processes along each decomposed dimension, P_1 and P_2, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with P_1 P_2. For instance, a total of 1024 processes is divided into P_1 = P_2 = 32. Different results may be obtained with other combinations, but this was not benchmarked.","category":"page"},{"location":"benchmarks/","page":"Benchmarks","title":"Benchmarks","text":"The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.","category":"page"},{"location":"GlobalFFTParams/#Global-FFT-parameters","page":"Global FFT parameters","title":"Global FFT parameters","text":"","category":"section"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"GlobalFFTParams/","page":"Global FFT parameters","title":"Global FFT parameters","text":"GlobalFFTParams","category":"page"},{"location":"GlobalFFTParams/#PencilFFTs.GlobalFFTParams","page":"Global FFT parameters","title":"PencilFFTs.GlobalFFTParams","text":"GlobalFFTParams{T, N, inplace}\n\nSpecifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.\n\n\n\nGlobalFFTParams(size_global, transforms, [real_type=Float64])\n\nDefine parameters for N-dimensional transform.\n\ntransforms must be a tuple of length N specifying the transforms to be applied along each dimension. 
Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.\n\nThe element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.\n\nNote that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.\n\nExample\n\nTo perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:\n\njulia> size_global = (64, 32, 128); # size of real input data\n\njulia> transforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT());\n\njulia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)\nTransforms: (RFFT, FFT, FFT)\nInput type: Float64\nGlobal dimensions: (64, 32, 128) -> (33, 32, 128)\n\n\n\n\n\n","category":"type"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/in-place.jl\"","category":"page"},{"location":"generated/in-place/#In-place-transforms","page":"In-place transforms","title":"In-place transforms","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Complex-to-complex and real-to-real transforms can be performed in-place, enabling important memory savings. The procedure is very similar to that of out-of-place transforms described in the tutorial. The differences are illustrated in the sections below.","category":"page"},{"location":"generated/in-place/#Creating-a-domain-partition","page":"In-place transforms","title":"Creating a domain partition","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We start by partitioning a domain of dimensions 163264 along all available MPI processes.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using PencilFFTs\nusing MPI\nMPI.Init()\n\ndims_global = (16, 32, 64) # global dimensions","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Such a partitioning is described by a Pencil object. Here we choose to decompose the domain along the last two dimensions. In this case, the actual number of processes along each of these dimensions is chosen automatically.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"decomp_dims = (2, 3)\ncomm = MPI.COMM_WORLD\npen = Pencil(dims_global, decomp_dims, comm)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"warning: Allowed decompositions\nDistributed transforms using PencilFFTs.jl require that the first dimension is not decomposed. In other words, if one wants to perform transforms, then decomp_dims above must not contain 1.","category":"page"},{"location":"generated/in-place/#Creating-in-place-plans","page":"In-place transforms","title":"Creating in-place plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"To create an in-place plan, pass an in-place transform such as Transforms.FFT! or Transforms.R2R! to PencilFFTPlan. 
For instance:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Perform a 3D in-place complex-to-complex FFT.\ntransform = Transforms.FFT!()\n\n# Note that one can also combine different types of in-place transforms.\n# For instance:\n# transform = (\n# Transforms.R2R!(FFTW.REDFT01),\n# Transforms.FFT!(),\n# Transforms.R2R!(FFTW.DHT),\n# )","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"We can now create a distributed plan from the previously-created domain partition and the chosen transform.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan = PencilFFTPlan(pen, transform)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that in-place real-to-complex transforms are not currently supported. (In other words, the RFFT! transform type is not defined.)","category":"page"},{"location":"generated/in-place/#Allocating-data","page":"In-place transforms","title":"Allocating data","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As with out-of-place plans, data should be allocated using allocate_input. The difference is that, for in-place plans, this function returns a ManyPencilArray object, which is a container holding multiple PencilArray views sharing the same memory space.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"# Allocate data for the plan.\n# Since `plan` is in-place, this returns a `ManyPencilArray` container.\nA = allocate_input(plan)\nsummary(A)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Note that allocate_output also works for in-place plans. In this case, it returns exactly the same thing as allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"As shown in the next section, in-place plans must be applied on the returned ManyPencilArray. On the other hand, one usually wants to access and modify data, and for this one needs the PencilArray views contained in the ManyPencilArray. The input and output array views can be obtained by calling first(::ManyPencilArray) and last(::ManyPencilArray).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, we can initialise the input array with some data before transforming:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"using Random\nu_in = first(A) # input data view\nrandn!(u_in)\nsummary(u_in)","category":"page"},{"location":"generated/in-place/#Applying-plans","page":"In-place transforms","title":"Applying plans","text":"","category":"section"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Like in FFTW.jl, one can perform in-place transforms using the * and \\ operators. 
As mentioned above, in-place plans must be applied on the ManyPencilArray containers returned by allocate_input.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan * A; # performs in-place forward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"After performing an in-place transform, data contained in u_in has been overwritten and has no \"physical\" meaning. In other words, u_in should not be used at this point. To access the transformed data, one should retrieve the output data view using last(A).","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"For instance, to compute the global sum of the transformed data:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"u_out = last(A) # output data view\nsum(u_out) # sum of transformed data (note that `sum` reduces over all processes)","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"Finally, we can perform a backward transform and do stuff with the input view:","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"plan \\ A; # perform in-place backward transform\nnothing #hide","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"At this point, the data can be once again found in the input view u_in, while u_out should not be accessed.","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"","category":"page"},{"location":"generated/in-place/","page":"In-place transforms","title":"In-place transforms","text":"This page was generated using Literate.jl.","category":"page"},{"location":"Transforms/#Available-transforms","page":"Available transforms","title":"Available transforms","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"CurrentModule = PencilFFTs.Transforms","category":"page"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"Transforms","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms","page":"Available transforms","title":"PencilFFTs.Transforms","text":"Defines different one-dimensional FFT-based transforms.\n\nThe transforms are all subtypes of an AbstractTransform type.\n\nWhen possible, the names of the transforms are kept consistent with the functions exported by AbstractFFTs.jl and FFTW.jl.\n\n\n\n\n\n","category":"module"},{"location":"Transforms/#Transform-types","page":"Available transforms","title":"Transform types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"FFT\nFFT!\nBFFT\nBFFT!\n\nRFFT\nBRFFT\n\nR2R\nR2R!\n\nNoTransform\nNoTransform!","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.FFT","page":"Available transforms","title":"PencilFFTs.Transforms.FFT","text":"FFT()\n\nComplex-to-complex FFT.\n\nSee also AbstractFFTs.fft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.FFT!","page":"Available transforms","title":"PencilFFTs.Transforms.FFT!","text":"FFT!()\n\nIn-place version of 
FFT.\n\nSee also AbstractFFTs.fft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT","text":"BFFT()\n\nUnnormalised backward complex-to-complex FFT.\n\nLike AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.\n\nSee also AbstractFFTs.bfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BFFT!","page":"Available transforms","title":"PencilFFTs.Transforms.BFFT!","text":"BFFT!()\n\nIn-place version of BFFT.\n\nSee also AbstractFFTs.bfft!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.RFFT","page":"Available transforms","title":"PencilFFTs.Transforms.RFFT","text":"RFFT()\n\nReal-to-complex FFT.\n\nSee also AbstractFFTs.rfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.BRFFT","page":"Available transforms","title":"PencilFFTs.Transforms.BRFFT","text":"BRFFT(d::Integer)\nBRFFT((d1, d2, ..., dN))\n\nUnnormalised inverse of RFFT.\n\nTo obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).\n\nAs described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.\n\nFor multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.\n\nSee also AbstractFFTs.brfft.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R","page":"Available transforms","title":"PencilFFTs.Transforms.R2R","text":"R2R(kind)\n\nReal-to-real transform of type kind.\n\nThe possible values of kind are those described in the FFTW.r2r docs and the FFTW manual:\n\ndiscrete cosine transforms: FFTW.REDFT00, FFTW.REDFT01, FFTW.REDFT10, FFTW.REDFT11\ndiscrete sine transforms: FFTW.RODFT00, FFTW.RODFT01, FFTW.RODFT10, FFTW.RODFT11\ndiscrete Hartley transform: FFTW.DHT\n\nNote: half-complex format DFTs (FFTW.R2HC, FFTW.HC2R) are not currently supported.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.R2R!","page":"Available transforms","title":"PencilFFTs.Transforms.R2R!","text":"R2R!(kind)\n\nIn-place version of R2R.\n\nSee also FFTW.r2r!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform","text":"NoTransform()\n\nIdentity transform.\n\nSpecifies that no transformation should be applied.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.NoTransform!","page":"Available transforms","title":"PencilFFTs.Transforms.NoTransform!","text":"NoTransform!()\n\nIn-place version of NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Internals","page":"Available transforms","title":"Internals","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"What follows is used internally in 
PencilFFTs.","category":"page"},{"location":"Transforms/#Types","page":"Available transforms","title":"Types","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"AbstractCustomPlan\nAbstractTransform\nIdentityPlan\nIdentityPlan!\nPlan","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractCustomPlan","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractCustomPlan","text":"AbstractCustomPlan\n\nAbstract type defining a custom plan, to be used as an alternative to FFTW plans (FFTW.FFTWPlan).\n\nThe only custom plan defined in this module is IdentityPlan. The user can define other custom plans that are also subtypes of AbstractCustomPlan.\n\nNote that plan returns a subtype of either AbstractFFTs.Plan or AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.AbstractTransform","page":"Available transforms","title":"PencilFFTs.Transforms.AbstractTransform","text":"AbstractTransform\n\nSpecifies a one-dimensional FFT-based transform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan","text":"IdentityPlan\n\nType of plan associated to NoTransform.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.IdentityPlan!","page":"Available transforms","title":"PencilFFTs.Transforms.IdentityPlan!","text":"IdentityPlan!\n\nType of plan associated to NoTransform!.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#PencilFFTs.Transforms.Plan","page":"Available transforms","title":"PencilFFTs.Transforms.Plan","text":"Plan = Union{AbstractFFTs.Plan, AbstractCustomPlan}\n\nUnion type representing any plan returned by plan.\n\nSee also AbstractCustomPlan.\n\n\n\n\n\n","category":"type"},{"location":"Transforms/#Functions","page":"Available transforms","title":"Functions","text":"","category":"section"},{"location":"Transforms/","page":"Available transforms","title":"Available transforms","text":"plan\n\nbinv\nscale_factor\n\neltype_input\neltype_output\nexpand_dims\nis_inplace\nkind\nlength_output","category":"page"},{"location":"Transforms/#PencilFFTs.Transforms.plan","page":"Available transforms","title":"PencilFFTs.Transforms.plan","text":"plan(transform::AbstractTransform, A, [dims];\n flags=FFTW.ESTIMATE, timelimit=Inf)\n\nCreate plan to transform array A along dimensions dims.\n\nIf dims is not specified, all dimensions of A are transformed.\n\nFor FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.binv","page":"Available transforms","title":"PencilFFTs.Transforms.binv","text":"binv(transform::AbstractTransform, d::Integer)\n\nReturns the backwards transform associated to the given transform.\n\nThe second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.\n\nThe backwards transform returned by this function is not normalised. 
The normalisation factor for a given array can be obtained by calling scale_factor.\n\nExample\n\njulia> binv(Transforms.FFT(), 42)\nBFFT\n\njulia> binv(Transforms.BRFFT(9), 42)\nRFFT\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.scale_factor","page":"Available transforms","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(transform::AbstractTransform, A, [dims = 1:ndims(A)])\n\nGet factor required to normalise the given array after a transformation along dimensions dims (all dimensions by default).\n\nThe array A must have the dimensions of the transform input.\n\nImportant: the dimensions dims must be the same that were passed to plan.\n\nExamples\n\njulia> C = zeros(ComplexF32, 3, 4, 5);\n\njulia> scale_factor(Transforms.FFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C)\n60\n\njulia> scale_factor(Transforms.BFFT(), C, 2:3)\n20\n\njulia> R = zeros(Float64, 3, 4, 5);\n\njulia> scale_factor(Transforms.RFFT(), R, 2)\n4\n\njulia> scale_factor(Transforms.RFFT(), R, 2:3)\n20\n\njulia> scale_factor(Transforms.BRFFT(8), C)\n96\n\njulia> scale_factor(Transforms.BRFFT(9), C)\n108\n\nThis will fail because the input of RFFT is real, and R is a complex array:\n\njulia> scale_factor(Transforms.RFFT(), C, 2:3)\nERROR: MethodError: no method matching scale_factor(::PencilFFTs.Transforms.RFFT, ::Array{ComplexF32, 3}, ::UnitRange{Int64})\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_input","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_input","text":"eltype_input(transform::AbstractTransform, real_type<:AbstractFloat)\n\nDetermine input data type for a given transform given the floating point precision of the input data.\n\nSome transforms, such as R2R and NoTransform, can take both real and complex data. 
For those kinds of transforms, nothing is returned.\n\nExample\n\njulia> eltype_input(Transforms.FFT(), Float32)\nComplexF32 (alias for Complex{Float32})\n\njulia> eltype_input(Transforms.RFFT(), Float64)\nFloat64\n\njulia> eltype_input(Transforms.R2R(FFTW.REDFT01), Float64) # nothing\n\njulia> eltype_input(Transforms.NoTransform(), Float64) # nothing\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.eltype_output","page":"Available transforms","title":"PencilFFTs.Transforms.eltype_output","text":"eltype_output(transform::AbstractTransform, eltype_input)\n\nReturns the output data type for a given transform given the input type.\n\nThrows ArgumentError if the input data type is incompatible with the transform type.\n\nExample\n\njulia> eltype_output(Transforms.NoTransform(), Float32)\nFloat32\n\njulia> eltype_output(Transforms.RFFT(), Float64)\nComplexF64 (alias for Complex{Float64})\n\njulia> eltype_output(Transforms.BRFFT(4), ComplexF32)\nFloat32\n\njulia> eltype_output(Transforms.FFT(), Float64)\nERROR: ArgumentError: invalid input data type for PencilFFTs.Transforms.FFT: Float64\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.expand_dims","page":"Available transforms","title":"PencilFFTs.Transforms.expand_dims","text":"expand_dims(transform::AbstractTransform, Val(N))\n\nExpand a single multidimensional transform into one transform per dimension.\n\nExample\n\n# Expand a real-to-complex transform in 3 dimensions.\njulia> expand_dims(Transforms.RFFT(), Val(3))\n(RFFT, FFT, FFT)\n\njulia> expand_dims(Transforms.BRFFT(4), Val(3))\n(BFFT, BFFT, BRFFT{even})\n\njulia> expand_dims(Transforms.NoTransform(), Val(2))\n(NoTransform, NoTransform)\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.is_inplace","page":"Available transforms","title":"PencilFFTs.Transforms.is_inplace","text":"is_inplace(transform::AbstractTransform) -> Bool\nis_inplace(transforms::Vararg{AbtractTransform}) -> Union{Bool, Nothing}\n\nCheck whether a transform or a list of transforms is performed in-place.\n\nIf the list of transforms has a combination of in-place and out-of-place transforms, nothing is returned.\n\nExample\n\njulia> is_inplace(Transforms.RFFT())\nfalse\n\njulia> is_inplace(Transforms.NoTransform!())\ntrue\n\njulia> is_inplace(Transforms.FFT!(), Transforms.R2R!(FFTW.REDFT01))\ntrue\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R(FFTW.REDFT01))\nfalse\n\njulia> is_inplace(Transforms.FFT(), Transforms.R2R!(FFTW.REDFT01)) === nothing\ntrue\n\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.kind","page":"Available transforms","title":"PencilFFTs.Transforms.kind","text":"kind(transform::R2R)\n\nGet kind of real-to-real transform.\n\n\n\n\n\n","category":"function"},{"location":"Transforms/#PencilFFTs.Transforms.length_output","page":"Available transforms","title":"PencilFFTs.Transforms.length_output","text":"length_output(transform::AbstractTransform, length_in::Integer)\n\nReturns the length of the transform output, given the length of its input.\n\nThe input and output lengths are specified in terms of the respective input and output datatypes. 
For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.\n\n\n\n\n\n","category":"function"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/navier_stokes.jl\"","category":"page"},{"location":"generated/navier_stokes/#Navier–Stokes-equations","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In this example, we numerically solve the incompressible Navier–Stokes equations","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"∂ₜ v⃗ + (v⃗ ⋅ ∇) v⃗ = -(1/ρ) ∇p + ν ∇² v⃗,\n∇ ⋅ v⃗ = 0","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where v⃗(x⃗, t) and p(x⃗, t) are respectively the velocity and pressure fields, ν is the fluid kinematic viscosity and ρ is the fluid density.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.","category":"page"},{"location":"generated/navier_stokes/#First-steps","page":"Navier–Stokes equations","title":"First steps","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We start by loading the required packages, initialising MPI and setting the simulation parameters.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using MPI\nusing PencilFFTs\n\nMPI.Init()\ncomm = MPI.COMM_WORLD\nprocid = MPI.Comm_rank(comm) + 1\n\n# Simulation parameters\nNs = (64, 64, 64) # = (Nx, Ny, Nz)\nLs = (2π, 2π, 2π) # = (Lx, Ly, Lz)\n\n# Collocation points (\"global\" = over all processes).\n# We include the endpoint (length = N + 1) for convenience.\nxs_global = map((N, L) -> range(0, L; length = N + 1), Ns, Ls) # = (x, y, z)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's check the number of MPI processes over which we're running our simulation:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"MPI.Comm_size(comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. 
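One possibility, shown as a brief sketch below, is to explicitly choose the decomposed dimensions, as done in the in-place transforms example elsewhere in these docs; the variable name pen_alt is ours and is not used in what follows.

# Sketch of an alternative: explicitly decompose along the last two dimensions,
# letting the number of processes along each of them be chosen automatically.
pen_alt = Pencil(Ns, (2, 3), comm)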
For simplicity, here we do it automatically following the PencilArrays.jl docs:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pen = Pencil(Ns, comm)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The subdomain associated to the local MPI process can be obtained using range_local:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"range_local(pen)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now construct a distributed vector field that follows the decomposition configuration we just created:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v⃗₀ = (\n PencilArray{Float64}(undef, pen), # vx\n PencilArray{Float64}(undef, pen), # vy\n PencilArray{Float64}(undef, pen), # vz\n)\nsummary(v⃗₀[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We still need to fill this array with interesting values that represent a physical velocity field.","category":"page"},{"location":"generated/navier_stokes/#Initial-condition","page":"Navier–Stokes equations","title":"Initial condition","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's set the initial condition in physical space. In this example, we choose the Taylor–Green vortex configuration as an initial condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"beginaligned\nv_x(x y z) = u₀ sin(k₀ x) cos(k₀ y) cos(k₀ z) \nv_y(x y z) = -u₀ cos(k₀ x) sin(k₀ y) cos(k₀ z) \nv_z(x y z) = 0\nendaligned","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where u₀ and k₀ are two parameters setting the amplitude and the period of the velocity field.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid = localgrid(pen, xs_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can use this to initialise the velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"u₀ = 1.0\nk₀ = 2π / Ls[1] # should be integer if L = 2π (to preserve periodicity)\n\n@. v⃗₀[1] = u₀ * sin(k₀ * grid.x) * cos(k₀ * grid.y) * cos(k₀ * grid.z)\n@. v⃗₀[2] = -u₀ * cos(k₀ * grid.x) * sin(k₀ * grid.y) * cos(k₀ * grid.z)\n@. 
v⃗₀[3] = 0\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Let's plot a 2D slice of the velocity field managed by the local MPI process:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using GLMakie\n\n# Compute the norm of a vector field represented by a tuple of arrays.\nfunction vecnorm(v⃗::NTuple)\n vnorm = similar(v⃗[1])\n for n ∈ eachindex(v⃗[1])\n w = zero(eltype(vnorm))\n for v ∈ v⃗\n w += v[n]^2\n end\n vnorm[n] = sqrt(w)\n end\n vnorm\nend\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n vnorm = vecnorm(v⃗₀)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, vnorm;\n alpha = 0.2, levels = 4,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Velocity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Velocity-in-Fourier-space","page":"Navier–Stokes equations","title":"Velocity in Fourier space","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"In the Fourier pseudo-spectral method, the periodic velocity field is discretised in space as a truncated Fourier series","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"bmv(bmx t) =\n_bmk hatbmv_bmk(t) e^i bmk bmx","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where bmk = (k_x k_y k_z) are the discrete wave numbers.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"The wave numbers can be obtained using the fftfreq function. 
Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for k_x:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nks_global = (\n rfftfreq(Ns[1], 2π * Ns[1] / Ls[1]), # kx | real-to-complex\n fftfreq(Ns[2], 2π * Ns[2] / Ls[2]), # ky | complex-to-complex\n fftfreq(Ns[3], 2π * Ns[3] / Ls[3]), # kz | complex-to-complex\n)\n\nks_global[1]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[2]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ks_global[3]'","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To transform the velocity field to Fourier space, we first create a real-to-complex FFT plan to be applied to one of the velocity components:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"plan = PencilFFTPlan(v⃗₀[1], Transforms.RFFT())","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"See PencilFFTPlan for details on creating plans and on optional keyword arguments.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can now apply this plan to the three velocity components to obtain the respective Fourier coefficients hatbmv_bmk:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s = plan .* v⃗₀\nsummary(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Note that, in Fourier space, the domain decomposition is performed along the directions x and y:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"pencil(v̂s[1])","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This is because the 3D FFTs are performed one dimension at a time, with the x direction first and the z direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers bmk associated to the local MPI process. 
Similarly to the local grid points, these are obtained using the localgrid function:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"grid_fourier = localgrid(v̂s[1], ks_global)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, ω⃗ = ∇ × v⃗. In Fourier space, this becomes ω̂ = i k⃗ × v̂.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using StaticArrays: SVector\nusing LinearAlgebra: ×\n\nfunction curl_fourier!(\n ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,\n ) where {N}\n @inbounds for I ∈ eachindex(grid_fourier)\n # We use StaticArrays for the cross product between small vectors.\n ik⃗ = im * SVector(grid_fourier[I])\n v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)\n ω⃗ = ik⃗ × v⃗\n for n ∈ eachindex(ω⃗)\n ω̂s[n][I] = ω⃗[n]\n end\n end\n ω̂s\nend\n\nω̂s = similar.(v̂s)\ncurl_fourier!(ω̂s, v̂s, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally transform back to physical space and plot the result:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"ωs = plan .\\ ω̂s\n\nlet fig = Figure(resolution = (700, 600))\n ax = Axis3(fig[1, 1]; aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\")\n ω_norm = vecnorm(ωs)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_norm;\n alpha = 0.1, levels = 0.8:0.2:2.0,\n colormap = :viridis, colorrange = (0.8, 2.0),\n )\n cb = Colorbar(fig[1, 2], ct; label = \"Vorticity magnitude\")\n fig\nend","category":"page"},{"location":"generated/navier_stokes/#Computing-the-non-linear-term","page":"Navier–Stokes equations","title":"Computing the non-linear term","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"One can show that, in Fourier space, the incompressible Navier–Stokes equations can be written as","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"∂ₜ v̂_k⃗ =\n- 𝒫_k⃗ [ (v⃗ ⋅ ∇) v⃗ ]_k⃗\n- ν |k⃗|² v̂_k⃗,\nwith\n𝒫_k⃗(F̂_k⃗) = ( I - k⃗ ⊗ k⃗ / |k⃗|² ) F̂_k⃗","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"where 𝒫_k⃗ is a projection operator which preserves the incompressibility condition ∇ ⋅ v⃗ = 0. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient disappears from the equations.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Now that we have the wave numbers k⃗, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients v̂_k⃗ of the velocity field. 
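As a brief illustration, a sketch of this viscous contribution for a single velocity component is shown below; viscous_term! is a hypothetical helper (not part of PencilFFTs), and the right-hand-side function defined later in this example computes the same term together with the non-linear one.

# Sketch: viscous term dv̂ = -ν |k⃗|² v̂ for a single velocity component.
function viscous_term!(dv̂, v̂, grid_fourier, ν)
    @inbounds for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]               # = (kx, ky, kz)
        dv̂[I] = -ν * sum(abs2, k⃗) * v̂[I]  # -ν k² v̂ at this wave number
    end
    dv̂
end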
What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, hatbmF_bmk = left widehat(bmv bm) bmv right_bmk. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Below we implement a function that computes the non-linear term in Fourier space based on its convective form (bmv bm) bmv = bm (bmv bmv). Note that this equivalence uses the incompressibility condition bm bmv = 0.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place\n\n# Compute non-linear term in Fourier space from velocity field in physical\n# space. Optional keyword arguments may be passed to avoid allocations.\nfunction ns_nonlinear!(\n F̂s, vs, plan, grid_fourier;\n vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),\n )\n # Compute F_i = ∂_j (v_i v_j) for each i.\n # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)\n w, ŵ = vbuf, v̂buf\n @inbounds for (i, F̂i) ∈ enumerate(F̂s)\n F̂i .= 0\n vi = vs[i]\n for (j, vj) ∈ enumerate(vs)\n w .= vi .* vj # w = v_i * v_j in physical space\n mul!(ŵ, plan, w) # same in Fourier space\n # Add derivative in Fourier space\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n kj = k⃗[j]\n F̂i[I] += im * kj * ŵ[I]\n end\n end\n end\n F̂s\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"As an example, let's use this function on our initial velocity field:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"F̂s = similar.(v̂s)\nns_nonlinear!(F̂s, v⃗₀, plan, grid_fourier);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. 
We define a function that applies this procedure below.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)\n ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)\n ks_lim = (2 / 3) .* ks_max\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n if any(abs.(k⃗) .> ks_lim)\n for ŵ ∈ ŵs\n ŵ[I] = 0\n end\n end\n end\n ŵs\nend\n\n# We can apply this on the previously computed non-linear term:\ndealias_twothirds!(F̂s, grid_fourier, ks_global);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"Finally, we implement the projection associated to the incompressibility condition:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function project_divergence_free!(ûs, grid_fourier)\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I]\n k² = sum(abs2, k⃗)\n iszero(k²) && continue # avoid division by zero\n û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)\n for i ∈ eachindex(û)\n ŵ = û[i]\n for j ∈ eachindex(û)\n ŵ -= k⃗[i] * k⃗[j] * û[j] / k²\n end\n ûs[i][I] = ŵ\n end\n end\n ûs\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)\nv̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially","category":"page"},{"location":"generated/navier_stokes/#Putting-it-all-together","page":"Navier–Stokes equations","title":"Putting it all together","text":"","category":"section"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function ns_rhs!(\n dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,\n ) where {N}\n # 1. Compute non-linear term and dealias it\n (; plan, cache, ks_global, grid_fourier) = p\n F̂s = cache.F̂s\n ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])\n dealias_twothirds!(F̂s, grid_fourier, ks_global)\n\n # 2. Project onto divergence-free space\n project_divergence_free!(F̂s, grid_fourier)\n\n # 3. Transform velocity to Fourier space\n v̂s = cache.v̂s\n map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)\n\n # 4. 
Add viscous term (and multiply projected non-linear term by -1)\n ν = p.ν\n for n ∈ eachindex(v̂s)\n v̂ = v̂s[n]\n F̂ = F̂s[n]\n @inbounds for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n k² = sum(abs2, k⃗)\n F̂[I] = -F̂[I] - ν * k² * v̂[I]\n end\n end\n\n # 5. Transform RHS back to physical space\n map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)\n\n nothing\nend","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"using OrdinaryDiffEq\nusing RecursiveArrayTools: ArrayPartition\n\nns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)\nvs_init_ode = ArrayPartition(v⃗₀)\nsummary(vs_init_ode)","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We now define solver parameters and temporary variables, and initialise the problem:","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"params = (;\n ν = 5e-3, # kinematic viscosity\n plan, grid_fourier, ks_global,\n cache = (\n v̂s = similar.(v̂s),\n F̂s = similar.(v̂s),\n )\n)\n\ntspan = (0.0, 10.0)\nprob = ODEProblem(ns_rhs!, vs_init_ode, tspan, params)\nintegrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);\nnothing #hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum E(k), to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity ν must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large k), while high enough to allow the small-scale motions to be correctly resolved.","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"function energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Nk = length(Ek)\n @assert Nk == length(ks)\n Ek .= 0\n for I ∈ eachindex(grid_fourier)\n k⃗ = grid_fourier[I] # = (kx, ky, kz)\n knorm = sqrt(sum(abs2, k⃗))\n i = searchsortedfirst(ks, knorm)\n i > Nk && continue\n v⃗ = getindex.(v̂s, Ref(I)) # = (v̂s[1][I], v̂s[2][I], ...)\n factor = k⃗[1] == 0 ? 
1 : 2 # account for Hermitian symmetry and r2c transform\n Ek[i] += factor * sum(abs2, v⃗) / 2\n end\n MPI.Allreduce!(Ek, +, get_comm(v̂s[1])) # sum across all processes\n Ek\nend\n\nks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])\nEk = similar(ks)\nv̂s = plan .* integrator.u.x\nenergy_spectrum!(Ek, ks, v̂s, grid_fourier)\nEk ./= scale_factor(plan)^2 # rescale energy\n\ncurl_fourier!(ω̂s, v̂s, grid_fourier)\nldiv!.(ωs, plan, ω̂s)\nω⃗_plot = Observable(ωs)\nk_plot = @view ks[2:end]\nE_plot = Observable(@view Ek[2:end])\nt_plot = Observable(integrator.t)\n\nfig = let\n fig = Figure(resolution = (1200, 600))\n ax = Axis3(\n fig[1, 1][1, 1]; title = @lift(\"t = $(round($t_plot, digits = 3))\"),\n aspect = :data, xlabel = \"x\", ylabel = \"y\", zlabel = \"z\",\n )\n ω_mag = @lift vecnorm($ω⃗_plot)\n ω_mag_norm = @lift $ω_mag ./ maximum($ω_mag)\n ct = contour!(\n ax, grid.x, grid.y, grid.z, ω_mag_norm;\n alpha = 0.3, levels = 3,\n colormap = :viridis, colorrange = (0.0, 1.0),\n )\n cb = Colorbar(fig[1, 1][1, 2], ct; label = \"Normalised vorticity magnitude\")\n ax_sp = Axis(\n fig[1, 2];\n xlabel = \"k\", ylabel = \"E(k)\", xscale = log2, yscale = log10,\n title = \"Kinetic energy spectrum\",\n )\n ylims!(ax_sp, 1e-8, 1e0)\n scatterlines!(ax_sp, k_plot, E_plot)\n ks_slope = exp.(range(log(2.5), log(25.0), length = 3))\n E_fivethirds = @. 0.3 * ks_slope^(-5/3)\n @views lines!(ax_sp, ks_slope, E_fivethirds; color = :black, linestyle = :dot)\n text!(ax_sp, L\"k^{-5/3}\"; position = (ks_slope[2], E_fivethirds[2]), align = (:left, :bottom))\n fig\nend\n\nusing Printf # hide\nwith_xvfb = ENV[\"DISPLAY\"] == \":99\" # hide\nnstep = 0 # hide\nconst tmpdir = mktempdir() # hide\nfilename_frame(procid, nstep) = joinpath(tmpdir, @sprintf(\"proc%d_%04d.png\", procid, nstep)) # hide\nrecord(fig, \"vorticity_proc$procid.mp4\"; framerate = 10) do io\n with_xvfb && recordframe!(io) # hide\n while integrator.t < 20\n dt = 0.001\n step!(integrator, dt)\n t_plot[] = integrator.t\n mul!.(v̂s, plan, integrator.u.x) # current velocity in Fourier space\n curl_fourier!(ω̂s, v̂s, grid_fourier)\n ldiv!.(ω⃗_plot[], plan, ω̂s)\n ω⃗_plot[] = ω⃗_plot[] # to force updating the plot\n energy_spectrum!(Ek, ks, v̂s, grid_fourier)\n Ek ./= scale_factor(plan)^2 # rescale energy\n E_plot[] = E_plot[]\n global nstep += 1 # hide\n with_xvfb ? 
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)","category":"page"},{"location":"generated/gradient/#Fourier-wave-numbers","page":"Gradient of a scalar field","title":"Fourier wave numbers","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
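As a reference point, the sketch below shows how the local portion of these wave numbers could be extracted by hand from the index ranges owned by the local process; it assumes that range_local applied to the underlying Pencil returns the ranges in logical (unpermuted) order. The methods below achieve the same thing more conveniently, via global views or localgrid.

# Local index ranges of the transformed data within the global grid,
# e.g. (1:33, j1:j2, k1:k2) on each process (assumed logical order).
rng_k = range_local(pencil(θ_hat))
kvec_local = map(getindex, kvec, rng_k)  # local slices of (kx, ky, kz)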
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
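As an example, a dimension-generic variant of the same loop might look like the sketch below (gradient_global_generic! is a hypothetical name, not part of PencilFFTs; it assumes one vector of global wave numbers per dimension, as in kvec above).

# Sketch: gradient in Fourier space for an arbitrary number of dimensions,
# using global views and global wave numbers.
function gradient_global_generic!(∇θ_glob::NTuple{N}, θ_glob, kvec::NTuple{N}) where {N}
    @inbounds for I in CartesianIndices(θ_glob)
        u = im * θ_glob[I]
        k⃗ = map(getindex, kvec, Tuple(I))  # = (kvec[1][i], kvec[2][j], ...)
        for n in 1:N
            ∇θ_glob[n][I] = k⃗[n] * u
        end
    end
    ∇θ_glob
end

Calling gradient_global_generic!(∇θ_glob, θ_glob, kvec) gives the same result as the explicit three-dimensional loop above.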
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#gradient_method_broadcast","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 
92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. 
Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = TimerOutput(),\n)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n dims_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"}]
-}
diff --git a/previews/PR48/siteinfo.js b/previews/PR48/siteinfo.js
deleted file mode 100644
index d4d3f93b..00000000
--- a/previews/PR48/siteinfo.js
+++ /dev/null
@@ -1 +0,0 @@
-var DOCUMENTER_CURRENT_VERSION = "previews/PR48";
diff --git a/previews/PR55/GlobalFFTParams/index.html b/previews/PR55/GlobalFFTParams/index.html
deleted file mode 100644
index 2e928d14..00000000
--- a/previews/PR55/GlobalFFTParams/index.html
+++ /dev/null
@@ -1,9 +0,0 @@
-
-Global FFT parameters · PencilFFTs.jl
Specifies the global parameters for an N-dimensional distributed transform. These include the element type T and global data sizes of input and output data, as well as the transform types to be performed along each dimension.
transforms must be a tuple of length N specifying the transforms to be applied along each dimension. Each element must be a subtype of Transforms.AbstractTransform. For all the possible transforms, see Transform types.
The element type must be a real type accepted by FFTW, i.e. either Float32 or Float64.
Note that the transforms are applied one dimension at a time, with the leftmost dimension first for forward transforms.
Example
To perform a 3D FFT of real data, first a real-to-complex FFT must be applied along the first dimension, followed by two complex-to-complex FFTs along the other dimensions:
It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.
Minimal example:
using MPI
-using PencilFFTs
-using TimerOutputs
-
-# Enable timing of `PencilFFTs` functions
-TimerOutputs.enable_debug_timings(PencilFFTs)
-TimerOutputs.enable_debug_timings(PencilArrays)
-TimerOutputs.enable_debug_timings(Transpositions)
-
-MPI.Init()
-
-plan = PencilFFTPlan(#= args... =#)
-
-# [do stuff with `plan`...]
-
-# Retrieve and print timing data associated to `plan`
-to = timer(plan)
-print_timer(to)
By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:
to = TimerOutput()
-plan = PencilFFTPlan(..., timer=to)
-
-# [do stuff with `plan`...]
-
-print_timer(to)
Like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
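For illustration, a minimal sketch of both forms (the sizes here are hypothetical, and Transforms is assumed to be in scope after loading PencilFFTs):
size_real = (64, 32, 128)               # hypothetical global size of the real output data
bw_nd = Transforms.BRFFT(size_real)     # multidimensional form
bw_1d = Transforms.BRFFT(128)           # equivalent: only the last dimension sets the c2r output length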
plan(transform::AbstractTransform, A, [dims];
- flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
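As a rough sketch of how this might be used on a plain array (the array, dimension and flags below are purely illustrative):
using FFTW
using PencilFFTs: Transforms

A = rand(Float64, 16, 32)
p = Transforms.plan(Transforms.RFFT(), A, 1; flags = FFTW.MEASURE)  # r2c transform along dimension 1
Â = p * A   # apply the plan as usual; the first dimension becomes 16 ÷ 2 + 1 = 9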
Returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
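A small sketch of the expected behaviour, assuming this corresponds to Transforms.binv:
using PencilFFTs: Transforms

Nx = 64                                  # length of the first transformed dimension in the forward transform
Transforms.binv(Transforms.FFT(), Nx)    # complex-to-complex backward transform (unnormalised)
Transforms.binv(Transforms.RFFT(), Nx)   # complex-to-real backward transform with output length Nx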
Returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
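For instance, assuming this corresponds to Transforms.length_output, a quick sketch for a dimension of length 64:
using PencilFFTs: Transforms

Transforms.length_output(Transforms.FFT(), 64)   # 64: complex-to-complex, length unchanged
Transforms.length_output(Transforms.RFFT(), 64)  # 33: real-to-complex, 64 ÷ 2 + 1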
The performance of PencilFFTs.jl is comparable to that of other open-source parallel FFT libraries implemented in lower-level languages. Below, we show comparisons with the Fortran implementation of P3DFFT, possibly the most popular of these libraries. The benchmarks were performed on the Jean–Zay cluster of the IDRIS French computing centre (CNRS).
The figure below shows strong scaling benchmarks of 3D real-to-complex FFTs using 2D ("pencil") decomposition. The benchmarks were run for input arrays of dimensions $N_x × N_y × N_z = 512^3$, $1024^3$ and $2048^3$. Each timing is averaged over 100 repetitions.
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on partially received data while waiting for the rest of the data to arrive, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs result from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and in that case the runtimes are nearly indistinguishable from those of P3DFFT.
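For reference, the communication backend is selected when creating the plan. A minimal sketch, where pen and transform are assumed to be defined as in the tutorial and Alltoallv is the collective variant provided by the Transpositions module:
# Default: non-blocking point-to-point communications (MPI_Isend / MPI_Irecv).
plan_p2p = PencilFFTPlan(pen, transform; transpose_method = Transpositions.PointToPoint())
# Collective communications based on MPI_Alltoallv.
plan_a2a = PencilFFTPlan(pen, transform; transpose_method = Transpositions.Alltoallv())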
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations in a 3D periodic domain using a standard Fourier pseudo-spectral method.
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
- Data dimensions: (64, 64, 64)
- Decomposed dimensions: (2, 3)
- Data permutation: NoPermutation()
- Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
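A minimal sketch of this step, assuming the velocity field is stored as a tuple of three real PencilArrays following the pen configuration:
vs = ntuple(_ -> PencilArray{Float64}(undef, pen), 3)  # velocity components (v_x, v_y, v_z)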
where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid has been attributed to it. For this, PencilArrays.jl includes a localgrid helper function:
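A sketch of how this can look, assuming a periodic domain [0, 2π) discretised with Ns[i] points along each direction:
xs_global = range(0, 2π; length = Ns[1] + 1)[1:end-1]  # exclude the endpoint (periodicity)
ys_global = range(0, 2π; length = Ns[2] + 1)[1:end-1]
zs_global = range(0, 2π; length = Ns[3] + 1)[1:end-1]
grid = localgrid(pen, (xs_global, ys_global, zs_global))  # local physical coordinates of this process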
where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
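A sketch of this step, together with the real-to-complex plan and the forward transform producing the Fourier coefficients v̂s used below (a 2π-periodic domain is assumed, so that wave numbers are integers):
using AbstractFFTs: fftfreq, rfftfreq

L = 2π  # domain period along each direction (assumption)
ks_global = (
    rfftfreq(Ns[1], 2π * Ns[1] / L),  # k_x: real-to-complex direction
    fftfreq(Ns[2], 2π * Ns[2] / L),   # k_y
    fftfreq(Ns[3], 2π * Ns[3] / L),   # k_z
)

plan = PencilFFTPlan(vs[1], Transforms.RFFT())  # 3D r2c transform, x direction first
v̂s = map(v -> plan * v, vs)                      # forward transform of each velocity component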
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
- Data dimensions: (33, 64, 64)
- Decomposed dimensions: (1, 2)
- Data permutation: Permutation(3, 2, 1)
- Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require substantial communication between MPI processes, and are usually the most time-consuming aspect of massively parallel simulations using this kind of method.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
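A one-line sketch of this step, reusing the Fourier-space arrays and the global wave numbers defined above:
grid_fourier = localgrid(v̂s[1], ks_global)  # local wave numbers (k_x, k_y, k_z) on this process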
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
-using LinearAlgebra: ×
-
-function curl_fourier!(
- ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
- ) where {N}
- @inbounds for I ∈ eachindex(grid_fourier)
- # We use StaticArrays for the cross product between small vectors.
- ik⃗ = im * SVector(grid_fourier[I])
- v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)
- ω⃗ = ik⃗ × v⃗
- for n ∈ eachindex(ω⃗)
- ω̂s[n][I] = ω⃗[n]
- end
- end
- ω̂s
-end
-
-ω̂s = similar.(v̂s)
-curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
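A sketch of the backward transform (the plotting itself is omitted here):
ωs = map(ω̂ -> plan \ ω̂, ω̂s)  # normalised inverse transform of each vorticity component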
where $\mathcal{P}_{\bm{k}}$ is a projection operator that preserves the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient disappears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straightforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its convective form $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence uses the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place
-
-# Compute non-linear term in Fourier space from velocity field in physical
-# space. Optional keyword arguments may be passed to avoid allocations.
-function ns_nonlinear!(
- F̂s, vs, plan, grid_fourier;
- vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
- )
- # Compute F_i = ∂_j (v_i v_j) for each i.
- # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
- w, ŵ = vbuf, v̂buf
- @inbounds for (i, F̂i) ∈ enumerate(F̂s)
- F̂i .= 0
- vi = vs[i]
- for (j, vj) ∈ enumerate(vs)
- w .= vi .* vj # w = v_i * v_j in physical space
- mul!(ŵ, plan, w) # same in Fourier space
- # Add derivative in Fourier space
- for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- kj = k⃗[j]
- F̂i[I] += im * kj * ŵ[I]
- end
- end
- end
- F̂s
-end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
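For instance, a short sketch where F̂s holds the Fourier coefficients of the non-linear term (this variable is reused below):
F̂s = similar.(v̂s)                          # one complex PencilArray per velocity component
ns_nonlinear!(F̂s, vs, plan, grid_fourier);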
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orszag's 2/3 rule to zero out the Fourier coefficients associated to the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
- ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)
- ks_lim = (2 / 3) .* ks_max
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- if any(abs.(k⃗) .> ks_lim)
- for ŵ ∈ ŵs
- ŵ[I] = 0
- end
- end
- end
- ŵs
-end
-
-# We can apply this on the previously computed non-linear term:
-dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- k² = sum(abs2, k⃗)
- iszero(k²) && continue # avoid division by zero
- û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)
- for i ∈ eachindex(û)
- ŵ = û[i]
- for j ∈ eachindex(û)
- ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
- end
- ûs[i][I] = ŵ
- end
- end
- ûs
-end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
-v̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
- dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
- ) where {N}
- # 1. Compute non-linear term and dealias it
- (; plan, cache, ks_global, grid_fourier) = p
- F̂s = cache.F̂s
- ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
- dealias_twothirds!(F̂s, grid_fourier, ks_global)
-
- # 2. Project onto divergence-free space
- project_divergence_free!(F̂s, grid_fourier)
-
- # 3. Transform velocity to Fourier space
- v̂s = cache.v̂s
- map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)
-
- # 4. Add viscous term (and multiply projected non-linear term by -1)
- ν = p.ν
- for n ∈ eachindex(v̂s)
- v̂ = v̂s[n]
- F̂ = F̂s[n]
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- k² = sum(abs2, k⃗)
- F̂[I] = -F̂[I] - ν * k² * v̂[I]
- end
- end
-
- # 5. Transform RHS back to physical space
- map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)
-
- nothing
-end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.
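A rough sketch of this glue code; the cache layout follows the fields accessed by ns_rhs! above (cache.v̂s and cache.F̂s), while the viscosity, time span and time step are arbitrary illustrative values:
using OrdinaryDiffEq
using RecursiveArrayTools: ArrayPartition

cache = (v̂s = similar.(v̂s), F̂s = similar.(v̂s))  # preallocated Fourier-space buffers
ν = 5e-3                                         # arbitrary value, for illustration only
params = (ν = ν, plan = plan, cache = cache, ks_global = ks_global, grid_fourier = grid_fourier)

# Interface method: unwrap the ArrayPartition into the tuples expected by ns_rhs!.
ns_rhs!(dv::ArrayPartition, v::ArrayPartition, p, t) = ns_rhs!(dv.x, v.x, p, t)

prob = ODEProblem(ns_rhs!, ArrayPartition(vs...), (0.0, 10.0), params)
sol = solve(prob, RK4(); dt = 1e-2, adaptive = false)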
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$, to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), while high enough to allow the small-scale motions to be correctly resolved.
This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.
The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.
More generally, PencilFFTs makes it possible to decompose and perform FFTs on geometries of arbitrary dimension $N$. The decompositions can be performed along an arbitrary number $M < N$ of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.
The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.
FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. Also, this functionality is not currently included in the FFTW.jl wrappers.
PFFT is a very general parallel FFT library written in C.
P3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.
2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.
where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations a 3D periodic domain using a standard Fourier pseudo-spectral method.
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
- Data dimensions: (64, 64, 64)
- Decomposed dimensions: (2, 3)
- Data permutation: NoPermutation()
- Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:
where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
- Data dimensions: (33, 64, 64)
- Decomposed dimensions: (1, 2)
- Data permutation: Permutation(3, 2, 1)
- Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
-using LinearAlgebra: ×
-
-function curl_fourier!(
- ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
- ) where {N}
- @inbounds for I ∈ eachindex(grid_fourier)
- # We use StaticArrays for the cross product between small vectors.
- ik⃗ = im * SVector(grid_fourier[I])
- v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)
- ω⃗ = ik⃗ × v⃗
- for n ∈ eachindex(ω⃗)
- ω̂s[n][I] = ω⃗[n]
- end
- end
- ω̂s
-end
-
-ω̂s = similar.(v̂s)
-curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
where $\mathcal{P}_{\bm{k}}$ is a projection operator allowing to preserve the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient dissapears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straighforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its convective form $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence uses the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place
-
-# Compute non-linear term in Fourier space from velocity field in physical
-# space. Optional keyword arguments may be passed to avoid allocations.
-function ns_nonlinear!(
- F̂s, vs, plan, grid_fourier;
- vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
- )
- # Compute F_i = ∂_j (v_i v_j) for each i.
- # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
- w, ŵ = vbuf, v̂buf
- @inbounds for (i, F̂i) ∈ enumerate(F̂s)
- F̂i .= 0
- vi = vs[i]
- for (j, vj) ∈ enumerate(vs)
- w .= vi .* vj # w = v_i * v_j in physical space
- mul!(ŵ, plan, w) # same in Fourier space
- # Add derivative in Fourier space
- for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- kj = k⃗[j]
- F̂i[I] += im * kj * ŵ[I]
- end
- end
- end
- F̂s
-end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
- ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)
- ks_lim = (2 / 3) .* ks_max
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- if any(abs.(k⃗) .> ks_lim)
- for ŵ ∈ ŵs
- ŵ[I] = 0
- end
- end
- end
- ŵs
-end
-
-# We can apply this on the previously computed non-linear term:
-dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- k² = sum(abs2, k⃗)
- iszero(k²) && continue # avoid division by zero
- û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)
- for i ∈ eachindex(û)
- ŵ = û[i]
- for j ∈ eachindex(û)
- ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
- end
- ûs[i][I] = ŵ
- end
- end
- ûs
-end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
-v̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
- dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
- ) where {N}
- # 1. Compute non-linear term and dealias it
- (; plan, cache, ks_global, grid_fourier) = p
- F̂s = cache.F̂s
- ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
- dealias_twothirds!(F̂s, grid_fourier, ks_global)
-
- # 2. Project onto divergence-free space
- project_divergence_free!(F̂s, grid_fourier)
-
- # 3. Transform velocity to Fourier space
- v̂s = cache.v̂s
- map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)
-
- # 4. Add viscous term (and multiply projected non-linear term by -1)
- ν = p.ν
- for n ∈ eachindex(v̂s)
- v̂ = v̂s[n]
- F̂ = F̂s[n]
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- k² = sum(abs2, k⃗)
- F̂[I] = -F̂[I] - ν * k² * v̂[I]
- end
- end
-
- # 5. Transform RHS back to physical space
- map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)
-
- nothing
-end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set-up the simulation. Since DifferentialEquations.jl can't directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above.
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$, to see if the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), while high enough to allow the small-scale motions to be correctly resolved.
where $\bm{v}(\bm{x}, t)$ and $p(\bm{x}, t)$ are respectively the velocity and pressure fields, $ν$ is the fluid kinematic viscosity and $ρ$ is the fluid density.
We solve the above equations a 3D periodic domain using a standard Fourier pseudo-spectral method.
Let's check the number of MPI processes over which we're running our simulation:
MPI.Comm_size(comm)
2
We can now create a partitioning of the domain based on the number of grid points (Ns) and on the number of MPI processes. There are different ways to do this. For simplicity, here we do it automatically following the PencilArrays.jl docs:
pen = Pencil(Ns, comm)
Decomposition of 3D data
- Data dimensions: (64, 64, 64)
- Decomposed dimensions: (2, 3)
- Data permutation: NoPermutation()
- Array type: Array
The subdomain associated to the local MPI process can be obtained using range_local:
range_local(pen)
(1:64, 1:32, 1:64)
We now construct a distributed vector field that follows the decomposition configuration we just created:
where $u₀$ and $k₀$ are two parameters setting the amplitude and the period of the velocity field.
To set the initial condition, each MPI process needs to know which portion of the physical grid it has been attributed. For this, PencilArrays.jl includes a localgrid helper function:
where $\bm{k} = (k_x, k_y, k_z)$ are the discrete wave numbers.
The wave numbers can be obtained using the fftfreq function. Since we perform a real-to-complex transform along the first dimension, we use rfftfreq instead for $k_x$:
Note that, in Fourier space, the domain decomposition is performed along the directions $x$ and $y$:
pencil(v̂s[1])
Decomposition of 3D data
- Data dimensions: (33, 64, 64)
- Decomposed dimensions: (1, 2)
- Data permutation: Permutation(3, 2, 1)
- Array type: Array
This is because the 3D FFTs are performed one dimension at a time, with the $x$ direction first and the $z$ direction last. To efficiently perform an FFT along a given direction (taking advantage of serial FFT implementations like FFTW), all the data along that direction must be contained locally within a single MPI process. For that reason, data redistributions (or transpositions) among MPI processes are performed behind the scenes during each FFT computation. Such transpositions require important communications between MPI processes, and are usually the most time-consuming aspect of massively-parallel simulations using this kind of methods.
To solve the Navier–Stokes equations in Fourier space, we will also need the respective wave numbers $\bm{k}$ associated to the local MPI process. Similarly to the local grid points, these are obtained using the localgrid function:
As an example, let's first use this to compute and plot the vorticity associated to the initial condition. The vorticity is defined as the curl of the velocity, $\bm{ω} = \bm{∇} × \bm{v}$. In Fourier space, this becomes $\hat{\bm{ω}} = i \bm{k} × \hat{\bm{v}}$.
using StaticArrays: SVector
-using LinearAlgebra: ×
-
-function curl_fourier!(
- ω̂s::NTuple{N, <:PencilArray}, v̂s::NTuple{N, <:PencilArray}, grid_fourier,
- ) where {N}
- @inbounds for I ∈ eachindex(grid_fourier)
- # We use StaticArrays for the cross product between small vectors.
- ik⃗ = im * SVector(grid_fourier[I])
- v⃗ = SVector(getindex.(v̂s, Ref(I))) # = (v̂s[1][I], v̂s[2][I], ...)
- ω⃗ = ik⃗ × v⃗
- for n ∈ eachindex(ω⃗)
- ω̂s[n][I] = ω⃗[n]
- end
- end
- ω̂s
-end
-
-ω̂s = similar.(v̂s)
-curl_fourier!(ω̂s, v̂s, grid_fourier);
We finally transform back to physical space and plot the result:
where $\mathcal{P}_{\bm{k}}$ is a projection operator allowing to preserve the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$. This operator encodes the action of the pressure gradient term, which serves precisely to enforce incompressibility. Note that, because of this, the pressure gradient dissapears from the equations.
Now that we have the wave numbers $\bm{k}$, computing the linear viscous term in Fourier space is straighforward once we have the Fourier coefficients $\hat{\bm{v}}_{\bm{k}}$ of the velocity field. What is slightly more challenging (and much more costly) is the computation of the non-linear term in Fourier space, $\hat{\bm{F}}_{\bm{k}} = \left[ \widehat{(\bm{v} ⋅ \bm{∇}) \bm{v}} \right]_{\bm{k}}$. In the pseudo-spectral method, the quadratic nonlinearity is computed by collocation in physical space (i.e. this term is evaluated at grid points), while derivatives are computed in Fourier space. This requires transforming fields back and forth between both spaces.
Below we implement a function that computes the non-linear term in Fourier space based on its convective form $(\bm{v} ⋅ \bm{∇}) \bm{v} = \bm{∇} ⋅ (\bm{v} ⊗ \bm{v})$. Note that this equivalence uses the incompressibility condition $\bm{∇} ⋅ \bm{v} = 0$.
using LinearAlgebra: mul!, ldiv! # for applying FFT plans in-place
-
-# Compute non-linear term in Fourier space from velocity field in physical
-# space. Optional keyword arguments may be passed to avoid allocations.
-function ns_nonlinear!(
- F̂s, vs, plan, grid_fourier;
- vbuf = similar(vs[1]), v̂buf = similar(F̂s[1]),
- )
- # Compute F_i = ∂_j (v_i v_j) for each i.
- # In Fourier space: F̂_i = im * k_j * FFT(v_i * v_j)
- w, ŵ = vbuf, v̂buf
- @inbounds for (i, F̂i) ∈ enumerate(F̂s)
- F̂i .= 0
- vi = vs[i]
- for (j, vj) ∈ enumerate(vs)
- w .= vi .* vj # w = v_i * v_j in physical space
- mul!(ŵ, plan, w) # same in Fourier space
- # Add derivative in Fourier space
- for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I] # = (kx, ky, kz)
- kj = k⃗[j]
- F̂i[I] += im * kj * ŵ[I]
- end
- end
- end
- F̂s
-end
ns_nonlinear! (generic function with 1 method)
As an example, let's use this function on our initial velocity field:
Strictly speaking, computing the non-linear term by collocation can lead to aliasing errors, as the quadratic term excites Fourier modes that fall beyond the range of resolved wave numbers. The typical solution is to apply Orzsag's 2/3 rule to zero-out the Fourier coefficients associated to the highest wave numbers. We define a function that applies this procedure below.
function dealias_twothirds!(ŵs::Tuple, grid_fourier, ks_global)
- ks_max = maximum.(abs, ks_global) # maximum stored wave numbers (kx_max, ky_max, kz_max)
- ks_lim = (2 / 3) .* ks_max
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- if any(abs.(k⃗) .> ks_lim)
- for ŵ ∈ ŵs
- ŵ[I] = 0
- end
- end
- end
- ŵs
-end
-
-# We can apply this on the previously computed non-linear term:
-dealias_twothirds!(F̂s, grid_fourier, ks_global);
Finally, we implement the projection associated to the incompressibility condition:
function project_divergence_free!(ûs, grid_fourier)
- @inbounds for I ∈ eachindex(grid_fourier)
- k⃗ = grid_fourier[I]
- k² = sum(abs2, k⃗)
- iszero(k²) && continue # avoid division by zero
- û = getindex.(ûs, Ref(I)) # (ûs[1][I], ûs[2][I], ...)
- for i ∈ eachindex(û)
- ŵ = û[i]
- for j ∈ eachindex(û)
- ŵ -= k⃗[i] * k⃗[j] * û[j] / k²
- end
- ûs[i][I] = ŵ
- end
- end
- ûs
-end
project_divergence_free! (generic function with 1 method)
We can verify the correctness of the projection operator by checking that the initial velocity field is not modified by it, since it is already incompressible:
v̂s_proj = project_divergence_free!(copy.(v̂s), grid_fourier)
-v̂s_proj .≈ v̂s # the last one may be false because v_z = 0 initially
To perform the time integration of the Navier–Stokes equations, we will use the timestepping routines implemented in the DifferentialEquations.jl suite. For simplicity, we use here an explicit Runge–Kutta scheme. In this case, we just need to write a function that computes the right-hand side of the Navier–Stokes equations in Fourier space:
function ns_rhs!(
        dvs::NTuple{N, <:PencilArray}, vs::NTuple{N, <:PencilArray}, p, t,
    ) where {N}
    # 1. Compute non-linear term and dealias it
    (; plan, cache, ks_global, grid_fourier) = p
    F̂s = cache.F̂s
    ns_nonlinear!(F̂s, vs, plan, grid_fourier; vbuf = dvs[1], v̂buf = cache.v̂s[1])
    dealias_twothirds!(F̂s, grid_fourier, ks_global)

    # 2. Project onto divergence-free space
    project_divergence_free!(F̂s, grid_fourier)

    # 3. Transform velocity to Fourier space
    v̂s = cache.v̂s
    map((v, v̂) -> mul!(v̂, plan, v), vs, v̂s)

    # 4. Add viscous term (and multiply projected non-linear term by -1)
    ν = p.ν
    for n ∈ eachindex(v̂s)
        v̂ = v̂s[n]
        F̂ = F̂s[n]
        @inbounds for I ∈ eachindex(grid_fourier)
            k⃗ = grid_fourier[I]  # = (kx, ky, kz)
            k² = sum(abs2, k⃗)
            F̂[I] = -F̂[I] - ν * k² * v̂[I]
        end
    end

    # 5. Transform RHS back to physical space
    map((dv, dv̂) -> ldiv!(dv, plan, dv̂), dvs, F̂s)

    nothing
end
ns_rhs! (generic function with 1 method)
For the time-stepping, we load OrdinaryDiffEq.jl from the DifferentialEquations.jl suite and set up the simulation. Since DifferentialEquations.jl cannot directly deal with tuples of arrays, we convert the input data to the ArrayPartition type and write an interface function to make things work with our functions defined above:
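using OrdinaryDiffEq
using RecursiveArrayTools: ArrayPartition

ns_rhs!(dv::ArrayPartition, v::ArrayPartition, args...) = ns_rhs!(dv.x, v.x, args...)
vs_init_ode = ArrayPartition(v⃗₀)
summary(vs_init_ode)
We now define solver parameters and temporary variables, and initialise the problem:
params = (;
    ν = 5e-3,  # kinematic viscosity
    plan, grid_fourier, ks_global,
    cache = (
        v̂s = similar.(v̂s),
        F̂s = similar.(v̂s),
    )
)

tspan = (0.0, 10.0)
prob = ODEProblem(ns_rhs!, vs_init_ode, tspan, params)
integrator = init(prob, RK4(); dt = 1e-3, save_everystep = false);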
We finally solve the problem over time and plot the vorticity associated to the solution. It is also useful to look at the energy spectrum $E(k)$ to check whether the small scales are correctly resolved. To obtain a turbulent flow, the viscosity $ν$ must be small enough to allow the transient appearance of an energy cascade towards the small scales (i.e. from small to large $k$), yet large enough for the small-scale motions to be correctly resolved.
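To compute $E(k)$, we bin $|\hat{\bm{v}}_{\bm{k}}|^2$ over wave-number shells and sum the contributions of all MPI processes:
function energy_spectrum!(Ek, ks, v̂s, grid_fourier)
    Nk = length(Ek)
    @assert Nk == length(ks)
    Ek .= 0
    for I ∈ eachindex(grid_fourier)
        k⃗ = grid_fourier[I]  # = (kx, ky, kz)
        knorm = sqrt(sum(abs2, k⃗))
        i = searchsortedfirst(ks, knorm)
        i > Nk && continue
        v⃗ = getindex.(v̂s, Ref(I))  # = (v̂s[1][I], v̂s[2][I], ...)
        factor = k⃗[1] == 0 ? 1 : 2  # account for Hermitian symmetry of the r2c transform
        Ek[i] += factor * sum(abs2, v⃗) / 2
    end
    MPI.Allreduce!(Ek, +, get_comm(v̂s[1]))  # sum across all processes
    Ek
end

ks = rfftfreq(Ns[1], 2π * Ns[1] / Ls[1])
Ek = similar(ks)
v̂s = plan .* integrator.u.x
energy_spectrum!(Ek, ks, v̂s, grid_fourier)
Ek ./= scale_factor(plan)^2  # rescale energy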
# hide\n save(filename_frame(procid, nstep), fig) : # hide\n recordframe!(io)\n end\nend;\n\nif with_xvfb # hide\n run(pipeline(`ffmpeg -y -r 10 -i $tmpdir/proc$(procid)_%04d.png -c:v libx264 -vf \"fps=25,format=yuv420p\" vorticity_proc$procid.mp4`; stdout = \"ffmpeg.out\", stderr = \"ffmpeg.err\")) # hide\nend # hide\nnothing # hide","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"","category":"page"},{"location":"generated/navier_stokes/","page":"Navier–Stokes equations","title":"Navier–Stokes equations","text":"This page was generated using Literate.jl.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"EditURL = \"https://github.com/jipolanco/PencilFFTs.jl/blob/master/docs/examples/gradient.jl\"","category":"page"},{"location":"generated/gradient/#Gradient-of-a-scalar-field","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This example shows different methods to compute the gradient of a real-valued 3D scalar field θ(bmx) in Fourier space, where bmx = (x y z). It is assumed that the field is periodic with period L = 2π along all dimensions.","category":"page"},{"location":"generated/gradient/#General-procedure","page":"Gradient of a scalar field","title":"General procedure","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The discrete Fourier expansion of θ writes","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ(bmx) = _bmk Z^3 hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where bmk = (k_x k_y k_z) are the Fourier wave numbers and hatθ is the discrete Fourier transform of θ. Then, the spatial derivatives of θ are given by","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"frac θ(bmx) x_i =\n_bmk Z^3 i k_i hatθ(bmk) e^i bmk bmx","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"where the subscript i denotes one of the spatial components x, y or z.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In other words, to compute bm θ = (_x θ _y θ _z θ), one has to:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"transform θ to Fourier space to obtain hatθ,\nmultiply hatθ by i bmk,\ntransform the result back to physical space to obtain bm θ.","category":"page"},{"location":"generated/gradient/#Preparation","page":"Gradient of a scalar field","title":"Preparation","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In this section, we initialise a random real-valued scalar field θ and compute its FFT. 
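Before moving to the distributed setup, the three steps listed above can be checked on a single process with plain FFTW (a hedged 1D sketch; the names below are illustrative and not part of this example):

using FFTW  # provides rfft, irfft and rfftfreq

N = 64
x = (0:N-1) .* (2π / N)         # periodic grid on [0, 2π)
θ = sin.(3 .* x)                # sample scalar field
θ̂ = rfft(θ)                     # step 1: transform to Fourier space
k = rfftfreq(N, N)              # wave numbers 0, 1, …, N ÷ 2 (for L = 2π)
dθ̂ = im .* k .* θ̂               # step 2: multiply by i k
dθ = irfft(dθ̂, N)               # step 3: transform back to physical space
maximum(abs, dθ .- 3 .* cos.(3 .* x))  # ≈ 0, up to round-off error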
For more details see the Tutorial.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using MPI\nusing PencilFFTs\nusing Random\n\nMPI.Init()\n\n# Input data dimensions (Nx × Ny × Nz)\ndims = (64, 32, 64)\n\n# Apply a 3D real-to-complex (r2c) FFT.\ntransform = Transforms.RFFT()\n\n# Automatically create decomposition configuration\ncomm = MPI.COMM_WORLD\npen = Pencil(dims, comm)\n\n# Create plan\nplan = PencilFFTPlan(pen, transform)\n\n# Allocate data and initialise field\nθ = allocate_input(plan)\nrandn!(θ)\n\n# Perform distributed FFT\nθ_hat = plan * θ\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we initialise the output that will hold ∇θ in Fourier space. Noting that ∇θ is a vector field, we choose to store it as a tuple of 3 PencilArrays.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ_hat = allocate_output(plan, Val(3))\n\n# This is equivalent:\n# ∇θ_hat = ntuple(d -> similar(θ_hat), Val(3))\n\nsummary(∇θ_hat)","category":"page"},{"location":"generated/gradient/#Fourier-wave-numbers","page":"Gradient of a scalar field","title":"Fourier wave numbers","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In general, the Fourier wave numbers are of the form k_i = 0 frac2πL_i frac4πL_i frac6πL_i , where L_i is the period along dimension i. When a real-to-complex Fourier transform is applied, roughly half of these wave numbers are redundant due to the Hermitian symmetry of the complex Fourier coefficients. In practice, this means that for the fastest dimension x (along which a real-to-complex transform is performed), the negative wave numbers are dropped, i.e. k_x = 0 frac2πL_x frac4πL_x .","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The AbstractFFTs package provides a convenient way to generate the Fourier wave numbers, using the functions fftfreq and rfftfreq. We can use these functions to initialise a \"grid\" of wave numbers associated to our 3D real-to-complex transform:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"using AbstractFFTs: fftfreq, rfftfreq\n\nbox_size = (2π, 2π, 2π) # Lx, Ly, Lz\nsample_rate = 2π .* dims ./ box_size\n\n# In our case (Lx = 2π and Nx even), this gives kx = [0, 1, 2, ..., Nx/2].\nkx = rfftfreq(dims[1], sample_rate[1])\n\n# In our case (Ly = 2π and Ny even), this gives\n# ky = [0, 1, 2, ..., Ny/2-1, -Ny/2, -Ny/2+1, ..., -1] (and similarly for kz).\nky = fftfreq(dims[2], sample_rate[2])\nkz = fftfreq(dims[3], sample_rate[3])\n\nkvec = (kx, ky, kz)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that kvec now contains the wave numbers associated to the global domain. 
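For concreteness, this is what these frequency vectors look like for a small even length (an illustrative check, not part of the example):

using AbstractFFTs: fftfreq, rfftfreq

collect(rfftfreq(8, 8))  # [0.0, 1.0, 2.0, 3.0, 4.0]
collect(fftfreq(8, 8))   # [0.0, 1.0, 2.0, 3.0, -4.0, -3.0, -2.0, -1.0]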
In the following, we will only need the wave numbers associated to the portion of the domain handled by the local MPI process.","category":"page"},{"location":"generated/gradient/#gradient_method_global","page":"Gradient of a scalar field","title":"Method 1: global views","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"PencilArrays, returned for instance by allocate_input and allocate_output, take indices that start at 1, regardless of the location of the subdomain associated to the local process on the global grid. (In other words, PencilArrays take local indices.) On the other hand, we have defined the wave number vector kvec which, for each MPI process, is defined over the global domain, and as such it takes global indices.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"One straightforward way of making data arrays compatible with wave numbers is to use global views, i.e. arrays that take global indices. These are generated from PencilArrays by calling the global_view function. Note that, in general, global indices do not start at 1 for a given MPI process. A given process will own a range of data given by indices in (i1:i2, j1:j2, k1:k2).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"θ_glob = global_view(θ_hat)\n∇θ_glob = global_view.(∇θ_hat)\nsummary(θ_glob)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once we have global views, we can combine data and wave numbers using the portion of global indices owned by the local MPI process, as shown below. We can use CartesianIndices to iterate over the global indices associated to the local process.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I) # unpack indices\n\n # Wave number vector associated to current Cartesian index.\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Compute gradient in Fourier space.\n # Note that modifying ∇θ_glob also modifies the original PencilArray ∇θ_hat.\n ∇θ_glob[1][I] = im * kx * θ_glob[I]\n ∇θ_glob[2][I] = im * ky * θ_glob[I]\n ∇θ_glob[3][I] = im * kz * θ_glob[I]\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The above loop can be written in a slightly more efficient manner by precomputing im * θ_glob[I]:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(θ_glob)\n i, j, k = Tuple(I)\n\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n u = im * θ_glob[I]\n\n ∇θ_glob[1][I] = kx * u\n ∇θ_glob[2][I] = ky * u\n ∇θ_glob[3][I] = kz * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Also note that the above can be easily written in a more generic way, e.g. for arbitrary dimensions, thanks in part to the use of CartesianIndices. 
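For instance, the loop above can be rewritten so that it works for any number of dimensions (a sketch under the same assumptions as this example: θ_glob and the components of ∇θ_glob are global views, and kvec is the tuple of global wave-number vectors):

function gradient_global_generic!(∇θ_glob, θ_glob, kvec)
    @inbounds for I in CartesianIndices(θ_glob)
        u = im * θ_glob[I]
        inds = Tuple(I)
        for d in eachindex(kvec)
            # Wave number along dimension d at the current global index.
            ∇θ_glob[d][I] = kvec[d][inds[d]] * u
        end
    end
    ∇θ_glob
end

gradient_global_generic!(∇θ_glob, θ_glob, kvec)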
Moreover, in the above there is no notion of the dimension permutations discussed in the tutorial, as it is all hidden behind the implementation of PencilArrays. And as seen later in the benchmarks, these (hidden) permutations have zero cost, as the speed is identical to that of a function that explicitly takes into account these permutations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, we can perform a backwards transform to obtain bm θ in physical space:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"∇θ = plan \\ ∇θ_hat;\nnothing #hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that the transform is automatically broadcast over the three fields of the ∇θ_hat vector, and the result ∇θ is also a tuple of three PencilArrays.","category":"page"},{"location":"generated/gradient/#gradient_method_global_explicit","page":"Gradient of a scalar field","title":"Method 2: explicit global indexing","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Sometimes, one does not need to write generic code. In our case, one often knows the dimensionality of the problem and the memory layout of the data (i.e. the underlying index permutation).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Below is a reimplementation of the above loop, using explicit indices instead of CartesianIndices, and assuming that the underlying index permutation is (3, 2, 1), that is, data is stored in (z y x) order. As discussed in the tutorial, this is the default for transformed arrays. This example also serves as a more explicit explanation for what is going on in the first method.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"# Get local data range in the global grid.\nrng = axes(θ_glob) # = (i1:i2, j1:j2, k1:k2)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"For the loop below, we're assuming that the permutation is (3, 2, 1). In other words, the fastest index is the last one, and not the first one as it is usually in Julia. 
If the permutation is not (3, 2, 1), things will still work (well, except for the assertion below!), but the loop order will not be optimal.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@assert permutation(θ_hat) === Permutation(3, 2, 1)\n\n@inbounds for i in rng[1], j in rng[2], k in rng[3]\n local kx, ky, kz # hide\n kx = kvec[1][i]\n ky = kvec[2][j]\n kz = kvec[3][k]\n\n # Note that we still access the arrays in (i, j, k) order.\n # (The permutation happens behind the scenes!)\n u = im * θ_glob[i, j, k]\n\n ∇θ_glob[1][i, j, k] = kx * u\n ∇θ_glob[2][i, j, k] = ky * u\n ∇θ_glob[3][i, j, k] = kz * u\nend","category":"page"},{"location":"generated/gradient/#gradient_method_local","page":"Gradient of a scalar field","title":"Method 3: using local indices","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Alternatively, we can avoid global views and work directly on PencilArrays using local indices that start at 1. In this case, part of the strategy is to construct a \"local\" grid of wave numbers that can also be accessed with local indices. This can be conveniently done using the localgrid function of the PencilArrays.jl package, which accepts a PencilArray (or its associated Pencil) and the global coordinates (here kvec):","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"grid_fourier = localgrid(θ_hat, kvec)","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Note that one can directly iterate on the returned grid object:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@inbounds for I in CartesianIndices(grid_fourier)\n # Wave number vector associated to current Cartesian index.\n local k⃗ # hide\n k⃗ = grid_fourier[I]\n u = im * θ_hat[I]\n ∇θ_hat[1][I] = k⃗[1] * u\n ∇θ_hat[2][I] = k⃗[2] * u\n ∇θ_hat[3][I] = k⃗[3] * u\nend","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This implementation is as efficient as the other examples, while being slightly shorter to write. Moreover, it is quite generic, and can be made independent of the number of dimensions with little effort.","category":"page"},{"location":"generated/gradient/#gradient_method_broadcast","page":"Gradient of a scalar field","title":"Method 4: using broadcasting","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Finally, note that the local grid object returned by localgrid makes it is possible to compute the gradient using broadcasting, thus fully avoiding scalar indexing. This can be quite convenient in some cases, and can also be very useful if one is working on GPUs (where scalar indexing is prohibitively expensive). Using broadcasting, the above examples simply become:","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"@. ∇θ_hat[1] = im * grid_fourier[1] * θ_hat\n@. ∇θ_hat[2] = im * grid_fourier[2] * θ_hat\n@. 
∇θ_hat[3] = im * grid_fourier[3] * θ_hat\nnothing # hide","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"Once again, as shown in the benchmarks further below, this method performs quite similarly to the other ones.","category":"page"},{"location":"generated/gradient/#Summary","page":"Gradient of a scalar field","title":"Summary","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The PencilArrays module provides different alternatives to deal with MPI-distributed data that may be subject to dimension permutations. In particular, one can choose to work with global indices (first two examples), with local indices (third example), or to avoid scalar indexing altogether (fourth example).","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"If one wants to stay generic, making sure that the same code will work for arbitrary dimensions and will be efficient regardless of the underlying dimension permutation, methods 1, 3 or 4 should be preferred. These use CartesianIndices and make no assumptions on possible dimension permutations, which are by default enabled in the output of PencilFFTs transforms. In fact, such permutations are completely invisible in the implementations.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The second method uses explicit (i, j, k) indices. It assumes that the underlying permutation is (3, 2, 1) to loop with i as the slowest index and k as the fastest, which is the optimal order in this case given the permutation. As such, the implementation is less generic than the others, and differences in performance are negligible with respect to more generic variants.","category":"page"},{"location":"generated/gradient/#gradient_benchmarks","page":"Gradient of a scalar field","title":"Benchmark results","text":"","category":"section"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"The following are the benchmark results obtained from running examples/gradient.jl on a laptop, using 2 MPI processes and Julia 1.7.2, with an input array of global dimensions 64 32 64. The different methods detailed above are marked on the right. The \"lazy\" marks indicate runs where the wave numbers were represented by lazy Frequencies objects (returned by rfftfreq and fftfreq). Otherwise, they were collected into Vectors. For some reason, plain Vectors are faster when working with grids generated by localgrid.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"In the script, additional implementations can be found which rely on a more advanced understanding of permutations and on the internals of the PencilArrays package. For instance, gradient_local_parent! directly works with the raw data stored in Julia Arrays, while gradient_local_linear! completely avoids CartesianIndices while staying generic and efficient. Nevertheless, these display roughly the same performance as the above examples.","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":" gradient_global_view!... 89.900 μs\n gradient_global_view! (lazy)... 
92.060 μs [Method 1]\n gradient_global_view_explicit!... 88.958 μs\n gradient_global_view_explicit! (lazy)... 81.055 μs [Method 2]\n gradient_local!... 92.305 μs\n gradient_grid!... 92.770 μs\n gradient_grid! (lazy)... 101.388 μs [Method 3]\n gradient_grid_broadcast!... 88.606 μs\n gradient_grid_broadcast! (lazy)... 151.020 μs [Method 4]\n gradient_local_parent!... 92.248 μs\n gradient_local_linear!... 91.212 μs\n gradient_local_linear_explicit!... 90.992 μs","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"","category":"page"},{"location":"generated/gradient/","page":"Gradient of a scalar field","title":"Gradient of a scalar field","text":"This page was generated using Literate.jl.","category":"page"},{"location":"PencilFFTs_timers/#PencilFFTs.measuring_performance","page":"Measuring performance","title":"Measuring performance","text":"","category":"section"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"It is possible to measure the time spent in different sections of the distributed transforms using the TimerOutputs package. This has a (very small) performance overhead, so it is disabled by default. To enable time measurements, call TimerOutputs.enable_debug_timings after loading PencilFFTs (see below for an example). For more details see the TimerOutputs docs.","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"Minimal example:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"using MPI\nusing PencilFFTs\nusing TimerOutputs\n\n# Enable timing of `PencilFFTs` functions\nTimerOutputs.enable_debug_timings(PencilFFTs)\nTimerOutputs.enable_debug_timings(PencilArrays)\nTimerOutputs.enable_debug_timings(Transpositions)\n\nMPI.Init()\n\nplan = PencilFFTPlan(#= args... =#)\n\n# [do stuff with `plan`...]\n\n# Retrieve and print timing data associated to `plan`\nto = timer(plan)\nprint_timer(to)","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"By default, each PencilFFTPlan has its own TimerOutput. If you already have a TimerOutput, you can pass it to the PencilFFTPlan constructor:","category":"page"},{"location":"PencilFFTs_timers/","page":"Measuring performance","title":"Measuring performance","text":"to = TimerOutput()\nplan = PencilFFTPlan(..., timer=to)\n\n# [do stuff with `plan`...]\n\nprint_timer(to)","category":"page"},{"location":"","page":"Home","title":"Home","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"#PencilFFTs","page":"Home","title":"PencilFFTs","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Fast Fourier transforms of MPI-distributed Julia arrays.","category":"page"},{"location":"#Introduction","page":"Home","title":"Introduction","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This package provides multidimensional FFTs and related transforms on MPI-distributed Julia arrays via the PencilArrays package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The name of this package originates from the decomposition of 3D domains along two out of three dimensions, sometimes called pencil decomposition. This is illustrated by the figure below,[1] where each coloured block is managed by a different MPI process. 
Typically, one wants to compute FFTs on a scalar or vector field along the three spatial dimensions. In the case of a pencil decomposition, 3D FFTs are performed one dimension at a time, along the non-decomposed direction. Transforms must then be interleaved with global data transpositions to switch between pencil configurations. In high-performance computing environments, such data transpositions are generally the most expensive part of a parallel FFT computation, due to the large cost of communications between computing nodes.","category":"page"},{"location":"","page":"Home","title":"Home","text":"
[figure: 2D pencil decomposition of a 3D domain, where each coloured block is managed by a different MPI process]
","category":"page"},{"location":"","page":"Home","title":"Home","text":"More generally, PencilFFTs allows to decompose and perform FFTs on geometries of arbitrary dimension N. The decompositions can be performed along an arbitrary number M N of dimensions.[2] Moreover, the transforms applied along each dimension can be arbitrarily chosen (and combined) among those supported by FFTW.jl, including complex-to-complex, real-to-complex and real-to-real transforms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The generic and efficient implementation of this package is greatly enabled by the use of zero-cost abstractions in Julia. As shown in the Benchmarks section, PencilFFTs scales well to large numbers of processes, and performs similarly to the Fortran implementation of P3DFFT, possibly the most popular library for computing parallel FFTs using 2D domain decomposition.","category":"page"},{"location":"#Features","page":"Home","title":"Features","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"distributed N-dimensional FFTs of MPI-distributed Julia arrays, using the PencilArrays package;\nFFTs and related transforms (e.g. DCTs / Chebyshev transforms) may be arbitrarily combined along different dimensions;\nin-place and out-of-place transforms;\nhigh scalability up to (at least) tens of thousands of MPI processes.","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"PencilFFTs can be installed using the Julia package manager:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> ] add PencilFFTs","category":"page"},{"location":"#Similar-projects","page":"Home","title":"Similar projects","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"FFTW3 implements distributed-memory transforms using MPI, but these are limited to 1D decompositions. 
Also, this functionality is not currently included in the FFTW.jl wrappers.\nPFFT is a very general parallel FFT library written in C.\nP3DFFT implements parallel 3D FFTs using pencil decomposition in Fortran and C++.\n2DECOMP&FFT is another parallel 3D FFT library using pencil decomposition written in Fortran.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[1]: Figure adapted from this PhD thesis.","category":"page"},{"location":"","page":"Home","title":"Home","text":"[2]: For the pencil decomposition represented in the figure, N = 3 and M = 2.","category":"page"},{"location":"PencilFFTs/#Distributed-FFT-plans","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"Distributed FFTs are implemented in the PencilFFTs module, and are built on top of the PencilArrays package.","category":"page"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"CurrentModule = PencilFFTs","category":"page"},{"location":"PencilFFTs/#Creating-plans","page":"Distributed FFT plans","title":"Creating plans","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"PencilFFTPlan","category":"page"},{"location":"PencilFFTs/#PencilFFTs.PencilFFTPlan","page":"Distributed FFT plans","title":"PencilFFTs.PencilFFTPlan","text":"PencilFFTPlan{T,N} <: AbstractFFTs.Plan{T}\n\nPlan for N-dimensional FFT-based transform on MPI-distributed data, where input data has type T.\n\n\n\nPencilFFTPlan(p::Pencil, transforms; kwargs...)\n\nCreate a PencilFFTPlan for distributed arrays following a given Pencil configuration. See variant below for details on the specification of transforms and on possible keyword arguments.\n\n\n\nPencilFFTPlan(\n A::PencilArray, transforms;\n fftw_flags = FFTW.ESTIMATE,\n fftw_timelimit = FFTW.NO_TIMELIMIT,\n permute_dims = Val(true),\n transpose_method = Transpositions.PointToPoint(),\n timer = timer(pencil(A)),\n)\n\nCreate plan for N-dimensional transform on MPI-distributed PencilArrays.\n\nExtended help\n\nThis creates a PencilFFTPlan for arrays sharing the same properties as A (dimensions, MPI decomposition, memory layout, ...), which describe data on an N-dimensional domain.\n\nTransforms\n\nThe transforms to be applied along each dimension are specified by the transforms argument. Possible transforms are defined as subtypes of Transforms.AbstractTransform, and are listed in Transform types. This argument may be either:\n\na tuple of N transforms to be applied along each dimension. For instance, transforms = (Transforms.R2R(FFTW.REDFT01), Transforms.RFFT(), Transforms.FFT());\na single transform to be applied along all dimensions. The input is automatically expanded into N equivalent transforms. For instance, for a three-dimensional array, transforms = Transforms.RFFT() specifies a 3D real-to-complex transform, and is equivalent to passing (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT()).\n\nNote that forward transforms are applied from left to right. In the last example, this means that a real-to-complex transform (RFFT) is first performed along the first dimension. This is followed by complex-to-complex transforms (FFT) along the second and third dimensions.\n\nInput data layout\n\nThe input PencilArray must satisfy the following constraints:\n\narray dimensions must not be permuted. 
This is the default when constructing PencilArrays.\nfor an M-dimensional domain decomposition (with M < N), the input array must be decomposed along the last M dimensions. For example, for a 2D decomposition of 3D data, the decomposed dimensions must be (2, 3). In particular, the first array dimension must not be distributed among different MPI processes.\nIn the PencilArrays package, the decomposed dimensions are specified at the moment of constructing a Pencil.\nthe element type must be compatible with the specified transform. For instance, real-to-complex transforms (Transforms.RFFT) require the input to be real floating point values. Other transforms, such as Transforms.R2R, accept both real and complex data.\n\nKeyword arguments\n\nThe keyword arguments fftw_flags and fftw_timelimit are passed to the FFTW plan creation functions (see AbstractFFTs docs).\npermute_dims determines whether the indices of the output data should be reversed. For instance, if the input data has global dimensions (Nx, Ny, Nz), then the output of a complex-to-complex FFT would have dimensions (Nz, Ny, Nx). This enables FFTs to always be performed along the first (i.e. fastest) array dimension, which could lead to performance gains. This option is enabled by default. For type inference reasons, it must be a value type (Val(true) or Val(false)).\ntranspose_method allows to select between implementations of the global data transpositions. See PencilArrays docs docs for details.\ntimer should be a TimerOutput object. See Measuring performance for details.\n\n\n\nPencilFFTPlan(\n dims_global::Dims{N}, transforms, proc_dims::Dims{M}, comm::MPI.Comm,\n [real_type = Float64]; extra_dims = (), kws...\n)\n\nCreate plan for N-dimensional transform.\n\nExtended help\n\nInstead of taking a PencilArray or a Pencil, this constructor requires the global dimensions of the input data, passed via the size_global argument.\n\nThe data is distributed over the MPI processes in the comm communicator. The distribution is performed over M dimensions (with M < N) according to the values in proc_dims, which specifies the number of MPI processes to put along each dimension.\n\nPencilArrays that may be transformed with the returned plan can be created using allocate_input.\n\nOptional arguments\n\nThe floating point precision can be selected by setting real_type parameter, which is Float64 by default.\nextra_dims may be used to specify the sizes of one or more extra dimensions that should not be transformed. These dimensions will be added to the rightmost (i.e. slowest) indices of the arrays. See Extra dimensions below for usage hints.\nsee the other constructor for more keyword arguments.\n\nExtra dimensions\n\nOne possible application of extra_dims is for describing the components of a vector or tensor field. However, this means that different PencilFFTPlans would need to be created for each kind of field (scalar, vector, ...). To avoid the creation of multiple plans, a possibly better alternative is to create tuples (or arrays) of PencilArrays using allocate_input and allocate_output.\n\nAnother more legitimate usage of extra_dims is to specify one or more Cartesian dimensions that should not be transformed nor split among MPI processes.\n\nExample\n\nSuppose we want to perform a 3D FFT of real data. 
The data is to be decomposed along two dimensions, over 8 MPI processes:\n\nsize_global = (64, 32, 128) # size of real input data\n\n# Perform real-to-complex transform along the first dimension, then\n# complex-to-complex transforms along the other dimensions.\ntransforms = (Transforms.RFFT(), Transforms.FFT(), Transforms.FFT())\n# transforms = Transforms.RFFT() # this is equivalent to the above line\n\nproc_dims = (4, 2) # 2D decomposition\ncomm = MPI.COMM_WORLD\n\nplan = PencilFFTPlan(size_global, transforms, proc_dims, comm)\n\n\n\n\n\n","category":"type"},{"location":"PencilFFTs/#Allocating-data","page":"Distributed FFT plans","title":"Allocating data","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"allocate_input\nallocate_output","category":"page"},{"location":"PencilFFTs/#PencilFFTs.allocate_input","page":"Distributed FFT plans","title":"PencilFFTs.allocate_input","text":"allocate_input(p::PencilFFTPlan) -> PencilArray\nallocate_input(p::PencilFFTPlan, dims...) -> Array{PencilArray}\nallocate_input(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold input data for the given plan.\n\nThe second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.\n\nnote: In-place plans\nIf p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).ExampleSuppose p is an in-place PencilFFTPlan. Then,@assert is_inplace(p)\nA = allocate_input(p) :: ManyPencilArray\nv_in = first(A) :: PencilArray # input data view\nv_out = last(A) :: PencilArray # output data viewAlso note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:p * A # perform forward transform in-place\np \\ A # perform backward transform in-place\n# p * v_in # not allowed!!\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#PencilFFTs.allocate_output","page":"Distributed FFT plans","title":"PencilFFTs.allocate_output","text":"allocate_output(p::PencilFFTPlan) -> PencilArray\nallocate_output(p::PencilFFTPlan, dims...) 
-> Array{PencilArray}\nallocate_output(p::PencilFFTPlan, Val(N)) -> NTuple{N, PencilArray}\n\nAllocate uninitialised PencilArray that can hold output data for the given plan.\n\nIf p is an in-place plan, a ManyPencilArray is allocated.\n\nSee allocate_input for details.\n\n\n\n\n\n","category":"function"},{"location":"PencilFFTs/#Methods","page":"Distributed FFT plans","title":"Methods","text":"","category":"section"},{"location":"PencilFFTs/","page":"Distributed FFT plans","title":"Distributed FFT plans","text":"get_comm(::PencilFFTPlan)\nscale_factor(::PencilFFTPlan)\ntimer(::PencilFFTPlan)\nis_inplace(::PencilFFTPlan)","category":"page"},{"location":"PencilFFTs/#PencilArrays.Pencils.MPITopologies.get_comm-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.MPITopologies.get_comm","text":"get_comm(p::PencilFFTPlan)\n\nGet MPI communicator associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.scale_factor-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.scale_factor","text":"scale_factor(p::PencilFFTPlan)\n\nGet scale factor associated to a PencilFFTPlan.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilArrays.Pencils.timer-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilArrays.Pencils.timer","text":"timer(p::PencilFFTPlan)\n\nGet TimerOutput attached to a PencilFFTPlan.\n\nSee Measuring performance for details.\n\n\n\n\n\n","category":"method"},{"location":"PencilFFTs/#PencilFFTs.Transforms.is_inplace-Tuple{PencilFFTPlan}","page":"Distributed FFT plans","title":"PencilFFTs.Transforms.is_inplace","text":"Transforms.is_inplace(p::PencilFFTPlan)\n\nReturns true if the given plan operates in-place on the input data, false otherwise.\n\n\n\n\n\n","category":"method"}]
-}
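As a hedged usage sketch of the plan-query methods documented just above (get_comm, scale_factor, timer, is_inplace), the snippet below is illustrative, not part of the generated docs, and assumes these names are available unqualified as in the package examples:

using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD
pen = Pencil((64, 32, 128), comm)            # 3D data, decomposition chosen automatically
plan = PencilFFTPlan(pen, Transforms.RFFT())

get_comm(plan)      # MPI communicator attached to the plan
scale_factor(plan)  # normalisation factor of the backwards transform
timer(plan)         # TimerOutput attached to the plan
is_inplace(plan)    # false: RFFT() creates an out-of-place plan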
diff --git a/previews/PR62/siteinfo.js b/previews/PR62/siteinfo.js
deleted file mode 100644
index 34c224d6..00000000
--- a/previews/PR62/siteinfo.js
+++ /dev/null
@@ -1 +0,0 @@
-var DOCUMENTER_CURRENT_VERSION = "previews/PR62";
diff --git a/previews/PR59/GlobalFFTParams/index.html b/previews/PR63/GlobalFFTParams/index.html
similarity index 95%
rename from previews/PR59/GlobalFFTParams/index.html
rename to previews/PR63/GlobalFFTParams/index.html
index 233ced58..bd67cf0e 100644
--- a/previews/PR59/GlobalFFTParams/index.html
+++ b/previews/PR63/GlobalFFTParams/index.html
@@ -6,4 +6,4 @@
julia> fft_params = PencilFFTs.GlobalFFTParams(size_global, transforms)
Transforms: (RFFT, FFT, FFT)
Input type: Float64
-Global dimensions: (64, 32, 128) -> (33, 32, 128)
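The (64, 32, 128) -> (33, 32, 128) size change shown in this hunk follows from the Hermitian symmetry of the real-to-complex transform along the first dimension; as a quick check (illustrative only):

Nx = 64
Nx ÷ 2 + 1  # == 33: number of complex coefficients stored along the r2c dimension

The remaining dimensions are transformed with complex-to-complex FFTs and keep their sizes (32 and 128).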
Allocate uninitialised PencilArray that can hold input data for the given plan.
The second and third forms respectively allocate an array of PencilArrays of size dims, and a tuple of N PencilArrays.
In-place plans
If p is an in-place plan, a ManyPencilArray is allocated. This type holds PencilArray wrappers for the input and output transforms (as well as for intermediate transforms) which share the same space in memory. The input and output PencilArrays should be respectively accessed by calling first(::ManyPencilArray) and last(::ManyPencilArray).
Example
Suppose p is an in-place PencilFFTPlan. Then,
@assert is_inplace(p)
A = allocate_input(p) :: ManyPencilArray
v_in = first(A) :: PencilArray # input data view
v_out = last(A) :: PencilArray # output data view
Also note that in-place plans must be performed directly on the returned ManyPencilArray, and not on the contained PencilArray views:
p * A # perform forward transform in-place
p \ A # perform backward transform in-place
-# p * v_in # not allowed!!
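For context, here is a minimal sketch of how such an in-place plan might be created and used; it assumes the in-place complex-to-complex transform type Transforms.FFT! and is illustrative rather than part of the original page:

using MPI
using PencilFFTs
using Random

MPI.Init()
pen = Pencil((32, 32, 32), MPI.COMM_WORLD)
p = PencilFFTPlan(pen, Transforms.FFT!())  # in-place c2c plan (Float64 precision)

A = allocate_input(p)   # ManyPencilArray: input and output views share memory
v_in = first(A)         # input view
randn!(v_in)            # fill the input view with random data
p * A                   # forward transform, performed in-place on A
v_out = last(A)         # transformed data
p \ A                   # backward transform, in-place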
Like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
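For instance (an illustrative sketch based on the description above):

using PencilFFTs: Transforms

Transforms.BRFFT(64)            # complex-to-real transform with real output length 64
Transforms.BRFFT((27, 32, 64))  # equivalent to BRFFT(64): only the last entry sets the c2r length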
plan(transform::AbstractTransform, A, [dims];
- flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
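As a usage sketch of this low-level interface on a plain Julia array, without MPI (illustrative only):

using FFTW
using PencilFFTs: Transforms

A = zeros(Float64, 16, 16)
p = Transforms.plan(Transforms.RFFT(), A)  # plan an r2c transform over all dimensions
Â = p * A
size(Â)  # (9, 16): the first dimension is halved, 16 ÷ 2 + 1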
Returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
Like AbstractFFTs.bfft, this transform is not normalised. To obtain the inverse transform, divide the output by the length of the transformed dimension.
To obtain the inverse transform, divide the output by the length of the transformed dimension (of the real output array).
As described in the AbstractFFTs docs, the length of the output cannot be fully inferred from the input length. For this reason, the BRFFT constructor accepts an optional d argument indicating the output length.
For multidimensional datasets, a tuple of dimensions (d1, d2, ..., dN) may also be passed. This is equivalent to passing just dN. In this case, the last dimension (dN) is the one that changes size between the input and output. Note that this is the opposite of FFTW.brfft. The reason is that, in PencilFFTs, the last dimension is the one along which a complex-to-real transform is performed.
plan(transform::AbstractTransform, A, [dims];
+ flags=FFTW.ESTIMATE, timelimit=Inf)
Create plan to transform array A along dimensions dims.
If dims is not specified, all dimensions of A are transformed.
For FFT plans, this function wraps the AbstractFFTs.jl and FFTW.jl plan creation functions. For more details on the function arguments, see AbstractFFTs.plan_fft.
Returns the backwards transform associated to the given transform.
The second argument must be the length of the first transformed dimension in the forward transform. It is used in particular when transform = RFFT(), to determine the length of the inverse (complex-to-real) transform. See the AbstractFFTs.irfft docs for details.
The backwards transform returned by this function is not normalised. The normalisation factor for a given array can be obtained by calling scale_factor.
Returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
Returns the length of the transform output, given the length of its input.
The input and output lengths are specified in terms of the respective input and output datatypes. For instance, for real-to-complex transforms, these are respectively the length of input real data and of output complex data.
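A hedged example of this relationship, assuming the Transforms.length_output function that this docstring describes (the exact name is taken from the package's Transforms API rather than from the text above):

using PencilFFTs: Transforms

Transforms.length_output(Transforms.FFT(), 64)   # 64: complex-to-complex keeps the length
Transforms.length_output(Transforms.RFFT(), 64)  # 33: real-to-complex stores 64 ÷ 2 + 1 modes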
This document was generated with Documenter.jl version 0.27.24 on Thursday 11 May 2023. Using Julia version 1.9.0.
diff --git a/previews/PR39/assets/custom.css b/previews/PR63/assets/custom.css
similarity index 100%
rename from previews/PR39/assets/custom.css
rename to previews/PR63/assets/custom.css
diff --git a/previews/PR59/assets/documenter.js b/previews/PR63/assets/documenter.js
similarity index 100%
rename from previews/PR59/assets/documenter.js
rename to previews/PR63/assets/documenter.js
diff --git a/previews/PR39/assets/logo.svg b/previews/PR63/assets/logo.svg
similarity index 100%
rename from previews/PR39/assets/logo.svg
rename to previews/PR63/assets/logo.svg
diff --git a/dev/assets/search.js b/previews/PR63/assets/search.js
similarity index 100%
rename from dev/assets/search.js
rename to previews/PR63/assets/search.js
diff --git a/previews/PR48/assets/themes/documenter-dark.css b/previews/PR63/assets/themes/documenter-dark.css
similarity index 100%
rename from previews/PR48/assets/themes/documenter-dark.css
rename to previews/PR63/assets/themes/documenter-dark.css
diff --git a/previews/PR48/assets/themes/documenter-light.css b/previews/PR63/assets/themes/documenter-light.css
similarity index 100%
rename from previews/PR48/assets/themes/documenter-light.css
rename to previews/PR63/assets/themes/documenter-light.css
diff --git a/previews/PR39/assets/themeswap.js b/previews/PR63/assets/themeswap.js
similarity index 100%
rename from previews/PR39/assets/themeswap.js
rename to previews/PR63/assets/themeswap.js
diff --git a/previews/PR39/assets/tomate.js b/previews/PR63/assets/tomate.js
similarity index 100%
rename from previews/PR39/assets/tomate.js
rename to previews/PR63/assets/tomate.js
diff --git a/previews/PR48/assets/warner.js b/previews/PR63/assets/warner.js
similarity index 100%
rename from previews/PR48/assets/warner.js
rename to previews/PR63/assets/warner.js
diff --git a/previews/PR62/benchmarks/index.html b/previews/PR63/benchmarks/index.html
similarity index 97%
rename from previews/PR62/benchmarks/index.html
rename to previews/PR63/benchmarks/index.html
index dd0010bc..ec6c0359 100644
--- a/previews/PR62/benchmarks/index.html
+++ b/previews/PR63/benchmarks/index.html
@@ -9,4 +9,4 @@
width="75%"
src="../img/benchmark_idris.svg"
alt="Strong scaling of PencilFFTs">
-
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.
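As a hedged sketch (not part of the benchmark page), the communication backend is selected at plan creation via the transpose_method keyword; the Alltoallv method name below is assumed from the PencilArrays Transpositions API:

using MPI
using PencilFFTs

MPI.Init()
comm = MPI.COMM_WORLD
size_global = (64, 32, 128)
proc_dims = (2, 2)              # assumes 4 MPI processes
transforms = Transforms.RFFT()

# Default: non-blocking point-to-point transpositions
plan_p2p = PencilFFTPlan(size_global, transforms, proc_dims, comm;
                         transpose_method = Transpositions.PointToPoint())

# Alternative: collective MPI_Alltoallv-based transpositions
plan_a2a = PencilFFTPlan(size_global, transforms, proc_dims, comm;
                         transpose_method = Transpositions.Alltoallv())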
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
Settings
This document was generated with Documenter.jl version 0.27.24 on Wednesday 15 March 2023. Using Julia version 1.9.0-rc1.
+
As seen above, PencilFFTs generally outperforms P3DFFT in its default setting. This is largely explained by the choice of using non-blocking point-to-point MPI communications (via MPI_Isend and MPI_Irecv), while P3DFFT uses collective MPI_Alltoallv calls. This enables PencilFFTs to perform data reordering operations on the partially received data while waiting for the incoming data, leading to better performance. Moreover, in contrast with P3DFFT, the high performance and scalability of PencilFFTs results from a highly generic code, handling decompositions in arbitrary dimensions and a relatively large (and extensible) variety of transformations.
Note that PencilFFTs can optionally use collective communications (using MPI_Alltoallv) instead of point-to-point communications. For details, see the docs for PencilFFTPlan and for PencilArray transpositions. As seen above, collective communications generally perform worse than point-to-point ones, and runtimes are nearly indistinguishable from those of P3DFFT.
The benchmarks were performed using Julia 1.7-beta3 and Intel MPI 2019. We used PencilFFTs v0.12.5 with FFTW.jl v1.4.3 and MPI.jl v0.19.0. We used the Fortran implementation of P3DFFT, version 2.7.6, which was built with Intel 2019 compilers and linked to FFTW 3.3.8. The cluster where the benchmarks were run has Intel Cascade Lake 6248 processors with 2×20 cores per node.
The number of MPI processes along each decomposed dimension, $P_1$ and $P_2$, was automatically determined by a call to MPI_Dims_create, which tends to create a balanced decomposition with $P_1 ≈ P_2$. For instance, a total of 1024 processes is divided into $P_1 = P_2 = 32$. Different results may be obtained with other combinations, but this was not benchmarked.
The source files used to generate this benchmark, as well as the raw benchmark results, are all available in the PencilFFTs repo.
Settings
This document was generated with Documenter.jl version 0.27.24 on Thursday 11 May 2023. Using Julia version 1.9.0.