dfftlib 0.2 - Distributed FFT library
=====================================
This library provides fast d-dimensional distributed FFTs over MPI, using a
local FFT library as a hardware-optimized backend. Supported backends are
Intel MKL on the host and cuFFT on CUDA devices.
The local FFT functions are abstracted so that new backends can be added
easily.
All standard MPI implementations are supported. In addition, the library
supports "CUDA-aware" MPI libraries (MVAPICH2 >= 1.8, OpenMPI 1.7).
Please see their documentation for how to enable CUDA-MPI communication.
The distributed FFT is based upon a modified algorithm from
"Parallel Scientific Computation", Rob H. Bisseling,
Oxford University Press 2004.
INSTALLATION
------------
To install, type
$ tar xvfz dfftlib-0.2.tar.gz
$ mkdir dfftlib-build
$ cd dfftlib-build
$ cmake -D CMAKE_INSTALL_PREFIX=<your-install-root> ../dfftlib-0.2
$ make install
Prerequisites (optional):
- CUDA (tested with 5.0)
- Intel MKL (tested with 11.0.4.183)
If no CUDA toolkit is available, only the host backend will be built.
If MKL is not available, an internal radix-2 FFT routine will be used.
Using the internal FFT is not recommended for benchmarking.
DOCUMENTATION
-------------
The documentation is rudimentary at this point. For examples, please have a
look at the unit tests, test/unit_test_host.c and test/unit_test_cuda.c.
LIMITATIONS
-----------
All FFT dimensions have to be powers of two. The number of processors
has to be a power of two as well.
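The restrictions above can be checked before setting up a transform. The
following is a minimal sketch, not part of dfftlib: the helper names
is_power_of_two and config_is_supported are hypothetical, and the process
count is assumed to come from the caller (for example from MPI_Comm_size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of dfftlib: true if n is a power of two. */
static bool is_power_of_two(size_t n)
{
    /* A power of two has exactly one bit set, so n & (n - 1) is zero. */
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper, not part of dfftlib: true if every FFT dimension
 * and the number of processes satisfy the power-of-two restriction. */
static bool config_is_supported(const size_t *dims, size_t ndim, size_t nprocs)
{
    assert(dims != NULL || ndim == 0);

    for (size_t i = 0; i < ndim; ++i) {
        if (!is_power_of_two(dims[i]))
            return false;
    }
    return is_power_of_two(nprocs);
}
```

For example, a 64 x 128 x 32 transform on 8 processes passes this check,
while the same transform on 6 processes does not.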
CONTACT
-------
Author contact: [email protected]