-
Notifications
You must be signed in to change notification settings - Fork 4
Parallel selection on GPUs
License
GPL-3.0 and 2 other licenses found
Licenses found
GPL-3.0
LICENSE
BSL-1.0
LICENSE-catch2
BSD-3-Clause
LICENSE-cub
upsj/gpu_selection
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This library implements a bucket-based selection algorithm on GPUs More details can be found in * T. Ribizel and H. Anzt, "Approximate and Exact Selection on GPUs," 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Rio de Janeiro, Brazil, 2019, pp. 471-478. doi: 10.1109/IPDPSW.2019.00088 * T. Ribizel, H. Anzt, "Parallel selection on GPUs," Parallel Computing, Volume 91, 2020, doi: 10.1016/j.parco.2019.102588 It uses Catch2 as a test framework and the CUB library as a reference implementation for sorting. The tests can be run by simply executing `app/unittests`, the benchmarks can be run by executing `app/benchmark` with one of the following parameters [full] The full benchmark for exact single and multiple selection and the individual kernels (sample, count, reduce, filter) [full-multionly] The full benchmark for multiple selection only [approx] The full benchmark for approximate selection with shared-memory atomics [approx-g] The full benchmark for approximate selection with global-memory atomics [multi] The full benchmark for multiple selection with different numbers of ranks [test] A small benchmark that only executes a single benchmark with small input size The output of these tests is the following: On stdout, they print error messages in case the algorithm execution produces invalid results. For the approx tests, additionally the exact and approximate rank are being output in CSV format. On stderr, they print the individual timings of the kernels in CSV format for different input sizes given by the first CSV field. Runtime breakdowns are listed within parentheses (). `app/benchmark-sort` contains a benchmark for the CUB radix sort implementation as a performance baseline for the multiple selection. Structure of the project include/cpu_reference.hpp - Reference implementations for testing include/verification.hpp - Validation functions for testing include/cuda_definitions.cuh - Type definitions and hardware limits include/cuda_error.cuh - Wrapper for CUDA error handling include/cuda_memory.cuh - Wrapper for CUDA memory allocations include/cuda_timer.cuh - Wrapper for CUDA timing measurements include/kernel_config.cuh - Configuration struct for kernel templates include/launcher_fwd.cuh - Forward-declarations of launcher and kernel templates lib/generated/* - Explicit template instantiations to parallelize compilation lib/cpu_reference.cpp - Reference implementations for testing lib/verification.cpp - Validation functions for testing lib/qs_launchers.cuh - Wrappers for quickselect kernels lib/qs_recursion.cuh - Kernels for quickselect single-selection lib/qs_recursion_multi.cuh - Kernels for quickselect multi-selection lib/qs_reduce.cuh - Kernels for reducing quickselect partial sums lib/qs_scan.cuh - Kernels for quickselect bipartitioning lib/ssss_build_searchtree.cuh - Kernels for sampleselect sampling lib/ssss_collect.cuh - Kernels for sampleselect single-selection filtering lib/ssss_collect_multi.cuh - Kernels for sampleselect multi-selection filtering lib/ssss_count.cuh - Kernels for sampleselect counting lib/ssss_launchers.cuh - Wrappers for sampleselect kernels lib/ssss_merged.cuh - Kernels for multiple simultaneous sampleselects lib/ssss_merged_memory.cuh - Auxiliary data structure for sampleselect multi-selection lib/ssss_recursion.cuh - Kernels for sampleselect single-selection lib/ssss_recursion_multi.cuh - Kernels for sampleselect multi-selection lib/ssss_reduce.cuh - Kernels for reducing sampleselect partial sums lib/utils_basecase.cuh - Kernels for recursion basecase lib/utils_bytestorage.cuh - Auxiliary functions for reading/writing unaligned bytes lib/utils_mask.cuh - Auxiliary functions for bitmasks lib/utils_prefixsum.cuh - Auxiliary functions for tree-based partial sums lib/utils_sampling.cuh - Auxiliary functions for sampling lib/utils_search.cuh - Auxiliary functions for binary and warp-ary searches lib/utils_sort.cuh - Auxiliary functions for bitonic sorting lib/utils_warpaggr.cuh - Auxiliary functions for warp-aggregation lib/utils_work.cuh - Auxiliary functions for work-distribution lib/utils.cuh - Auxiliary wrappers for basic operations
About
Parallel selection on GPUs
Topics
Resources
License
GPL-3.0 and 2 other licenses found
Licenses found
GPL-3.0
LICENSE
BSL-1.0
LICENSE-catch2
BSD-3-Clause
LICENSE-cub
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published