Allow specification for GPU device index #96

jwallwork23 · 2024-03-22T09:24:36Z

Closes #85.

The main change associated with this PR is allowing the GPU device index to be specified for the following functions and subroutines:

torch_zeros (C++) / torch_tensor_zeros (Fortran)
torch_ones (C++) / torch_tensor_zeros (Fortran)
torch_empty (C++)
torch_from_blob (C++) / torch_tensor_from_blob (Fotran)
torch_jit_load (C++) / torch_module_load (Fortran)
torch_tensor_from_array_${PREC}$_${RANK}$d (Fortran)

To avoid confusion/ambiguity, device is replaced by device_type in several places in the code, as device_type and device_index are consistent with the naming used in CUDA.

The GPU device index is specified using an additional argument, although this is made optional both in C++ and Fortran to ensure that the examples can be run without modification. In the case of torch_jit_load / torch_module_load, the device_type also needed to be added as an optional argument to support the new functionality.

If unset:

device_type defaults to torch_kCPU
device_index defaults to -1 if device_type is torch_kCPU and 0 if device_type is torch_kGPU.

New functions called torch_tensor_get_device_index are introduced so that we can test the new functionality.

jwallwork23 · 2024-03-22T09:27:15Z

Here is the test that I used:

! Import precision info from iso
use, intrinsic :: iso_fortran_env, only : sp => real32

! Import our library for interfacing with PyTorch
use ftorch

! Import MPI
use mpi

implicit none

! Set precision for reals
integer, parameter :: wp = sp

! Set up Fortran data structures
real(wp), dimension(5), target :: in_data
integer :: tensor_layout(1) = [1]

! Set up Torch data structures
type(torch_tensor) :: in_tensor
integer :: device_type
integer :: device_index

! MPI configuration
integer rank, ierr

call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, rank, ierr)

! Initialise data
in_data = [0.0, 1.0, 2.0, 3.0, 4.0]

! Loop over device type torch_kCPU and torch_kGPU
do device_type = 0, 1
  if (device_type == torch_kCPU) then
    device_index = - 1
  else
    device_index = rank
  end if

  ! Create Torch input tensor from the above arrays
  in_tensor = torch_tensor_from_array(in_data, tensor_layout, device_type, device_index)

  ! Print some information
  if (torch_tensor_get_device_index(in_tensor) == device_index) then
    write(*, *) rank, "PASS"
  else
    write(*, *) rank, "expected index ", device_index, "got ", torch_tensor_get_device_index(in_tensor)
  end if

  ! Cleanup
  call torch_tensor_delete(in_tensor)
end do
call mpi_finalize(ierr)

end program test_device_index

If run on my laptop (CPU-only), I get the output

           0 PASS
           1 PASS
           2 PASS
           3 PASS
[ERROR]: invalid device index 0 for device count [ERROR]: invalid device index 1 for device count 0, using zero instead
[ERROR]: invalid device index 2 for device count 0, using zero instead
[ERROR]: invalid device index 3 for device count 0, using zero instead
0, using zero instead
[ERROR]: PyTorch is not linked with support for cuda devices
[ERROR]: PyTorch is not linked with support for cuda devices
[ERROR]: PyTorch is not linked with support for cuda devices
[ERROR]: PyTorch is not linked with support for cuda devices

which confirms that the CPU case works, but obviously the GPU case isn't going to work.

If I run on Wilkes3 with four GPUs and four MPI processes, I get the output

           3 PASS
           2 PASS
           0 PASS
           1 PASS
           0 PASS
           1 PASS
           3 PASS
           2 PASS

which confirms that the GPU case works, too.

jatkinson1000

Amazing!
At a quick glance this looks good - I'll do a detailed review later when I have some time.
One quick comment before then - can you provide some simple instructions on how I can check/verify this is working on CSD3/elsewhere?

We will probably want an example adding to the examples/ and some info adding to the docs once the code is settled before it goes in.

jwallwork23 · 2024-03-22T09:43:52Z

One quick comment before then - can you provide some simple instructions on how I can check/verify this is working on CSD3/elsewhere?

Sure. I created a new branch to demonstrate the testing: 85_gpu_device_number_test. Would you like me to include the Slurm scripts, too?

jwallwork23 · 2024-03-27T15:19:23Z

Okay, this is ready for re-review! I added some docs and managed to get example 3 working on Wilkes3, giving the following output for 2 GPUs:

input on rank0: [  0.0,  1.0,  2.0,  3.0,  4.0]
input on rank1: [  1.0,  2.0,  3.0,  4.0,  5.0]
output on rank1: [  2.0,  4.0,  6.0,  8.0, 10.0]
output on rank0: [  0.0,  2.0,  4.0,  6.0,  8.0]

Will test it for 4 GPUs, too, but don't anticipate any issues.

jatkinson1000

This is great addition @jwallwork23
The new docs and example read really well.

Added a couple of points that I feel would make things clearer for me as an external reader, feel free to incorporate or not.

Once we've resolved these I think we're good to go!

examples/3_MultiGPU/README.md

pages/gpu.md

src/ctorch.cpp

Co-authored-by: jatkinson1000 <[email protected]>

jwallwork23 · 2024-03-28T09:24:30Z

Thanks @jatkinson1000, this is now ready for re-review.

Will test it for 4 GPUs, too, but don't anticipate any issues.

I can confirm that this worked (with the updated 85_gpu_device_number_test branch) on Wilkes3, giving output

input on rank1: [  1.0,  2.0,  3.0,  4.0,  5.0]
input on rank2: [  2.0,  3.0,  4.0,  5.0,  6.0]
input on rank3: [  3.0,  4.0,  5.0,  6.0,  7.0]
input on rank0: [  0.0,  1.0,  2.0,  3.0,  4.0]
output on rank1: [  2.0,  4.0,  6.0,  8.0, 10.0]
output on rank2: [  4.0,  6.0,  8.0, 10.0, 12.0]
output on rank3: [  6.0,  8.0, 10.0, 12.0, 14.0]
output on rank0: [  0.0,  2.0,  4.0,  6.0,  8.0]

jatkinson1000

Thanks @jwallwork23 This is a great addition!

All looking good to me now so I'll squash and merge shortly.

* Have get_device use torch::Device * Add device_number arg for get_device * Throw error if device_number used in CPU-only case * Disallow negative device number * Actually use the device number * Use device number for torch_zeros * Use device number for torch_ones * Use device number for torch_empty * Use device number for torch_from_blob * Device and device number args for torch_module_load * Pass device and device number to torch_jit_load by value * Make device number argument to torch_module_load optional * Make device number argument to torch_tensor_from_array optional * Make device number argument to other subroutines optional * Make device argument to torch_module_load optional * Add function for determining device_index * Rename device number as index * Rename device as device type * Device index defaults to -1 on CPU and 0 on GPU * Make device type and index optional on C++ side * Fix typo in torch_model_load * Fix typos in example 1 * Initial draft of example 3_MultiGPU * Differentiate between errors and warnings in C++ code * Formatting * Add mpi4py to requirements for example 3 * Use mpi4py to differ inputs in simplenet_infer_python * Raise ValueError for Python inference with invalid device * Print rank in Python case; updates to README * Setup MPI for simplenet_infer_fortran, too * Write formatting for example 3 * Add note on building with Make * Print before and after; mpi_finalise; output on CPU; comments * Docs: device->device_type for consistency * Add docs on MultiGPU * Update warning text for defaulting to 0 Co-authored-by: jatkinson1000 <[email protected]> * Mention MPI in requirements * Update outputs for example 3 * Use NP rather than 4 GPUs * Implement SimpleNet in example 3 but with a twist * Add code snippets for multi-GPU doc section * Add note about multiple GPU support to README.md. --------- Co-authored-by: jatkinson1000 <[email protected]> Co-authored-by: Jack Atkinson <[email protected]>

jwallwork23 added 21 commits March 19, 2024 10:44

Have get_device use torch::Device

83ff00a

Add device_number arg for get_device

a392900

Throw error if device_number used in CPU-only case

2552c91

Disallow negative device number

9b0b7dd

Actually use the device number

e44e3e6

Use device number for torch_zeros

cf39472

Use device number for torch_ones

01b8063

Use device number for torch_empty

530fa19

Use device number for torch_from_blob

af7a8af

Device and device number args for torch_module_load

e2fe070

Pass device and device number to torch_jit_load by value

fd729a3

Make device number argument to torch_module_load optional

3b3e62c

Make device number argument to torch_tensor_from_array optional

5fe34b0

Make device number argument to other subroutines optional

3fe5258

Make device argument to torch_module_load optional

9ed2452

Add function for determining device_index

fbc6a12

Rename device number as index

58d28ed

Rename device as device type

682d887

Device index defaults to -1 on CPU and 0 on GPU

2d9698c

Make device type and index optional on C++ side

ca40777

Fix typo in torch_model_load

e37f743

jwallwork23 added the enhancement New feature or request label Mar 22, 2024

jwallwork23 requested review from TomMelt and jatkinson1000 March 22, 2024 09:24

jwallwork23 self-assigned this Mar 22, 2024

jatkinson1000 reviewed Mar 22, 2024

View reviewed changes

This was referenced Mar 22, 2024

User ability to decide GPU device number #85

Closed

GPU Device Number #86

Closed

jwallwork23 added 4 commits March 25, 2024 09:49

Add mpi4py to requirements for example 3

fc18b52

Use mpi4py to differ inputs in simplenet_infer_python

2b0086a

Raise ValueError for Python inference with invalid device

fced4c1

Print rank in Python case; updates to README

188b305

jwallwork23 force-pushed the 85_gpu_device_number branch from afa93d8 to 188b305 Compare March 25, 2024 12:31

jwallwork23 added 3 commits March 25, 2024 13:01

Setup MPI for simplenet_infer_fortran, too

dcfb153

Write formatting for example 3

392afb9

Add note on building with Make

9fd3040

jwallwork23 mentioned this pull request Mar 26, 2024

Add Long-form API documentation #53

Merged

7 tasks

jwallwork23 added 4 commits March 27, 2024 13:38

Print before and after; mpi_finalise; output on CPU; comments

24d5b6a

Merge branch 'main' into 85_gpu_device_number

a44e262

Docs: device->device_type for consistency

5ebe845

Add docs on MultiGPU

18fca7b

jwallwork23 requested a review from jatkinson1000 March 27, 2024 15:19

jatkinson1000 reviewed Mar 27, 2024

View reviewed changes

jwallwork23 and others added 6 commits March 28, 2024 09:01

Update warning text for defaulting to 0

475a859

Co-authored-by: jatkinson1000 <[email protected]>

Mention MPI in requirements

3f26457

Update outputs for example 3

3dba29a

Use NP rather than 4 GPUs

0e3272e

Implement SimpleNet in example 3 but with a twist

99d3b5b

Add code snippets for multi-GPU doc section

99002d5

jwallwork23 requested a review from jatkinson1000 March 28, 2024 09:24

jatkinson1000 approved these changes Mar 28, 2024

View reviewed changes

Add note about multiple GPU support to README.md.

e2b68bd

jatkinson1000 merged commit 0efa2ba into main Mar 28, 2024
4 checks passed

jatkinson1000 deleted the 85_gpu_device_number branch March 28, 2024 15:08

jwallwork23 mentioned this pull request Mar 28, 2024

GPU example and documentation #71

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow specification for GPU device index #96

Allow specification for GPU device index #96

jwallwork23 commented Mar 22, 2024

jwallwork23 commented Mar 22, 2024

jatkinson1000 left a comment

jwallwork23 commented Mar 22, 2024

jwallwork23 commented Mar 27, 2024

jatkinson1000 left a comment

jwallwork23 commented Mar 28, 2024

jatkinson1000 left a comment

Allow specification for GPU device index #96

Allow specification for GPU device index #96

Conversation

jwallwork23 commented Mar 22, 2024

jwallwork23 commented Mar 22, 2024

jatkinson1000 left a comment

Choose a reason for hiding this comment

jwallwork23 commented Mar 22, 2024

jwallwork23 commented Mar 27, 2024

jatkinson1000 left a comment

Choose a reason for hiding this comment

jwallwork23 commented Mar 28, 2024

jatkinson1000 left a comment

Choose a reason for hiding this comment