
change to_array routines to fix pointer issues #175

Merged: 4 commits into main from bugfix-pointer-problems on Oct 22, 2024
Conversation

@TomMelt (Member) commented on Oct 4, 2024

This fixes an issue seen in PRs #173 and #166, where the test fails with the error:

[ERROR]: Array allocated with wrong shape

Whilst it looks like I have fixed the problem by simply removing the lines... I can assure the reviewer that I didn't do that deliberately 😉

I think I have an idea why it doesn't work, and why those lines need to be removed. It essentially stems from two points:

  1. The data_out pointer is declared as intent(out):

     ```fortran
     real(kind=real32), pointer, intent(out) :: data_out(:) !! Pointer to tensor data
     ```

     An intent(out) pointer dummy has undefined association status on entry, which means the following lines don't have much meaning (a minimal sketch after this list illustrates the problem):

     FTorch/src/ftorch.f90, lines 2426 to 2442 at e0d8269:
     ```fortran
     if (present(sizes)) then  ! enclosing line restored for context; not part of the quoted range
       if (all(shape(data_out) == 0)) then
         ! If the sizes array has been provided and the output array has not
         ! been allocated (i.e., its shape is all zeros) then allocate it
         allocate(data_out(sizes(1)))
       else if (any(shape(data_out) /= sizes)) then
         ! Raise an error if the sizes array has been provided and the output
         ! array has already been allocated but its shape differs from the sizes
         ! argument
         write (*,*) "[ERROR]: Array allocated with wrong shape"
         stop
       end if
     else if ((.not. associated(data_out)) .or. (all(shape(data_out) == 0))) then
       ! Raise an error if the sizes array has not been provided and the pointer
       ! array has not been allocated
       write (*,*) "[ERROR]: Pointer array has not been allocated"
       stop
     end if
     ```
  2. Secondly, sizes is marked optional:

     ```fortran
     integer, optional, intent(in) :: sizes(1) !! Number of entries for each rank
     ```

     But that makes the following line undefined behaviour: if sizes is missing on input, referencing it could yield any value or simply crash.

     ```fortran
     call c_f_pointer(cptr, data_out, sizes)
     ```
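To illustrate point 1, here is a minimal, self-contained sketch (the routine and variable names are hypothetical, not FTorch code) of why checks on an intent(out) pointer dummy are meaningless:

```fortran
program intent_out_demo
  implicit none
  real, pointer :: p(:)
  real, target  :: t(3) = [1.0, 2.0, 3.0]

  p => t                 ! the caller associates the pointer...
  call to_array_like(p)  ! ...but that association does not survive the call

contains

  subroutine to_array_like(data_out)
    real, pointer, intent(out) :: data_out(:)
    ! On entry, data_out has undefined association status regardless of what
    ! the caller did, so the standard does not even permit querying it: both
    ! of the checks below are invalid and may return anything (or crash).
    !   if (associated(data_out)) ...
    !   if (all(shape(data_out) == 0)) ...
    ! The only safe first action is to define the pointer:
    data_out => null()
  end subroutine to_array_like

end program intent_out_demo
```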

I don't see a use case (other than convenience) where you wouldn't need to know the size of your output data. It can be passed in rather easily using shape() etc.

Happy to discuss in our next meeting. I might be overlooking something, but I think this works (at least on my machine)

Update

I have merged in two PRs; these resolve the issues mentioned above.

I plan to add a test to the integration test examples/6_Autograd/autograd.f90 to check the new rank/shape functionality I have added.

  • add test of get_shape() and get_rank() to examples/6_Autograd/autograd.f90

@TomMelt added the bug label and self-assigned this on Oct 4, 2024
@jatkinson1000 (Member) left a comment:

Thanks for digging @TomMelt

When @jwallwork23 was setting this up the argument was that if the tensor has only been created on the Torch side (as will happen during backprop etc.) then a user may want to map it back over to a Fortran array that has not yet been allocated - hence the logic required to check this and allocate if required, and size being potentially 'unknown'. I agree we can discuss this, but may also want @jwallwork23's input as I am not sure how likely this situation is to arise (or not) with the autograd applications.

On 1), can you clarify what you mean by "don't have much meaning", for my own understanding please? As far as I understood, intent has no enforcement but can be monitored by compilers.

On 2), I agree that this is an issue; if the lines were not removed, then we would still need to add an assignment of the appropriate value to sizes.

If you look at #174 and the corresponding logs from the CI, the shape of out_data is being read as 95 at ingress, even when explicitly passed in as a rank-2 array to match sizes=2, and this occurs before we hit this point. So my thought was that there was perhaps an issue with how data_out was coming into the subroutine, or how it was being declared. I realise I didn't send you a link to this before; I only linked it from the bottom of #166, apologies.
Screenshots and detail are also in this comment: #166 (comment)

@jatkinson1000 (Member) commented:

After a fresh look, I think the issue may arise in the line

```fypp
allocate(data_out(sizes(1)#{for i in range(1,RANK)}#,sizes(${i+1}$)#{endfor}#))
```

which for some reason causes data_out to be allocated with the wrong size on certain occasions.
However, looking at the generated F90 file I see no clear issues, nor how the error seen in the CI would arise...!

I should also note that I could not generate the error on my local machine.
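For reference, here is what that fypp template should expand to for, say, RANK = 3 (my own expansion, shown for illustration, not copied from the generated file):

```fortran
allocate(data_out(sizes(1),sizes(2),sizes(3)))
```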

@TomMelt (Member, Author) commented on Oct 4, 2024

Yeah, I didn't do the best job of explaining...

  1. Yes, intent is more of a guideline, but in our case (examples/6_Autograd/autograd.f90) we had no prior initialisations (see below):

     ```fortran
     real(wp), dimension(:), pointer :: out_data
     integer :: tensor_layout(1) = [1]
     ! Set up Torch data structures
     type(torch_tensor) :: a
     ! Construct a Torch Tensor from a Fortran array
     in_data(:) = [2.0, 3.0]
     call torch_tensor_from_array(a, in_data, tensor_layout, torch_kCPU)
     ! Extract a Fortran array from a Torch tensor
     call torch_tensor_to_array(a, out_data, shape(in_data))
     ```

     We define the pointer, but it isn't associated with a target (and in this case associating it beforehand would be a bad thing to do anyway). For our use case it should literally only be intent(out), because any other option would mean we require information coming in. It is in torch_tensor_to_array that we actually want to assign it, in the c_f_pointer call below. This means that all of the checks we had before "don't make much sense", because we would never want them to be true (even if they were).

     ```fortran
     call c_f_pointer(cptr, data_out, sizes)
     ```

  2. My point, similar to the above, is that out_data doesn't necessarily have a shape. My best guess (and it's a guess at best) is that sometimes the pointer points at a random location in memory, where it can appear to have a shape. But as I try to allude to above, when we enter the torch_tensor_to_array function the pointer should be unassociated, and therefore shouldn't have a size or shape etc. The previous code would still work if we assigned out_data explicitly to the null pointer; I have just raised a new PR to demo what I mean (assign out_data to null ptr (DO NOT MERGE) #176). This means we have to trust the user to be very careful indeed with pointers... 👀 A sketch of the c_f_pointer pattern in question follows this list.
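For context, here is a minimal, self-contained sketch of the c_f_pointer pattern at the heart of torch_tensor_to_array (the names and data are hypothetical; in FTorch the C pointer comes from the Torch tensor):

```fortran
program c_f_pointer_demo
  use, intrinsic :: iso_c_binding, only : c_ptr, c_loc, c_f_pointer
  implicit none
  real, target  :: buffer(6) = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
  real, pointer :: data_out(:,:) => null()  ! start from a defined (null) state
  type(c_ptr)   :: cptr

  cptr = c_loc(buffer)  ! stands in for the address held inside the Torch tensor
  ! The shape argument is mandatory when the Fortran pointer is an array;
  ! passing an absent optional here would be invalid, which is point 2 above.
  call c_f_pointer(cptr, data_out, [2, 3])
  print *, shape(data_out)  ! prints 2 3
  print *, data_out(2, 3)   ! prints 6.0: the last element, column-major
end program c_f_pointer_demo
```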

@jatkinson1000 (Member) commented:

Discussion on 14/10/2024

We discussed and agreed the following route forwards (a sketch of the agreed logic follows below):

  • @TomMelt will bring Tensor::sizes() over from the C++ API as its own function to query a torch tensor for size/shape data.
  • This will then feed this PR, where we will allow users to optionally specify sizes, but will query the above function if it is not supplied.
    • If sizes is passed in, it will be checked against the tensor shape:
      • If there is a mismatch, flag an error to hint that the user made a mistake; otherwise proceed.
    • If sizes is not passed in, query the tensor and use the resulting sizes to assign the shape during pointer assignment.
  • Also update the exercises and documentation to be clear that users should set their pointers to =>null() beforehand.

This means that people can send a pointer and tensor to a function and get the data back to Fortran without prior knowledge/allocation of the array, as @jwallwork23 intended, but done in a safer way with a bit more C++ (which may well be useful anyway).
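A self-contained mock (not FTorch code; the type, fields, and error text are placeholders) sketching the agreed validate-or-query logic:

```fortran
module mock_to_array_mod
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  implicit none

  ! Toy stand-in for torch_tensor, carrying a data address and its shape
  type :: mock_tensor
    type(c_ptr) :: data
    integer     :: shape_(2)
  contains
    procedure :: get_shape
  end type mock_tensor

contains

  function get_shape(self) result(s)
    class(mock_tensor), intent(in) :: self
    integer :: s(2)
    s = self%shape_
  end function get_shape

  subroutine mock_tensor_to_array(tensor, data_out, sizes)
    type(mock_tensor), intent(in) :: tensor
    real, pointer, intent(out)    :: data_out(:,:)
    integer, optional, intent(in) :: sizes(2)

    if (present(sizes)) then
      ! sizes supplied: validate it against the tensor's own shape
      if (any(sizes /= tensor%get_shape())) then
        write (*,*) "[ERROR]: provided sizes do not match the tensor shape"
        stop
      end if
      call c_f_pointer(tensor%data, data_out, sizes)
    else
      ! sizes absent: query the tensor and use its shape directly
      call c_f_pointer(tensor%data, data_out, tensor%get_shape())
    end if
  end subroutine mock_tensor_to_array

end module mock_to_array_mod
```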

I will now close #174, #176, #177, and #178, as this has been discussed and agreed.

@TomMelt (Member, Author) commented on Oct 17, 2024

The tensor derived type now supports two methods:
- `get_rank`
- `get_shape`

These methods return the rank and shape of the tensor.
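A hypothetical usage fragment (only the method names come from the commit message above; exact signatures and return kinds are assumed):

```fortran
! Query a torch_tensor `a` for its rank and shape before mapping it back
write (*,*) "rank  = ", a%get_rank()
write (*,*) "shape = ", a%get_shape()
```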
@TomMelt (Member, Author) commented on Oct 21, 2024

@jatkinson1000, given the comments in the related PRs (#180, #181), I plan to add a test of the new functionality to this PR to check that the shape and size work as expected 👌

@jatkinson1000 (Member) left a comment:

Thanks @TomMelt

This looks great. Nice to include testing of the new routines in the integration tests for now, and I like the "tests passed successfully" workaround for the regex ;)

I agree with @jwallwork23 that we would need to be careful with a matching condition in cases where we were manipulating data, but given that here everything points at the same memory, I think we are OK in this instance.
Of course, as @jwallwork23 develops autograd further this may need to be revisited, or the components of this current example may become unit tests.

@TomMelt merged commit 6f54385 into main on Oct 22, 2024; 6 checks passed.
@TomMelt deleted the bugfix-pointer-problems branch on October 22, 2024 at 11:31.