
change to_array routines to fix pointer issues #175

Merged: 4 commits into main from bugfix-pointer-problems on Oct 22, 2024
Conversation

@TomMelt (Member) commented on Oct 4, 2024

This fixes an issue seen in PRs #173 and #166, where the test fails with the error:

[ERROR]: Array allocated with wrong shape

Whilst it looks like I have fixed the problem by simply removing the lines... I can assure the reviewer that I didn't do that deliberately 😉

I think I have an idea why it doesn't work, and why those lines need to be removed. It essentially stems from two points:

  1. The data_out pointer is declared as intent(out):

     ```fortran
     real(kind=real32), pointer, intent(out) :: data_out(:) !! Pointer to tensor data
     ```

     An intent(out) pointer dummy has undefined association status on entry, which means the following lines don't have much meaning (a minimal sketch after this list illustrates the problem):

     FTorch/src/ftorch.f90, lines 2426 to 2442 at e0d8269:
     ```fortran
     if (present(sizes)) then  ! enclosing line restored for context; not part of the quoted range
       if (all(shape(data_out) == 0)) then
         ! If the sizes array has been provided and the output array has not
         ! been allocated (i.e., its shape is all zeros) then allocate it
         allocate(data_out(sizes(1)))
       else if (any(shape(data_out) /= sizes)) then
         ! Raise an error if the sizes array has been provided and the output
         ! array has already been allocated but its shape differs from the sizes
         ! argument
         write (*,*) "[ERROR]: Array allocated with wrong shape"
         stop
       end if
     else if ((.not. associated(data_out)) .or. (all(shape(data_out) == 0))) then
       ! Raise an error if the sizes array has not been provided and the pointer
       ! array has not been allocated
       write (*,*) "[ERROR]: Pointer array has not been allocated"
       stop
     end if
     ```
  2. Secondly, sizes is marked optional:

     ```fortran
     integer, optional, intent(in) :: sizes(1) !! Number of entries for each rank
     ```

     But that makes the following line undefined behaviour: if sizes is missing on input, referencing it could yield any value or simply crash.

     ```fortran
     call c_f_pointer(cptr, data_out, sizes)
     ```
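To illustrate point 1, here is a minimal, self-contained sketch (the routine and variable names are hypothetical, not FTorch code) of why checks on an intent(out) pointer dummy are meaningless:

```fortran
program intent_out_demo
  implicit none
  real, pointer :: p(:)
  real, target  :: t(3) = [1.0, 2.0, 3.0]

  p => t                 ! the caller associates the pointer...
  call to_array_like(p)  ! ...but that association does not survive the call

contains

  subroutine to_array_like(data_out)
    real, pointer, intent(out) :: data_out(:)
    ! On entry, data_out has undefined association status regardless of what
    ! the caller did, so the standard does not even permit querying it: both
    ! of the checks below are invalid and may return anything (or crash).
    !   if (associated(data_out)) ...
    !   if (all(shape(data_out) == 0)) ...
    ! The only safe first action is to define the pointer:
    data_out => null()
  end subroutine to_array_like

end program intent_out_demo
```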

I don't see a use case (other than convenience) where you wouldn't need to know the size of your output data. It can be passed in rather easily using shape() etc.

Happy to discuss in our next meeting. I might be overlooking something, but I think this works (at least on my machine)

Update

I have merged in two PRs; these resolve the issues mentioned above.

I plan to add a test to the integration test examples/6_Autograd/autograd.f90 to check the new rank/shape functionality I have added.

  • add test of get_shape() and get_rank() to examples/6_Autograd/autograd.f90

@TomMelt added the bug label and self-assigned this on Oct 4, 2024
@jatkinson1000 (Member) left a comment:

Thanks for digging @TomMelt

When @jwallwork23 was setting this up the argument was that if the tensor has only been created on the Torch side (as will happen during backprop etc.) then a user may want to map it back over to a Fortran array that has not yet been allocated - hence the logic required to check this and allocate if required, and size being potentially 'unknown'. I agree we can discuss this, but may also want @jwallwork23's input as I am not sure how likely this situation is to arise (or not) with the autograd applications.

On 1), can you clarify what you mean by "don't have much meaning", for my own understanding please? As far as I understood, intent has no enforcement but can be monitored by compilers.

On 2), I agree that this is an issue; if the lines were not removed, then we would still need to add an assignment of the appropriate value to sizes.

If you look at #174 and the corresponding logs from the CI, the shape of out_data is being read as 95 at ingress, even when explicitly passed in as a rank-2 array to match sizes=2, and this occurs before we hit this point. So my thought was that there was perhaps an issue with how data_out was coming into the subroutine, or how it was being declared. I realise I didn't send you a link to this before; I only linked it from the bottom of #166, apologies.
Screenshots and detail are also in this comment: #166 (comment)

@jatkinson1000 (Member) commented:

After a fresh look, I think the issue may arise in the line

```fypp
allocate(data_out(sizes(1)#{for i in range(1,RANK)}#,sizes(${i+1}$)#{endfor}#))
```

which for some reason causes data_out to be allocated with the wrong size on certain occasions.
However, looking at the generated F90 file I see no clear issues, nor how the error seen in the CI would arise...!

I should also note that I could not generate the error on my local machine.
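For reference, here is what that fypp template should expand to for, say, RANK = 3 (my own expansion, shown for illustration, not copied from the generated file):

```fortran
allocate(data_out(sizes(1),sizes(2),sizes(3)))
```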

@TomMelt (Member, Author) commented on Oct 4, 2024

Yeah, I didn't do the best job of explaining...

  1. Yes, intent is more of a guideline, but in our case (examples/6_Autograd/autograd.f90) we had no prior initialisations (see below):

     ```fortran
     real(wp), dimension(:), pointer :: out_data
     integer :: tensor_layout(1) = [1]
     ! Set up Torch data structures
     type(torch_tensor) :: a
     ! Construct a Torch Tensor from a Fortran array
     in_data(:) = [2.0, 3.0]
     call torch_tensor_from_array(a, in_data, tensor_layout, torch_kCPU)
     ! Extract a Fortran array from a Torch tensor
     call torch_tensor_to_array(a, out_data, shape(in_data))
     ```

     We define the pointer, but it isn't associated with a target (and in this case associating it beforehand would be a bad thing to do anyway). For our use case it should literally only be intent(out), because any other option would mean we require information coming in. It is in torch_tensor_to_array that we actually want to assign it, in the c_f_pointer call below. This means that all of the checks we had before "don't make much sense", because we would never want them to be true (even if they were).

     ```fortran
     call c_f_pointer(cptr, data_out, sizes)
     ```

  2. My point, similar to the above, is that out_data doesn't necessarily have a shape. My best guess (and it's a guess at best) is that sometimes the pointer points at a random location in memory, where it can appear to have a shape. But as I try to allude to above, when we enter the torch_tensor_to_array function the pointer should be unassociated, and therefore shouldn't have a size or shape etc. The previous code would still work if we assigned out_data explicitly to the null pointer; I have just raised a new PR to demo what I mean (assign out_data to null ptr (DO NOT MERGE) #176). This means we have to trust the user to be very careful indeed with pointers... 👀 A sketch of the c_f_pointer pattern in question follows this list.
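For context, here is a minimal, self-contained sketch of the c_f_pointer pattern at the heart of torch_tensor_to_array (the names and data are hypothetical; in FTorch the C pointer comes from the Torch tensor):

```fortran
program c_f_pointer_demo
  use, intrinsic :: iso_c_binding, only : c_ptr, c_loc, c_f_pointer
  implicit none
  real, target  :: buffer(6) = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
  real, pointer :: data_out(:,:) => null()  ! start from a defined (null) state
  type(c_ptr)   :: cptr

  cptr = c_loc(buffer)  ! stands in for the address held inside the Torch tensor
  ! The shape argument is mandatory when the Fortran pointer is an array;
  ! passing an absent optional here would be invalid, which is point 2 above.
  call c_f_pointer(cptr, data_out, [2, 3])
  print *, shape(data_out)  ! prints 2 3
  print *, data_out(2, 3)   ! prints 6.0: the last element, column-major
end program c_f_pointer_demo
```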

@jatkinson1000 (Member) commented:

Discussion on 14/10/2024

We discussed and agreed the following route forwards (a sketch of the agreed logic follows below):

  • @TomMelt will bring Tensor::sizes() over from the C++ API as its own function to query a torch tensor for size/shape data.
  • This will then feed this PR, where we will allow users to optionally specify sizes, but will query the above function if it is not supplied.
    • If sizes is passed in, it will be checked against the tensor shape:
      • If there is a mismatch, flag an error to hint that the user made a mistake; otherwise proceed.
    • If sizes is not passed in, query the tensor and use the resulting sizes to assign the shape during pointer assignment.
  • Also update the exercises and documentation to be clear that users should set their pointers to =>null() beforehand.

This means that people can send a pointer and tensor to a function and get the data back to Fortran without prior knowledge/allocation of the array, as @jwallwork23 intended, but done in a safer way with a bit more C++ (which may well be useful anyway).
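A self-contained mock (not FTorch code; the type, fields, and error text are placeholders) sketching the agreed validate-or-query logic:

```fortran
module mock_to_array_mod
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer
  implicit none

  ! Toy stand-in for torch_tensor, carrying a data address and its shape
  type :: mock_tensor
    type(c_ptr) :: data
    integer     :: shape_(2)
  contains
    procedure :: get_shape
  end type mock_tensor

contains

  function get_shape(self) result(s)
    class(mock_tensor), intent(in) :: self
    integer :: s(2)
    s = self%shape_
  end function get_shape

  subroutine mock_tensor_to_array(tensor, data_out, sizes)
    type(mock_tensor), intent(in) :: tensor
    real, pointer, intent(out)    :: data_out(:,:)
    integer, optional, intent(in) :: sizes(2)

    if (present(sizes)) then
      ! sizes supplied: validate it against the tensor's own shape
      if (any(sizes /= tensor%get_shape())) then
        write (*,*) "[ERROR]: provided sizes do not match the tensor shape"
        stop
      end if
      call c_f_pointer(tensor%data, data_out, sizes)
    else
      ! sizes absent: query the tensor and use its shape directly
      call c_f_pointer(tensor%data, data_out, tensor%get_shape())
    end if
  end subroutine mock_tensor_to_array

end module mock_to_array_mod
```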

I will now close #174, #176, #177, and #178, as this has been discussed and agreed.

@TomMelt (Member, Author) commented on Oct 17, 2024

The tensor derived type now supports two methods:
- `get_rank`
- `get_shape`

These methods return the rank and shape of the tensor.
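A hypothetical usage fragment (only the method names come from the commit message above; exact signatures and return kinds are assumed):

```fortran
! Query a torch_tensor `a` for its rank and shape before mapping it back
write (*,*) "rank  = ", a%get_rank()
write (*,*) "shape = ", a%get_shape()
```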
@TomMelt (Member, Author) commented on Oct 21, 2024

@jatkinson1000, given the comments in the related PRs (#180, #181), I plan to add a test of the new functionality to this PR to check that the shape and size work as expected 👌

@jatkinson1000 (Member) left a comment:

Thanks @TomMelt

This looks great. Nice to include testing of the new routines in the integration tests for now, and I like the "tests passed successfully" workaround for the regex ;)

I agree with @jwallwork23 that we would need to be careful with a matching condition in cases where we were manipulating data, but given that here everything points at the same memory, I think we are OK in this instance.
Of course, as @jwallwork23 develops autograd further this may need to be revisited, or the components of this current example may become unit tests.

@TomMelt merged commit 6f54385 into main on Oct 22, 2024; 6 checks passed.
@TomMelt deleted the bugfix-pointer-problems branch on October 22, 2024 at 11:31.