
proxy lib with shm + IPC scenario doesn't work #777

Open
bratpiorka opened this issue Oct 7, 2024 · 8 comments
Assignees: bratpiorka
Labels: bug (Something isn't working)

Comments

@bratpiorka
Contributor

Please provide a reproduction of the bug:

Currently, the two-process scenario does not work in which the first process (client) uses UMF through the proxy library with UMF_PROXY="page.disposition=shared-shm" and sends IPC handles to the server, while the second process (server) receives the data from the client and opens the IPC handle.
This is because the server needs to know the name of the shm object, which is required as an argument to the OS provider's create function.

How often bug is revealed:

always

Actual behavior:

The server process's call to OpenIPCHandle() returns an error.

Additional information about Priority and Help Requested:

Are you willing to submit a pull request with a proposed change? Yes

Requested priority: High

@bratpiorka bratpiorka added the bug Something isn't working label Oct 7, 2024
@bratpiorka bratpiorka self-assigned this Oct 7, 2024
@vinser52
Contributor

vinser52 commented Oct 7, 2024

I thought the shared memory name was part of the IPC handle, or maybe I did not get the exact root cause of the problem. On the server side, the os_open_ipc_handle function should use the name from the IPC handle.
@ldorau could you please clarify this?

@vinser52
Contributor

vinser52 commented Oct 7, 2024

I just checked the source code, and as far as I can see, the OS provider does not use shm_name from the IPC handle. @ldorau is this a bug?

@bratpiorka
Contributor Author

Currently, we open the shm that was set during provider initialization, see https://github.com/oneapi-src/unified-memory-framework/blob/main/src/provider/provider_os_memory.c#L1255

When the user calls the OpenIPCHandle() function, it takes a pool handle as an argument, so the provider must already be initialized. Since the shm name is an argument of the provider create function, there is no way to set or change it after initialization. So the only way to support the scenario from this issue is to first get the name of the shm in the client process, then send it to the server, and finally let the server create the provider with the shm name as an argument.

I think we can't simply change the code and open the shm from the IPC handle - consider the scenario where the provider was created with one shm and the IPC handle points to a second shm; it is fine to return an error in this case.

@vinser52
Contributor

vinser52 commented Oct 7, 2024

> Currently, we open shm that was set during the provider initialization, see https://github.com/oneapi-src/unified-memory-framework/blob/main/src/provider/provider_os_memory.c#L1255
>
> When the user calls OpenIPCHandle() function, it needs a pool handle as an argument, so the provider must already be initialized. Since the shm name is an argument in the provider create function, there is no way to set or change it after initialization. So the one way to support the scenario from this issue is to first get the name of shm in client process, then send it to the server and finally let the server create the provider with shm name as an arg.
>
> I think we can't simply change the code and open the shm from the ipc handle - please consider the scenario where the provider was created with one shm and IPC handle points to the second shm - this is fine to return error in this case.

First, let's forget about IPC and consider a simpler scenario: how can two providers from different processes use the same SHM region? I think that is an error.

The second question regards IPC: os_open_ipc_handle should use the shm_name from the IPC handle, not the shm_name from the local provider. That is exactly the idea of IPC: one process creates a memory provider/pool with some config and can expose its memory to another process via IPC handles.

@bratpiorka
Contributor Author

@vinser52 what about the devdax and file providers? Can two processes use the same devdax or file provider?

@vinser52
Contributor

vinser52 commented Oct 8, 2024

> @vinser52 what about devdax and file provider? Can two processes use the same devdax or file provider?

The same considerations apply to all memory providers that work with memory visible/accessible to multiple processes. If two providers allocate from the same physical memory locations, then you need a special pool manager that can coordinate between multiple pools (UMF does not have such a pool manager, and AFAIK PMDK also had none). For cross-process communication we have the IPC functionality, where memory allocated by one process can be accessed by another process via an IPC handle, but allocations from the same physical memory by multiple pools are not supported.

Regarding the shared memory, file, or devdax memory providers, I can imagine different processes using different locations in the shm/file/devdax: e.g. pool 1 uses offsets from 0 to n1, pool 2 uses offsets from n1 to n2, etc.

@bratpiorka
Contributor Author

OK, so for the OS provider the fix is here: #779,
but other providers should also be fixed.

One additional question: what about opening an IPC handle from a provider of another type? E.g. the user requests a file-based pool to open a handle from GPU memory. This will not work and is a user error, right?

@vinser52
Contributor

vinser52 commented Oct 8, 2024

> ok, so for the OS provider the fix is here: #779

Great.

> but other providers should also be fixed.

Yes, if they have a similar issue.

> One additional question - what about opening an IPC handle from a provider of other types? E.g. the user requests a file-based pool to open a handle from GPU memory. This will not work and is a user error, right?

Yeah, different provider types are not allowed. The same memory provider type must be used to get and open IPC handles.
