Disable intra-process, only use inter-process communication #523

Open
smileghp opened this issue Nov 22, 2024 · 7 comments
Labels
bug (Something isn't working), needs info (A bug report is waiting for more information)

Comments

@smileghp

Due to my specific application scenario, I want to use only inter-process communication and not intra-process communication. For example, within the same process there are a writer_1 and a reader_1, while in another process there is a reader_2, all on the same topic. When writer_1 sends a message, only reader_2 should receive it; reader_1 should not. How can this be configured or changed?

smileghp added the bug label on Nov 22, 2024
@elBoberido
Member

Interesting idea. We did not think about something like this. Can you tell us something about the use case for such a feature?

Currently, this is unfortunately not easily possible. A workaround could be to use the user-header feature to publish the PID and, on the subscriber side, drop all samples carrying the PID of the same process.
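
A minimal sketch of that workaround, assuming the user-header builder API from the iceoryx2 publish-subscribe examples; the exact builder and accessor names may differ between iceoryx2 versions, and PidHeader is a hypothetical type introduced here purely for illustration:

use iceoryx2::prelude::*;

// Hypothetical user header carrying the sender's PID. It must be
// shared-memory compatible: fixed size, #[repr(C)], no heap pointers.
#[derive(Debug, Default, Clone, Copy)]
#[repr(C)]
pub struct PidHeader {
    pub pid: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;
    let service = node
        .service_builder(&"My/Topic".try_into()?)
        .publish_subscribe::<u64>()
        .user_header::<PidHeader>()
        .open_or_create()?;
    let publisher = service.publisher_builder().create()?;
    let subscriber = service.subscriber_builder().create()?;

    // Publisher side: stamp every sample with the PID of this process.
    let mut sample = publisher.loan()?;
    sample.user_header_mut().pid = std::process::id();
    *sample.payload_mut() = 42;
    sample.send()?;

    // Subscriber side: drop every sample that originates from this process.
    while let Some(sample) = subscriber.receive()? {
        if sample.user_header().pid == std::process::id() {
            continue; // intra-process sample, skip it
        }
        println!("received: {}", sample.payload());
    }

    Ok(())
}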

elBoberido added the needs info label on Nov 22, 2024
@elfenpiff
Contributor

elfenpiff commented Nov 24, 2024

@smileghp Currently this does not work, but you can identify the origin of a sample and discard it.

The publisher has a unique id that is stored in every sample's header. You can acquire it via

let publisher_id = publisher.publisher_id();

On the subscriber side, you can acquire the publisher id in the same fashion and discard the sample when it comes from the local publisher:

while let Some(sample) = subscriber.receive()? {
    // compare the sample's origin against the local publisher's id
    if publisher_id == sample.header().publisher_id() {
        continue; // do not handle the sample
    }
    // ... handle samples from other publishers ...
}

@smileghp
Author

@elBoberido Performance optimization is also one of my job responsibilities. For intra-process communication, it is sometimes more efficient to pass object pointers to the data entities, thereby avoiding unnecessary serialization. In my own project I operate this way, so I need to disable the intra-process functionality of iceoryx2.

@smileghp
Author

@elfenpiff Thank you very much. Before the new version comes out, I can try the method you mentioned, but it will cost some performance in data transmission.

@smileghp
Author

> Interesting idea. We did not think about something like this. Can you tell us something about the use case for such a feature?
>
> Currently, this is unfortunately not easily possible. A workaround could be to use the user-header feature to publish the PID and, on the subscriber side, drop all samples carrying the PID of the same process.

Yes, in DDS it is implemented by identifying the recipient's PID in the transmit process, rather than filtering only at the time of subscription. I think this approach is more efficient.

@elfenpiff
Contributor

@smileghp

> Performance optimization is also one of my job responsibilities. For intra-process communication, it is sometimes more efficient to pass object pointers to the data entities, thereby avoiding unnecessary serialization.

This is exactly what happens in iceoryx2 under the hood. Also, you do not need serialization when using zero-copy communication. We already provide containers that are shared-memory compatible; see this example that introduces the FixedSize{String|Vec}: https://github.com/eclipse-iceoryx/iceoryx2/blob/main/examples/rust/complex_data_types/complex_data_types.rs
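
For illustration, a payload struct along the lines of the linked complex_data_types example; the container types come from the iceoryx2-bb-container crate, and the field names and sizes here are made up (a sketch, not the example's exact contents):

use iceoryx2_bb_container::byte_string::FixedSizeByteString;
use iceoryx2_bb_container::vec::FixedSizeVec;

// Shared-memory compatible: fixed size, self-contained, no heap pointers.
// A type like this can be written directly into a loaned sample,
// so no serialization step is needed.
#[derive(Debug, Default)]
#[repr(C)]
pub struct ComplexData {
    pub name: FixedSizeByteString<16>,
    pub values: FixedSizeVec<u64, 32>,
}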

When a publisher delivers a message to a subscriber, it iterates over a vector and copies a pointer (8 bytes) into the receiver buffer of the subscriber. I am a bit skeptical whether a filter on the publisher side is more efficient than one on the subscriber side; it may even be slower. The reason is that the publisher always has to check the filter for all subscribers, whereas on the subscriber side you only need to activate it when you explicitly need it.

However, when you combine this with events and wake up the other process/thread just so that it can filter out the sample, it will cost you a lot of performance.

> Yes, in DDS it is implemented by identifying the recipient's PID in the transmit process, rather than filtering only at the time of subscription. I think this approach is more efficient.

For Cyclone DDS, for instance, I know the details. There it is far more efficient because Cyclone DDS then utilizes zero-copy behavior where just a pointer to the payload is shared (they use classic iceoryx for this). But this is simply how zero-copy works, so this kind of optimization may gain you nothing when you already use a zero-copy framework.

zero-copy communication in iceoryx2

Here is a brief overview of how zero-copy in iceoryx2 works:

  1. Publisher::loan() & Publisher::loan_uninit() provide memory that is shared between processes. The returned sample is your payload and is written directly into the shared memory. If you have a data-producing function, the best way to utilize zero-copy is to pass a pointer into that function and produce the data directly in shared memory.
  2. Publisher::send() iterates over the list of all Subscribers and copies an 8-byte relative pointer to the shared memory into each Subscriber's receiver buffer.
  3. Subscriber::receive() copies out the 8-byte relative pointer and translates it into a process-local absolute pointer (just a simple addition: relative pointer value + shared memory start address).
  4. Sample::payload() dereferences the absolute pointer, and you have direct access to the memory you wrote in step 1.

Since both applications share the actual memory, you do not need serialization as long as your data types are shared-memory compatible. The whole flow is sketched below.
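
A minimal sketch of steps 1-4 in a single process, loosely following the iceoryx2 publish-subscribe example (builder names and signatures may differ between versions):

use iceoryx2::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;
    let service = node
        .service_builder(&"My/Topic".try_into()?)
        .publish_subscribe::<u64>()
        .open_or_create()?;
    let publisher = service.publisher_builder().create()?;
    let subscriber = service.subscriber_builder().create()?;

    // Step 1: loan uninitialized shared memory and produce the data in place.
    let sample = publisher.loan_uninit()?;
    let sample = sample.write_payload(1234);

    // Step 2: send() pushes only an 8-byte relative pointer to each subscriber.
    sample.send()?;

    // Steps 3 + 4: receive the relative pointer and access the payload
    // directly in shared memory - no copy, no serialization.
    if let Some(sample) = subscriber.receive()? {
        println!("received: {}", sample.payload());
    }

    Ok(())
}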

network communication

Here is a comparison to network communication; I marked the expensive steps with (high runtime cost). I ignore the serialization/deserialization steps here (which are additional huge bottlenecks) and just assume we want to transmit something like an array of uint8_t.

  1. Send copies the actual data into the buffer of every subscriber (high runtime cost). The larger the data, the longer it takes; the more subscribers you have, the longer it takes. Additionally, you have to use syscalls here to establish the socket-based communication (high runtime cost), meaning the OS will interfere. In iceoryx2 this is not the case, since our pubsub communication works without syscalls and is handled completely with lock-free queues and shared memory.
  2. Receive on the subscriber side will copy the data out of the buffer and into user space (high runtime cost), which again takes time and costs a lot of memory. And every subscriber has to do this. So an optimization that skips local subscribers absolutely makes sense there: you share only the data pointer with the subscriber, like iceoryx2 does by default.

Summary

  1. For network communication, you have at least 2 additional copies: from the publisher to the subscriber buffer, and from the subscriber buffer to user space - and this ignores the overhead of serialization.
  2. For network communication, you have to utilize syscalls that cause OS interference and context switches.
  3. In iceoryx2, the pubsub communication does not require syscalls at all.
  4. Zero-copy communication produces the data only once and then distributes a pointer to all recipients - no additional copies at all!

So my recommendation would be to look into the complex data types example, since I think the biggest performance gain can be made by getting rid of serialization. Additionally, check whether you use the iceoryx2 API efficiently: use Publisher::loan_uninit() and produce your data directly into the shared memory you can access via SampleMut::payload_mut(). I think the overhead of checking the publisher id on the subscriber side is in the single-digit nanoseconds.

@elfenpiff
Contributor

@smileghp

If you want to know how expensive even a single copy is, you can use our benchmark, which comes with a --send-copy option.

With copy and a payload size of 40960 bytes:

cargo run --bin benchmark-publish-subscribe --release -- --bench-all -p 40960 --send-copy
# iceoryx2::service::ipc::Service ::: Iterations: 10000000, Time: 9.739591612, Latency: 486 ns, Sample Size: 40960

Without copy:

cargo run --bin benchmark-publish-subscribe --release -- --bench-all -p 40960
# iceoryx2::service::ipc::Service ::: Iterations: 10000000, Time: 2.7868404509999998, Latency: 139 ns, Sample Size: 40960

I measured both on my laptop. So an unnecessary copy of 40 KiB increases the latency by roughly a factor of 3.5 (486 ns vs. 139 ns).
