Sync RTU sequential reading failure #180

gustavowd · 2023-04-03T01:36:28Z

If i read sequentially two the sets of registers, the second read never return.

For example:
let rsp1 = ctx.read_holding_registers(32000, 20).unwrap();
let rsp2 = ctx.read_holding_registers(32300, 40).unwrap();

A simple thread sleep of 10 miliseconds between readings solve the problem. However, I beleive this issue should be solved.

uklotzde · 2023-04-03T09:17:14Z

What's the outcome if you use libmodbus instead of tokio-modbus for this particular device and invocation sequence?

gustavowd · 2023-04-03T11:34:08Z

It works with libmodbus. Maybe its the modbus RTU interframe space of 3,5 char that is not implemented in tokio-modbus.

uklotzde · 2023-04-05T09:21:43Z

For reference: 2.5.1.1 MODBUS Message RTU Framing

uklotzde · 2023-04-05T09:31:12Z

Adding a timeout unconditionally before and/or after each request would be a waste of time. Only the client knows if it will send a single request or a sequence of requests that require the insertion of silence intervals. The duration may also vary from device to device.

Ideas how to make this more convenient for clients through the API are welcome.

domenicquirl · 2024-11-19T07:38:47Z

We just hit this issue with a different setup: successive requests to a single device were working fine, but due to the missing inter-frame silence the library did not allow communicating with multiple devices on the same RTU bus. With just the single device, either side would know when its done sending by controlling the send buffer and receiving by looking at the function code and, where applicable, length field of the frame. A second request immediately after the response is therefore easily processed by the server because it knows that it is done responding.

However, if you add a second server device which listens on the same bus, then that device has no additional information / state that tells it when a response ends. It could of course decode the full frame and also inspect function code and length, but that requires processing and also the frame might be invalid, the length might not actually be correct, etc. and so it has to safeguard against that anyways.

So instead, devices will often listen for new data on the bus and only look at the slave ID at the beginning of the new frame. If that doesn't match, they know the frame is not addressing them, so they just wait for the bus to clear before they start listening again for the next frame which might be directed at them.

The problem is now that, because the tokio-modbus decoder also uses function code and length to immediately decode when sufficient bytes have been read, without the inter-frame pause the bus doesn't actually clear. So if I request data from one device and then another, the second device will see the response from the first device, ignore it, and think that my second request to it is still part of that response (which may then be malformed) because at no point did the data stream stop to indicate a delimiter between the two frames. The second request will thus be ignored as well and get no response, which in our case means running into a timeout.

The problem is exacerbated if you both regularly read from the devices and sometimes write to them, because presumably you might spend at least some processing time on the result of the read, while even if you synchronize access to the connection Context a write may happen immediately after a read releases the connection and vice versa.

Adding a timeout unconditionally before and/or after each request would be a waste of time. Only the client knows if it will send a single request or a sequence of requests that require the insertion of silence intervals. The duration may also vary from device to device.

Ideas how to make this more convenient for clients through the API are welcome.

So after spending a significant amount of time to track this down, I definitely think that this behaviour should be prominently displayed in the documentation. As evidenced by this thread, this is not an issue with other Rust Modbus crates and we've also cross-tested our hardware with libraries in other languages, such as Python's PyModbus, all of which handle this inside the library and thus "just work" no matter the setup. Since connecting multiple devices is also kind of the point of a bus, I think it's fair to say that the way tokio-modbus currently works here is not what one might expect having used any other Modbus RTU implementation and the documentation currently just links to the Modbus RTU spec, suggesting that the crate follows the spec, which includes the delay as part of the section which you already linked above.

In my personal opinion, the current behaviour is quite a big footgun: it subverts expectations (in other words: the behaviour is surprising) and at least in our case this issue was also very painful to debug - it only occurred when connecting multiple devices to the same bus, manifested differently for different devices (some would respond sometimes, some not at all, some almost always but would then fail), was dependent on the connection settings in use (like the baudrate), and then further it also didn't occur or was less prominent when using different devices as the client, because the different client devices have different processing speed and the request only gets lost when the second request is actually sent fast enough.

So regarding any API choices that are made around this, I think it would be much better if the default were to include the inter-frame silence interval (for RTU connections). This would make the crate "just work" like all the others and tbh I doubt that most applications even notice the delay (it's at most ~4ms at 9600 baud).

If you're really worried about performance, you could still expose a configuration on the client to explicitly disable automatic pausing after requests (and / or adjust the pause time, though I think the spec is quite clear here on how to calculate the default). This would also provide an additional place to document this behaviour and what to look out for when switching to "manual" / "single-shot" mode in the function on the client. But again, I think it's much better to not have the crate's defaults potentially cause long debugging sessions after which people end up re-doing the same calculation in user code and manually insert breaks after their requests. That's just frustrating and not very ergonomic as opposed to maybe losing out on the last bit of perf, which they could still get by using the library at a lower level.

flosse · 2024-11-19T11:40:38Z

@domenicquirl could you summarize yourself again in one sentence?

domenicquirl · 2024-11-19T12:16:39Z

@domenicquirl could you summarize yourself again in one sentence?

@flosse one might be difficult, but I'll try to give a TLDR:

This issue is also a problem when trying to communicate with multiple devices on one RTU bus, because devices not actively communicating may not actually process a frame but try to ignore it / wait for it to end.
Not respecting the inter-frame silence (by default) inside the library is surprising because it doesn't "just work" and other libraries (in Rust and other languages) do not do this.
It is also hard to debug, because it's a timing problem that depends a lot on the specific setup.
Imo tokio-modus should therefore by default handle frame delimiters so it "just works" with any setup and maybe allow to turn that off. If it doesn't, it should be documented somewhere loud that the user is supposed to be responsible for handling this, because currently the docs don't mention it (afaict).

That all with a lot of gratitude for making and maintaining the crate, which works very well for TCP and also for RTU if you know to insert frame breaks yourself ^^

uklotzde · 2024-11-19T12:48:01Z

My take:

An RTU frame only ends after the inter-frame silence interval expired and does not depend on the actual number of bytes received.

Hopefully someone is able to pick this up and provide a fix.

domenicquirl · 2024-11-19T13:13:22Z

An RTU frame only ends after the inter-frame silence interval expired and does not depend on the actual number of bytes received.

I very much agree with that view. At the same time, this might be difficult to integrate with the current implementation approach of using Decoders, which only provides you with the bytes that have been read so far and asynchronously fills that buffer - you would have to work with the underlying serial stream directly to be able to additionally take the time on the reads (and I can tell you from experience that even on an embedded device the timings don't always match up exactly: there is latency from the bus to the hardware, from that to the driver, etc.). I've also not thought a lot about how this would work with your current retry / drop bytes mechanism.

As a first step, would you be comfortable with assuming that the decoder returns "as soon as possible" - i.e., immediately after all expected bytes have been received, modulo some minimal time required to actually decode the frame - and always adding a short sleep afterwards @uklotzde? This could probably be made RTU-only by sticking it into service::rtu::Client.

uklotzde · 2024-11-24T11:24:41Z

As a first step, would you be comfortable with assuming that the decoder returns "as soon as possible" - i.e., immediately after all expected bytes have been received, modulo some minimal time required to actually decode the frame - and always adding a short sleep afterwards @uklotzde? This could probably be made RTU-only by sticking it into service::rtu::Client.

I didn't think about how a fix could be implemented and do not plan to work on this in my spare time.

It might be insightful to investigate how others are handling this.

flosse added the rtu label May 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync RTU sequential reading failure #180

Sync RTU sequential reading failure #180

gustavowd commented Apr 3, 2023

uklotzde commented Apr 3, 2023

gustavowd commented Apr 3, 2023

uklotzde commented Apr 5, 2023 •

edited

Loading

uklotzde commented Apr 5, 2023 •

edited

Loading

domenicquirl commented Nov 19, 2024

flosse commented Nov 19, 2024

domenicquirl commented Nov 19, 2024

uklotzde commented Nov 19, 2024

domenicquirl commented Nov 19, 2024

uklotzde commented Nov 24, 2024 •

edited

Loading

Sync RTU sequential reading failure #180

Sync RTU sequential reading failure #180

Comments

gustavowd commented Apr 3, 2023

uklotzde commented Apr 3, 2023

gustavowd commented Apr 3, 2023

uklotzde commented Apr 5, 2023 • edited Loading

uklotzde commented Apr 5, 2023 • edited Loading

domenicquirl commented Nov 19, 2024

flosse commented Nov 19, 2024

domenicquirl commented Nov 19, 2024

uklotzde commented Nov 19, 2024

domenicquirl commented Nov 19, 2024

uklotzde commented Nov 24, 2024 • edited Loading

uklotzde commented Apr 5, 2023 •

edited

Loading

uklotzde commented Apr 5, 2023 •

edited

Loading

uklotzde commented Nov 24, 2024 •

edited

Loading