Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teensy4.1, client.read() without calling client.availble() causes crashes and hangs. Big issues with ArduinoWebsockets #36

Open
curtwelch opened this issue Sep 1, 2024 · 5 comments

Comments

@curtwelch
Copy link

I see this was talked about, but this needs to be fixed.

If you try to use ArduinoWebsockets on a Teensy4.1 that is built on NativeEthernet, when a client connection to a server is made, and the server returns a message too large to fit in a single Ethernet Placket (over 1500 bytes range), Everything crashes and burns very quickly. The Teensy4.1 crashes or can goe into a busy loop hang. It responds to pings in the loop mode but the main thread will lock in a busy loop.

I wrote my issue here: gilmaimon/ArduinoWebsockets#166

This weekend I tracked it down to problems here in NativeEithernet.

The first basic bug is that you should be able to call client.read() without calling client.accept() (per API definition) but NativeEithernet has bad bugs if you make that mistake. WebSockets makes that mistake if the message it is trying to read takes up more than one ethernet packet. It unfolds like this in the WebSocket code:

client.listen() is called and when it returns > 0 the code knows there's a new WebSocket message to read.
they call client.read() to read the 2-byte WebSocket header, then if needed for extended-sized packets, they call client.read() again, to read the extra bytes of the extended header. I don't remember if they call accept before that second read(). But that's not the bug that bit me.

Then the WebSocket finds out it's got 2000 bytes of data to read for the WebSocket message it's trying to read, and it hangs in a loop calling client.read() until it gets all 2000 bytes without every calling client.available() again. But client.available() only returned something like 1500 bytes in that first call because it had only received the first ethernet packet. But the web socket code just keeps calling read() without calling available() before every read so it reads all the data from the first packet, and when _aviable has dropped to zero, then they call read() again.

Your client.read() (in NativeEthernetClient.cpp) is this:

254 int EthernetClient::read(uint8_t *buf, size_t size)
255 {
256.   if(_remaining > 0) if (size > (size_t)_remaining) size = _remaining;
257.    if (sockindex >= Ethernet.socket_num) return 0;
258.   int16_t ret = Ethernet.socketRecv(sockindex, buf, size);
259    if (ret > 0) _remaining -= ret;
260    return ret;
261 }

This code allows you to call read() with _remaining == 0. It checks if the amount to read (512) in the WebSocket code is larger than _remaining, and it's not so it does nothing. It just does a socketRev() of size 512.

In this case socketRev() returns 512, even though available() was not called because by this point the second Ethernet packet has been received apparently.

But then line 259 subtracts 512 from 0, and assigns -512 to _remaining.

In available() the code chckes to see if _remainng has dropped to zero, and if it does it fills the buffer again:

if(_remaining == 0) {
#if FNET_CFG_TLS
        if(EthernetServer::_tls[sockindex]){
            fnet_socket_recvfrom(Ethernet.socket_ptr[sockindex], Ethernet.socket_buf_receive[sockindex], Ethernet.socket_size, MSG_PEEK, &_from, &fromlen);
            ret = fnet_tls_socket_recv(EthernetServer::tls_socket_ptr[sockindex], Ethernet.socket_buf_receive[sockindex], Ethernet.socket_size);
            Ethernet.socket_buf_index[sockindex] = 0;
        }
        else{
            ret = fnet_socket_recvfrom(Ethernet.socket_ptr[sockindex], Ethernet.socket_buf_receive[sockindex], Ethernet.socket_size, 0, &_from, &fromlen);
            Ethernet.socket_buf_index[sockindex] = 0;
        }
#else
        ret = fnet_socket_recvfrom(Ethernet.socket_ptr[sockindex], Ethernet.socket_buf_receive[sockindex], Ethernet.socket_size, 0, &_from, &fromlen);
        Ethernet.socket_buf_index[sockindex] = 0;
#endif
    }

But since _remaining has gone negative, this test for == 0 is never true, so available() just returns -512 forever and it never tries to call any of the socket read functions.

There is some simple and obvious fixes to help this. But I don't know all the side effects at play, so I don't know how well this works for all conditions.

But to start, just do this near the top of EthernetClient::available():

    if (_remaining < 0)
        _remaining = 0;     // Bug protection

But read() is where the real bug happens, so it should be fixed to something like this:

// Read a buffer of data from the socket.
// Returns bytes read, or 0 when there is nothing to read.
//
int EthernetClient::read(uint8_t *buf, size_t size)
{
    if (_remaining <= 0 && size > 0) { // We must call available() to see if there is more data.
        auto ret = available(); // will update _remaining if it's > 0.
        if (ret <= 0)
            return 0;
    }

    if (_remaining <= 0)
        return 0;       // This check should not be required, but we do it for safety.

    if (_remaining > 0) {
        if (size > (size_t)_remaining) {
            size = _remaining;      // reduce size to only read what is avaible.
        }
    }
    
	if (sockindex >= Ethernet.socket_num) return 0;

    int16_t ret = Ethernet.socketRecv(sockindex, buf, size);

    if (ret > 0) { // Never reduce _remaining to less than 0.
        if (ret >= _remaining)
            _remaining = 0;
        else
            _remaining -= ret;
    }

    return ret;
}
// Read a single character.
// returns the character, or -1 when there is nothing to read.
int EthernetClient::read()
{
    uint8_t b;
    int16_t ret = Ethernet.socketRecv(sockindex, &b, 1);
    if (ret > 0) {
        _remaining -= ret;
        if (_remaining < 0)
            _remaining = 0;
        return b;
    }
	return -1;
}

The code above fixed the one specific issue I was having oj a teensy4.1 with ArudinoWebsockets but is not tested on any other platform nor is is it tested well enough for this.

I'm just sharing what I know currently about these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants
@curtwelch and others