Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Teensy4.1, Large WebSocket Messages received that require 2 ethernet packets will corrupt/crash/hang the processor. #166

Open
curtwelch opened this issue Aug 26, 2024 · 2 comments
Assignees

Comments

@curtwelch
Copy link

Describe the bug
When running in client mode, small WebSocket messages sent from the server to the client that can be sent in one ethernet packet work fine. But when the message size is large enough to force it to be split across two packets (around 1450 bytes), memory corruption from something, causes the system to become unstable.

Actually, a test I did with a packet that was only slightly larger than one ethernet packet seemed to work (overflow of only a few bytes into a second ethernet packet). But I didn't run it long to verify it keeps working.

My normal testing is sending WebSocet messages of length 2000 when I consistently see the bug.

I have traced this down into the WebSocket code to the extent that I see the bad data coming from the calls to read the TCP socket. So the TCP layer is returning corrupted data. I have not yet chased this into the TCP code. But it would be surprising to me if the TCP code could not correctly read data from a single TCP stream, so I assume some sort of interaction between these packages is causing a memory corruption issue.

To Reproduce
Running on a Teensy4.1 with NativeEthernet -- using the onboard ethernet chip.
AurdurinoWebSockets 0.5.4
NativeEthernet

The teens41-client.ino from Examples should show this behavior. Actually, let me run that exactly as it's distributed from the Aurdino IDE instead of modifying for it the vscode and verify the bug works there... Just confirmed the bug also shows up running the example code with a few mods.

I had to assign a static IP so I changed the call to Ethernet.begin(mac, IP); (added the IP). And I changed the connection URL string to match my server IP. Otherwise, the example code complied on the Arduino IDE shows the same bug.

The example code prints the message received as:

08:47:12.261 -> Connected to server ws://192.168.1.10:9000
08:47:13.018 -> Got Message: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijkl7I���/��\f������}�'�g�����������Tߖ�c��7�?�����ӟ��
08:47:13.018 -> 3�p^�m �� ���mM����q��1
08:47:13.018 -> ��C@�θ2%��Z��k_�o�}+G���_�����p�LC�k�҃��?�u+
08:47:13.018 -> �5$����v�.WYz��\�� d�,��܋)�_�Or�1�0L���ŀT�~TG���3����ր���-l��jټ�a��K ��H���E������P�x�A+>�����2�� +��P�!���7����O�4$����T�="�m���a(�

The actual message sent was just 2000 bytes of repeating letters and numbers. WireShark confirms that the server sent a valid 2000-byte WebSocket packet full of simple ASCII letters and numbers (In two TCP packets), but the Teensy read garbage for the end of the package. Generally it seems the first packet was read correctly, but data from the second packet is corrupted.

I'm compiling and loading using vwscode with PlatformIO on most my testing, but the Aurdino IDE produces the same behavior.

Expected behavior
The good behavior is that packets will be received as expected.

The bad behavior is that the system goes unstable quicky after trying to receive the muilt-packet WebSocket messages of length 2000.

Code
I shared code and information on this thread about this issue on the PJRC forum:

https://forum.pjrc.com/index.php?threads/teensy4-1-auduinowebsockets-nativeethernet-or-qnethernet-websocket-messages-being-received-over-around-1440-bytes-are-corrupted-and-crash-the-teensy.75700/

On the Aurdio test I just did here's the .ino code:

/*
  Teensy41 Websockets Client (using NativeEthernet)

  This sketch:
  1. Connects to a ethernet network
  2. Connects to a websockets server at port 80
  3. Sends the websockets server a message ("Hello Server")
  4. Prints all incoming messages while the connection is open

  Note:
  Make sure you share your computer's internet connection with the Teensy
  via ethernet.

  Libraries:
  To use this sketch install
  * TeensyID library (https://github.com/sstaub/TeensyID)
  * NativeEthernet (https://github.com/vjmuzik/NativeEthernet)

  Hardware:
  For this sketch you need a Teensy 4.1 board and the Teensy 4.1 Ethernet Kit
  (https://www.pjrc.com/store/ethernet_kit.html).
*/

#include <NativeEthernet.h>
#include <ArduinoWebsockets.h>
#include <TeensyID.h>
#include <SPI.h>

using namespace websockets;
WebsocketsClient client;

// We will set the MAC address at the beginning of `setup()` using TeensyID's
// `teensyMac` helper.
byte mac[6];

// Enter websockets url.
// Note: wss:// currently not working.
//const char* url  = "ws://echo.websocket.org";
const char* url  = "ws://192.168.1.10:9000"; // Curt Welch change
IPAddress ip(192, 168, 1, 100); // Curt Welch Add

void setup() {
  // Set the MAC address.
  teensyMAC(mac);

  // Start Serial and wait until it is ready.
  Serial.begin(9600);
  while (!Serial) {}

  // Connect to ethernet.
  Ethernet.begin(mac, ip); // Curt Welch add -- I needed a static IP
  
  // if (Ethernet.begin(mac)) { // Curt welch comment this block out
  //   Serial.print("Ethernet connected (");
  //   Serial.print(Ethernet.localIP());
  //   Serial.println(")");
  // } else {
  //   Serial.println("Ethernet failed");
  // }

  // Connect to websocket server.
  if (client.connect(url)) {
    Serial.printf("Connected to server %s\n", url);
    // Send welcome message.
    client.send("Hello Server");
  } else {
    Serial.println("Couldn't connect to server!");
  }

  // Run callback when messages are received.
  client.onMessage([&](WebsocketsMessage message) {
    Serial.print("Got Message: ");
    Serial.println(message.data());
  });	
}

void loop() {
  // Check for incoming messages.
  if (client.available()) {
    client.poll();
  }
}

The server that's sending it is a bun server running this code (teensyBug.ts) with the arguments below to send a message once a second (1000 ms) of 2000 bytes:

% bun teensyBug.ts 1000 2000

import { ServerWebSocket } from 'bun'

const frequency = parseInt(process.argv[2])
const characterCount = parseInt(process.argv[3])

if (isNaN(frequency) || isNaN(characterCount)) {
  console.log('Usage: bun teensyStress.ts <frequency>ms <characterCount>')
  process.exit(1)
}

let wes: ServerWebSocket

Bun.serve({
  fetch(req, server) {
    const success = server.upgrade(req)
    if (success) {
      return undefined
    }
    return new Response('Teensy Stress Test API')
  },
  websocket: {
    async message(ws, message) {
      //let msg = JSON.parse(message as string)
      console.log(message)
      wes = ws
      console.log('Connected to Teensy!')
    },
  },
  port: 9000,
})

setInterval(() => {
  if (!wes) return

  const characters =
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

  let message = ''
  for (let i = 0; i < characterCount; i++) {
    // message += characters[Math.floor(Math.random() * characters.length)]
    message += characters[i % characters.length]

  }
  console.log("Send message: len is" + message.length + " message is: [" + message + "]")
  wes.send(message)
}, frequency)

Additional context
I'm testing on a MacBook Pro, with a USB ethernet cable plugged into the Teensy Ethernet jack. It's a direct Ethernet connection from the Mac to the Teensy, with no hubs or routers. I'm testing with static IP, so no DHCP assignments. I'm running the bun server in a terminal window.

Using WireShark to monitor the ethernet traffic shows nothing odd at play. But the Teensy, once it becomes corrupted, will stop ACKing the additional data, so the server stops sending it with a WINDOW FULL condition on the TCP stream.

The Teensy will however, keep responding to ICMP Pings from the mac after it hangs (most the time).

In my debugging of the WebServer, I see it hung trying to read more data from the socket in readUntilSuccessfullOrError().

The socket returns -1 on read, but socket.aviabale() is valid so it hangs in this loop forever. This happens after it has first returned corrupted data from a previous message. There is TCP data waiting to be read per the WireShark sniffing but the TCP layer thinks there's no more data., So the TCP layer has been confused by something getting corrupted. I have not chased this down into the TCP layer code yet.

I did notice other reports of large messages causing crashes so it seems to me this might be and old bug that most people never see because they don't tend to try and send large WebSocket messages.

If anyone can help with this, it would be appreciated!

Curt Welch
[email protected]

@curtwelch
Copy link
Author

I traced this down to bugs in NativeEhthernet.

See vjmuzik/NativeEthernet#36 for my description and potential fixes for that.

NativeEthernet requires you always call client.avaiable() before every call to client.read(). WebSockets fails to do this at times which is what trigger the above I reported here.

I made changes to fix NativeEthernet in my copy, but I'm still studying it and and testing.

The bug happens when WebSocket client receives a WebSocket message too large to fit in one Ethernet packet. Websocket calls client.aviable() which reads the data for the first part of the large WebSocket message in the first Etherhnet Packet.

But the code here learns it's got a 2000 byte (for example) and then just hangs calling client.read() until it gets all the data.

But after it reads all the data from the first packet, it needs to call client.avaiable() again, to get the rest of the data ready to read, but fails to do that, and ends up reading garbage from the TCP stream.

So this code could be improved, in theory, to make sure never to call client.read in NativeEthernet, unless client.avaiable() ahs returned > 0. I will attempt to try those changes later and share what I find.

@curtwelch
Copy link
Author

I figured out how to fix ArduinoWebSockets to work around the NativeEthernet Bug that created this problem. It's one line of code (and some comments for good measure):

At line 78 of:

https://github.com/gilmaimon/ArduinoWebsockets/blob/master/src/tiny_websockets/network/teensy41/teensy41_tcp_client.hpp

Change this:

    uint32_t read(uint8_t* buffer, const uint32_t len) override {
      yield();
      return client.read(buffer, len);
    }

to this:

    uint32_t read(uint8_t* buffer, const uint32_t len) override {
      yield();
      // For NativeEthernet Bug
      // Must always call available() to fill buffer before calling read() and
      // and only call read() if there is data to read. Will crash if you fail to do this.
      if (!client.available()) return 0; 
      return client.read(buffer, len);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@curtwelch @gilmaimon and others