"ClientId is invalid" error after automatic reconnection to broker #446

Open
tobyfoo opened this issue Nov 22, 2023 · 4 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@tobyfoo

tobyfoo commented Nov 22, 2023

Describe the bug

Connecting to AWS IoT using iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth works fine on the initial connection.

After 24 hours the broker closes the socket (this is normal and expected), so the client tries to reconnect, but every attempt fails with the error: reasonString CONNACK:ClientId is invalid:..., reasonCode 133.

Some investigation:

  • Logging the connack from the initial connection event shows a client ID like assignedClientIdentifier: '$GEN/8b42676c-3300-4b4d-8c4f-9b75a17f7999', while the connack from the failed attempts states reasonString: 'CONNACK:ClientId is invalid:c91911b9-d401-8fe2-261c-e4374f470aff'. I'm not sure where the $GEN/ prefix comes from, but it may be a clue.
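
To tell the two connack shapes apart when reading logs, a handler could summarize the fields quoted above. This is only a minimal sketch: `ConnackLike` and `describeConnack` are hypothetical names, not SDK types, and the interface assumes just the three fields that appear in the logged output. Reason code 133 (0x85) is the MQTT5 "Client Identifier not valid" code.

```typescript
// Hypothetical, minimal shape of the connack fields seen in the logs above.
// This is not an SDK type; it only mirrors the fields quoted in this issue.
interface ConnackLike {
  reasonCode: number;
  reasonString?: string;
  assignedClientIdentifier?: string;
}

// MQTT5 CONNACK reason code 0x85: "Client Identifier not valid".
const CLIENT_IDENTIFIER_NOT_VALID = 133;

// Returns a one-line summary suitable for logging from a connection event handler.
function describeConnack(connack: ConnackLike | undefined): string {
  if (connack === undefined) {
    return "no connack received";
  }
  if (connack.reasonCode === CLIENT_IDENTIFIER_NOT_VALID) {
    return `broker rejected client id: ${connack.reasonString ?? "(no reason string)"}`;
  }
  if (connack.assignedClientIdentifier !== undefined) {
    return `broker assigned client id: ${connack.assignedClientIdentifier}`;
  }
  return `connack reason code ${connack.reasonCode}`;
}
```

Calling `describeConnack(eventData.connack)` from the connectionSuccess and connectionFailure handlers would make the switch from an assigned id to a rejected id easy to spot, assuming the event exposes the connack under that property as the logged output suggests.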

Expected Behavior

The re-connection succeeds, just like the initial connection.

Current Behavior

The Mqtt5Client fires a connectionFailure event on re-connection and tries to reconnect forever. Restarting the process/client helps and the new initial connection succeeds immediately.

Reproduction Steps

  • Make sure AWS credentials (like AWS_SECRET_ACCESS_KEY in the env) are present and have the policy AWSIoTDataAccess granted.
  • Put your AWS IoT Core endpoint URL into the BROKER_ENDPOINT environment variable.
import { iot, mqtt5 } from 'aws-iot-device-sdk-v2';

async function init() {
  const builder = iot.AwsIotMqtt5ClientConfigBuilder.newWebsocketMqttBuilderWithSigv4Auth(
    process.env.BROKER_ENDPOINT as string,
    { region: 'eu-central-1' },
  );
  builder.withConnectProperties({ keepAliveIntervalSeconds: 120 });
  const client = new mqtt5.Mqtt5Client(builder.build());

  client.on('error', (err) => { console.log('MQTT error', err); });
  client.on('attemptingConnect', () => { console.log('Attempting Connect event'); });
  client.on('connectionSuccess', (eventData: mqtt5.ConnectionSuccessEvent) => { console.log('Connection Success event', eventData); });
  client.on('connectionFailure', (eventData: mqtt5.ConnectionFailureEvent) => { console.log('Connection failure event', eventData); });

  client.start();
}

init().catch(console.log);

Notice the initial connection succeeding via the console log of connectionSuccess.

Now wait 24h for the broker to close the websocket 😄

...or force-close the TCP socket as follows (under Linux). There may be an easier way, such as disconnecting the network interface, but I haven't tried that.

  • run netstat -tnp to find the PID of the node process connected to AWS IoT (the foreign address should be on port 443, as we are using MQTT over WebSocket)
  • run lsof -np $PID, where $PID is the PID found in the previous step. Look for the file descriptor column ("FD") and note the FD of the TCP connection to the broker (the one connected to remote port 443)
  • run gdb -p $PID, where $PID is from step 1
  • in the gdb console, run call close($FD), where $FD is the file descriptor obtained from lsof
  • type quit to exit gdb
  • wait up to 2 minutes for the client to send its heartbeat, notice that the socket is closed, and attempt to reconnect

My log output is attached as log.txt.
log.txt

Possible Solution

I read in the docs that the client id is regenerated on re-connection, which is precisely what we want, but maybe it is malformed. Also, the $GEN/ part in the initial connection's id might be a clue.

Additional Information/Context

For authentication, I tried both Fargate task roles and plain access keys in a local test environment, with the same behavior. The assigned policy was AWSIoTDataAccess, which is pretty permissive, so I think I've ruled out an authorization issue.

SDK version used

1.17.0

Environment details (OS name and version, etc.)

Docker container running node:20.9 image, Linux host OS.

@tobyfoo tobyfoo added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 22, 2023
@bretambrose
Contributor

This is a bug in IoT Core. I am putting together a ticket to them right now.

@bretambrose bretambrose removed the needs-triage This issue or PR still needs to be triaged. label Nov 22, 2023
@bretambrose
Contributor

As a follow-up: until this is resolved, please generate UUIDs locally and use them rather than leaving the client ID empty.

@jmklix jmklix added the p1 This is a high priority issue label Nov 22, 2023
@tobyfoo
Author

tobyfoo commented Nov 23, 2023

Thanks @bretambrose for your quick reply and for identifying the issue. Using a locally generated UUID seems to have fixed the problem. I'm not sure whether I should close this issue or whether you want it to stay open.

For anyone else facing the issue, here is some code for the workaround:

import { v4 as uuidv4 } from 'uuid';

// ...
builder.withConnectProperties({ keepAliveIntervalSeconds: 120, clientId: uuidv4() });
// ...
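
The same workaround can probably be done without the extra uuid dependency, assuming Node 14.17 or later (the node:20.9 image above qualifies), where `randomUUID` is built into `node:crypto`. A minimal sketch; `makeClientId` and the `my-service-` prefix are hypothetical names chosen for illustration:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: builds a locally generated client id from an optional
// prefix and a random (version 4) UUID, so the client id is never left empty.
function makeClientId(prefix: string = ""): string {
  return `${prefix}${randomUUID()}`;
}

// Generate the id once at startup and pass it to the builder, as in the
// workaround above:
const clientId = makeClientId("my-service-");
// builder.withConnectProperties({ keepAliveIntervalSeconds: 120, clientId });
```

A prefix is optional, but it can make it easier to tell which service a connection belongs to when looking at broker-side logs.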

@bretambrose
Contributor

Let's keep it open until the overall issue is resolved. We don't know yet if IoT Core is willing to try and change the behavior here (it's not compliant with the MQTT5 spec, but sometimes you need to bend the rules a bit when scaling a service to the size of AWS). If they fix it, great; otherwise we need to disable the ability to use auto-assigned client IDs from the SDK (likely by using a UUID for the client ID if none is provided).
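
For illustration only, the fallback described here ("using uuid for client id if none is provided") could look roughly like the sketch below when applied on the application side. This is not the SDK's implementation; `withDefaultClientId` is a hypothetical helper, and it assumes only that the connect properties are a plain object with an optional `clientId` string.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper, not part of the SDK: returns a copy of the connect
// properties in which clientId is always set. A missing or empty clientId is
// replaced by a locally generated UUID; an explicit one is kept unchanged.
function withDefaultClientId<T extends { clientId?: string }>(
  props: T,
): T & { clientId: string } {
  const clientId =
    props.clientId !== undefined && props.clientId.length > 0
      ? props.clientId
      : randomUUID();
  return { ...props, clientId };
}
```

With this in place, `builder.withConnectProperties(withDefaultClientId({ keepAliveIntervalSeconds: 120 }))` would never send an empty client ID, so the broker would not need to assign one.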

@jmklix jmklix added p2 This is a standard priority issue and removed p1 This is a high priority issue labels Jan 2, 2024