
TLS Route handshake failure for OpenShift route #6143

Open
swb2-izu-ssp opened this issue Nov 18, 2024 · 2 comments
Labels
defect Suspected defect such as a bug or regression

Comments


swb2-izu-ssp commented Nov 18, 2024

Observed behavior

Hello,

I am trying to put in place the stretch cluster pattern described in https://www.synadia.com/blog/multi-cluster-consistency-models/

To do so, servers spread across several PaaS environments must be joined back into a single cluster.

The solution I have found so far is to use OpenShift routes.

Before using routes, I tried deploying the servers in different namespaces (but on the same PaaS) and using only the headless service for the route URLs:

 routes: 
  - tls://nats.nats-stretch-1.svc.cluster.local:6222
  - tls://nats.nats-stretch-2.svc.cluster.local:6222
  - tls://nats.nats-stretch-3.svc.cluster.local:6222

This works fine.

So I tried to extend this with OpenShift routes (TLS passthrough termination, routing to port 6222), but only for the remote connectivity (locally I still use Pod + headless service), keeping the same layout (different namespaces, same PaaS). A sketch of such a Route object follows the config below.

 routes: 
  - tls://stretch-cluster-0.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://stretch-cluster-1.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://stretch-cluster-2.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://nats-nats-stretch-1.mydomain.com:443
  - tls://nats-nats-stretch-2.mydomain.com:443
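
For context, a passthrough Route of the kind described above would look roughly like the following sketch (the object name, host, service, and port here are assumptions, not copied from this issue):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: nats-cluster                 # hypothetical Route name
  namespace: nats-stretch-1
spec:
  host: nats-nats-stretch-1.mydomain.com
  to:
    kind: Service
    name: nats                       # the headless service fronting the pods
  port:
    targetPort: 6222                 # the NATS cluster (route) port
  tls:
    termination: passthrough         # TLS is terminated by nats-server itself
```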

I started having TLS issues because, in the end, it tries to connect using the IP address rather than the hostname I provided:

[7] 2024/11/18 16:33:21.883403 [DBG] Attempting reconnect for solicited route "nats-route://IP_A:6222/"
[7] 2024/11/18 16:33:21.887386 [DBG] IP_A:6222 - rid:15540 - Starting TLS route client handshake
[7] 2024/11/18 16:33:21.889737 [ERR] IP_A:6222 - rid:15540 - TLS route handshake error: tls: failed to verify certificate: x509: cannot validate certificate for IP_A because it doesn't contain any IP SANs
[7] 2024/11/18 16:33:21.889766 [INF] IP_A:6222 - rid:15540 - Router connection closed: TLS Handshake Failure

So it seems that, for routes, the hostname (tlsName) is not being passed along for TLS validation, and of course my certificate does not contain any IP addresses.
I would suspect this piece of code: https://github.com/nats-io/nats-server/blob/main/server/client.go#L5905
But that is only a quick, superficial check on my part...
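
To illustrate the failure mode, here is a hedged sketch of Go's standard TLS verification behavior (not the actual nats-server code path): when dialing a bare IP without an explicit ServerName, crypto/tls matches the peer certificate against IP SANs, which is exactly the error seen in the logs above. The IP and hostname below are placeholders.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Dialing a route by bare IP: crypto/tls derives the expected name from the
	// dial address, so a hostname-only certificate cannot be validated and the
	// handshake fails with an error like:
	//   x509: cannot validate certificate for 203.0.113.10 because it doesn't contain any IP SANs
	_, err := tls.Dial("tcp", "203.0.113.10:6222", &tls.Config{})
	fmt.Println("dial by IP:", err)

	// If the expected hostname is supplied explicitly, the DNS SANs are checked
	// instead, even though the TCP connection still goes to the IP address.
	_, err = tls.Dial("tcp", "203.0.113.10:6222", &tls.Config{
		ServerName: "nats-nats-stretch-2.mydomain.com", // placeholder hostname
	})
	fmt.Println("dial by IP with ServerName:", err)
}
```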

Expected behavior

I would expect this to work the same way it does with the headless service.

Server and client version

Server: 2.10.21

Host environment

Linux

Steps to reproduce

No response

swb2-izu-ssp added the defect (Suspected defect such as a bug or regression) label on Nov 18, 2024

wallyqs commented Nov 19, 2024

This will be a side effect of the way cluster discovery works; the error will still show up in the logs but should fade out after some time. The important part is to make sure that all the nodes have the same extra routes to avoid partitions, so if you change the configuration maps of both clusters to include the explicit routes and then issue a config reload, the mesh should form.
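
For reference, a minimal sketch of triggering such a config reload in place (the pod name is taken from the hostnames above, and the nats-server process is assumed to be PID 1 in the container):

```sh
# Ask the running nats-server to re-read its configuration file without restarting the pod.
kubectl exec -n nats-stretch-1 stretch-cluster-0 -- nats-server --signal reload=1
```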

swb2-izu-ssp (Author) commented

Hello @wallyqs
Many thanks for your answer.

The configuration maps of the 3 parts look like this:

PART 1

 routes: 
  - tls://stretch-cluster-0.nats.nats-stretch-1.svc.cluster.local:6222
  - tls://stretch-cluster-1.nats.nats-stretch-1.svc.cluster.local:6222
  - tls://stretch-cluster-2.nats.nats-stretch-1.svc.cluster.local:6222
  - tls://nats-nats-stretch-2.mydomain.com:443
  - tls://nats-nats-stretch-3.mydomain.com:443

PART 2

 routes: 
  - tls://stretch-cluster-0.nats.nats-stretch-2.svc.cluster.local:6222
  - tls://stretch-cluster-1.nats.nats-stretch-2.svc.cluster.local:6222
  - tls://stretch-cluster-2.nats.nats-stretch-2.svc.cluster.local:6222
  - tls://nats-nats-stretch-1.mydomain.com:443
  - tls://nats-nats-stretch-3.mydomain.com:443

PART 3

 routes: 
  - tls://stretch-cluster-0.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://stretch-cluster-1.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://stretch-cluster-2.nats.nats-stretch-3.svc.cluster.local:6222
  - tls://nats-nats-stretch-1.mydomain.com:443
  - tls://nats-nats-stretch-2.mydomain.com:443

As a matter of fact, I made it work by disabling the no_advertise key (a sketch of where that key lives is shown below).
So, in a configuration where I deploy the 3 namespaces on the same PaaS, this works.
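
For context, a minimal sketch of where that key sits in a nats-server cluster block (the cluster name and routes here are placeholders, not the full configuration from this issue):

```
cluster {
  name: stretch-cluster            # placeholder name
  port: 6222
  no_advertise: false              # when true, this server does not gossip its client URLs (e.g. pod IPs) to the rest of the cluster
  routes: [
    tls://nats-nats-stretch-1.mydomain.com:443
    tls://nats-nats-stretch-2.mydomain.com:443
  ]
}
```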

But as soon as I deploy the 3 namespaces on 3 different PaaS environments, the cluster becomes unstable.
[screenshot: output of `nats server list` showing inconsistent cluster state]

The output varies each time I run the `nats server list` command.
What am I missing?

Nicolas
