dnslookupfamily returns ipv6 addresses for external clusters (oidc) #4744

Closed
zetaab opened this issue Nov 20, 2024 · 13 comments · Fixed by #4745
Labels
kind/bug Something isn't working

Comments

@zetaab
Contributor

zetaab commented Nov 20, 2024

Description:

What issue is being seen? Describe what should be happening instead of
the bug, for example: Envoy should not crash, the expected value isn't
returned, etc.

I compiled a new version from the latest main, and our OIDC is now broken.

[2024-11-20 07:57:48.870][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:391] dns resolution for cognito-idp.eu-central-1.amazonaws.com started
[2024-11-20 07:57:48.876][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:308] dns resolution for cognito-idp.eu-central-1.amazonaws.com completed with status 0
[2024-11-20 07:57:48.876][1][debug][upstream] [source/common/upstream/upstream_impl.cc:484] transport socket match, socket default selected for host with address [2a05:d014:32e:701:9334:4719:42de:9263]:443
[2024-11-20 07:57:48.876][1][debug][upstream] [source/common/upstream/upstream_impl.cc:484] transport socket match, socket default selected for host with address [2a05:d014:32e:700:f4dc:9de:938f:1329]:443
[2024-11-20 07:57:48.876][1][debug][upstream] [source/common/upstream/upstream_impl.cc:484] transport socket match, socket default selected for host with address [2a05:d014:32e:702:b316:2916:8253:ddff]:443
[2024-11-20 07:57:48.876][1][debug][upstream] [source/extensions/clusters/strict_dns/strict_dns_cluster.cc:201] DNS refresh rate reset for cognito-idp.eu-central-1.amazonaws.com, refresh rate 30000 ms

As the logs show, our OIDC now tries to use IPv6. However, we do not have IPv6 connectivity in our cluster at all.

example interfaces

/home/curl_user $ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
3: eth0@if49: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 8910 qdisc noqueue state UP qlen 1000
    link/ether 6e:6c:a3:39:7c:a5 brd ff:ff:ff:ff:ff:ff
    inet 100.125.159.107/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6c6c:a3ff:fe39:7ca5/64 scope link 
       valid_lft forever preferred_lft forever

Repro steps:

Include sample requests, environment, etc. All data and inputs
required to reproduce the bug.

  1. Compile the latest main.
  2. Deploy it to an IPv4-only cluster and use an OIDC provider that has IPv6 DNS records.
  3. OIDC breaks because the JWKS cannot be fetched.

Note: If there are privacy concerns, sanitize the data prior to
sharing.

#4740 is perhaps the PR that is breaking this

Environment:

Include the environment like gateway version, envoy version and so on.

Logs:

Include the access logs and the Envoy logs.

@zetaab zetaab added the triage label Nov 20, 2024
@zhaohuabing
Member

zhaohuabing commented Nov 20, 2024

This was introduced by #4740

Auto prioritizes IPv6 over IPv4.
EG should respect the IPFamily configuration in the EnvoyProxy. A reasonable DNS lookup strategy would probably be:

  • V4_ONLY for default/IPv4
  • V6_ONLY for IPv6
  • AUTO for DualStack
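
The mapping in the list above can be sketched in Go roughly as follows. This is only an illustration of the proposal: the `IPFamily` type, its constants, and the `dnsLookupFamily` helper are hypothetical names, and the returned strings are the names of Envoy's `DnsLookupFamily` enum values rather than the actual generated types.

```go
package main

import "fmt"

// IPFamily mirrors the values discussed in this thread. The type and
// constant names are illustrative, not copied from the Envoy Gateway API.
type IPFamily string

const (
	IPv4      IPFamily = "IPv4"
	IPv6      IPFamily = "IPv6"
	DualStack IPFamily = "DualStack"
)

// dnsLookupFamily is a hypothetical helper that returns the name of the
// Envoy DnsLookupFamily enum value for a given IP family.
// A nil family is treated like IPv4, matching the documented default.
func dnsLookupFamily(f *IPFamily) string {
	if f == nil {
		return "V4_ONLY"
	}
	switch *f {
	case IPv6:
		return "V6_ONLY"
	case DualStack:
		return "AUTO"
	default:
		return "V4_ONLY"
	}
}

func main() {
	dual := DualStack
	fmt.Println(dnsLookupFamily(nil))   // V4_ONLY
	fmt.Println(dnsLookupFamily(&dual)) // AUTO
}
```

With this shape, an unset IPFamily never produces AUTO, so an IPv4-only cluster would not receive IPv6 upstream addresses.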

If AUTO is specified, the DNS resolver will first perform a lookup for addresses in the IPv6 family and fallback to a lookup for addresses in the IPv4 family.

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto#envoy-v3-api-enum-value-config-cluster-v3-cluster-dnslookupfamily-auto

cc @zirain

@zhaohuabing zhaohuabing added kind/bug Something isn't working and removed triage labels Nov 20, 2024
@zetaab
Contributor Author

zetaab commented Nov 20, 2024

I reverted PR #4740 from main, and now my OIDC is back working again.

So AUTO is not correct when the OIDC provider has IPv6 DNS records and the cluster has only IPv4. It is odd that Envoy proxy does not check which interfaces it has; it simply cannot work like this. IMO Envoy proxy should also fall back to IPv4 with the AUTO setting when it has no IPv6 interface.

It looks like it is possible to change the behaviour with https://github.com/envoyproxy/gateway/blob/main/api/v1alpha1/envoyproxy_types.go#L149

However, the comment there says:

// If not specified, the system will operate as follows:
// - It defaults to IPv4 only.

That is no longer true.

@zirain
Contributor

zirain commented Nov 20, 2024

I recall it was designed for the listener address; we may need another knob for your case.

@zirain
Contributor

zirain commented Nov 20, 2024

A workaround would be to create an EnvoyProxy with IPFamily IPv4 and reference it from the GatewayClass or Gateway.

@zhaohuabing
Member

zhaohuabing commented Nov 20, 2024

I recall it was designed for the listener address; we may need another knob for your case.

I think we can use the current IPFamily in the EnvoyProxy for both the listener and the DNS lookup family. The behavior below would be sufficient for most use cases, as the IP family of the Gateway listener and the Gateway pod is typically the same.

// IPFamily specifies the IP family for the EnvoyProxy fleet.
// This setting affects the Gateway listener port and the DNS resolver for the EnvoyProxy fleet.
// - IPv4 Gateway will listen on IPv4 addresses only, and the DNS resolver will resolve to IPv4 addresses only.
// - IPv6 Gateway will listen on IPv6 addresses only, and the DNS resolver will resolve to IPv6 addresses only.
// - DualStack Gateway will listen on both IPv4 and IPv6 addresses, and the DNS resolver will prefer IPv6 addresses over IPv4 addresses.
// - If unspecified, the default IP family is IPv4.
IPFamily *IPFamily `json:"ipFamily,omitempty"`

A dedicated configuration knob for DNS lookup family can be added later if people ask for it.

@zirain
Contributor

zirain commented Nov 20, 2024

@zetaab can you try with V4_PREFERRED as the default value on your cluster?

@zirain
Contributor

zirain commented Nov 20, 2024

3b26516 passed on CI.

@arkodg
Contributor

arkodg commented Nov 20, 2024

+1 to V4_PREFERRED as default to maintain backwards compatibility

@zhaohuabing
Member

@zirain @arkodg I think V4_PREFERRED won't work in an IPv6 environment where the Envoy pod only has an IPv6 address.

If V4_PREFERRED is specified, the DNS resolver will first perform a lookup for addresses in the IPv4 family and fallback to a lookup for addresses in the IPv6 family.
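
A simplified Go sketch of the selection rules quoted in this thread may help show why neither AUTO nor V4_PREFERRED is safe as a universal default. It assumes the fallback depends only on which record types the DNS lookup returns, not on whether the pod can reach them. `selectAddrs` is a hypothetical function, not Envoy's resolver code, and the IPv4 address is a documentation placeholder.

```go
package main

import "fmt"

// selectAddrs mimics, in simplified form, how the DNS lookup families quoted
// in this thread choose between resolved IPv4 (A) and IPv6 (AAAA) records.
// Illustration only; this is not Envoy's resolver implementation.
func selectAddrs(family string, v4, v6 []string) []string {
	switch family {
	case "V4_ONLY":
		return v4
	case "V6_ONLY":
		return v6
	case "V4_PREFERRED":
		// IPv4 first, IPv6 only when no IPv4 records exist.
		if len(v4) > 0 {
			return v4
		}
		return v6
	default:
		// "AUTO": IPv6 first, IPv4 only when no IPv6 records exist.
		if len(v6) > 0 {
			return v6
		}
		return v4
	}
}

func main() {
	// A hostname with both record types, like the OIDC provider in the report.
	v4 := []string{"192.0.2.10"} // placeholder address
	v6 := []string{"2a05:d014:32e:701:9334:4719:42de:9263"}

	// IPv6 is chosen, which is unreachable from an IPv4-only pod.
	fmt.Println(selectAddrs("AUTO", v4, v6))
	// IPv4 is chosen, which is unreachable from an IPv6-only pod.
	fmt.Println(selectAddrs("V4_PREFERRED", v4, v6))
}
```

In both cases the "fallback" never triggers for a dual-stack hostname, which is why tying the lookup family to the configured IPFamily avoids the problem.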

@alrai
Contributor

alrai commented Nov 21, 2024

I encountered the following error in a pod deployed by the Gateway:

$ kubectl logs -f envoy-envoy-gateway-envoy-gateway-9dbc5803-66c67d8d54-pvmgb -n envoy-gateway
Defaulted container "envoy" out of: envoy, shutdown-manager
[2024-11-18 18:42:12.465][1][warning][misc] [source/extensions/filters/network/http_connection_manager/config.cc:88] internal_address_config is not configured. The existing default behaviour will trust RFC1918 IP addresses, but this will be changed in next release. Please explictily config internal address config as the migration step or config the envoy.reloadable_features.explicit_internal_address_config to true to untrust all ips by default
[2024-11-18 18:42:27.573][1][warning][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:130] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.cluster.v3.Cluster
[2024-11-18 18:42:42.573][1][warning][config] [source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:130] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.listener.v3.Listener
[2024-11-18 18:42:50.159][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:226] DeltaAggregatedResources gRPC config stream to xds_cluster closed since 37s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: immediate connect error: Network is unreachable|remote address:[2a02:6b8::242]:18000
[2024-11-18 18:43:01.488][1][warning][config] [./source/extensions/config_subscription/grpc/grpc_stream.h:226] DeltaAggregatedResources gRPC config stream to xds_cluster closed since 48s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: immediate connect error: Network is unreachable|remote address:[2a02:6b8::242]:18000

It tries to connect to some unknown IPv6 address even though I have a single-stack k8s cluster and all pods/services have only IPv4 addresses.

NAME                                                 TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                                   AGE
service/envoy-envoy-gateway-envoy-gateway-9dbc5803   LoadBalancer   10.43.89.115   93.125.75.111   443:31726/TCP,8443:32345/TCP              46h
service/envoy-gateway                                ClusterIP      10.43.48.170   <none>          18000/TCP,18001/TCP,18002/TCP,19001/TCP   2d

Is that error caused by the same issue?

@zetaab
Contributor Author

zetaab commented Nov 21, 2024

@alrai yes, it's the same issue.

@zirain
Contributor

zirain commented Nov 22, 2024

@zetaab can you try with #4745?

@zetaab
Contributor Author

zetaab commented Nov 22, 2024

I can, but not until next week.
