[BUG] gonetworkmanager.NewNetworkManager() get stuck on the boot #4643

Open
kvaps opened this issue Oct 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@kvaps
Contributor

kvaps commented Oct 21, 2024

Kube-OVN Version

v1.12.19

Kubernetes Version

v1.30.3

Operation-system/Kernel Version

OS-IMAGE         KERNEL-VERSION
Talos (v1.8.1)   6.6.54-talos

Description

Hey, I ran into an issue where kube-ovn gets stuck on boot.
I spent a few hours debugging it and wanted to report it.

I found that it hangs on this call:

manager, err := gonetworkmanager.NewNetworkManager()

Steps To Reproduce

Others may run into this on Talos Linux.

Current Behavior

The kube-ovn-cni pods never become ready because the readiness probe fails. The logs show:

I1021 21:17:51.672195    7025 ovs_linux.go:591] wait ovn0 gw ready
I1021 21:17:51.675748    7025 ovs.go:50] 100.64.0.3 network ready after 1 ping, gw 100.64.0.1

Expected Behavior

I expect to see this message:

I1021 21:17:51.788534    7025 cniserver.go:83] start daemon controller

and the kube-ovn-cni pods to become ready.

@kvaps kvaps added the bug Something isn't working label Oct 21, 2024

dosubot bot commented Oct 21, 2024

Hey @kvaps! I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this issue together!

The issue you're experiencing with gonetworkmanager.NewNetworkManager() getting stuck during boot on Talos Linux could be related to dependencies such as the DBus connection or the NetworkManager service. The function relies on a DBus connection to communicate with the NetworkManager service, and if there are issues with the DBus service or if the NetworkManager service is not running, it could cause the function to get stuck [1].

Additionally, the readiness probe for kube-ovn-cni pods checks the operational state of the ovn0 network interface and network connectivity to the gateway. If these checks fail, the pods will not reach a ready state [2].

To address this, ensure that the NetworkManager service is running and that there are no issues with the DBus connection on your Talos Linux setup. You might also want to verify the configuration specific to Talos Linux, such as declaring the openvswitch module and using the correct Helm chart options [3].


@zhangzujian
Member

Could you please provide the Talos OS image?
