[10.4 stable] Timeout calling dhcpcd #3481

Merged
merged 3 commits into lf-edge:10.4 from 10.4-stable-timeout-calling-dhcpcd
Oct 10, 2023

Conversation

rouming
Contributor

@rouming rouming commented Oct 6, 2023

These commits were originally submitted to the master branch by @christoph-zededa. They target an issue that was observed on version 9.7. I would like to merge them into 10.4 stable so that we can ask the customer to switch to a more recent and stable version.

Contributor

@eriknordmark eriknordmark left a comment

LGTM.

But it would be nice to have the commit messages refer to the commit hashes in master to make it easier to compare. FWIW I use git cherry-pick -x which does that automatically.

Add a timeout when calling dhcpcd in case it hangs; this
prevents the whole DPCManager from hanging and ultimately
making the watchdog fire and reboot the system.

Signed-off-by: Christoph Ostarek <[email protected]>
(cherry picked from commit e628c59)
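
For illustration, the idea in this commit (bounding an external command so that a hang cannot block the caller) could be sketched in Go roughly as follows. This is a minimal sketch, not the code from the PR: `runWithTimeout` and `demoTimeout` are hypothetical names, and `sleep 5` merely stands in for a hung dhcpcd invocation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// demoTimeout is a short, hypothetical limit used only for this demonstration.
const demoTimeout = 200 * time.Millisecond

// runWithTimeout runs an external command and kills it if it has not
// finished within timeout. The name and signature are assumptions made
// for this sketch; they are not taken from the EVE code base.
func runWithTimeout(timeout time.Duration, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// exec.CommandContext kills the process once ctx is done.
	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return out, fmt.Errorf("%s timed out after %v: %w",
			name, timeout, context.DeadlineExceeded)
	}
	return out, err
}

func main() {
	// "sleep 5" stands in for a command that hangs.
	if _, err := runWithTimeout(demoTimeout, "sleep", "5"); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape the caller gets an error back instead of blocking indefinitely, so it can log the failure and continue before any watchdog fires.
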
previously a timeout of 1000 seconds was used, which is longer than
the watchdog waits (currently 500 seconds)

also switch to time.Duration to make time durations clearer

Signed-off-by: Christoph Ostarek <[email protected]>
(cherry picked from commit 193147f)
for command executions that run in separate worker thread:
set timeout to 1000 seconds (as it was before)

for direct command executions:
set timeout to 400 seconds in order to not trigger the watchdog
and therefore a reboot

Signed-off-by: Christoph Ostarek <[email protected]>
(cherry picked from commit 4240dd3)
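
The relationship between the timeouts described in these commit messages can be written down with `time.Duration` values. The sketch below is only an assumption about how such a check might look: the numbers (500, 400 and 1000 seconds) come from the commit messages, while the identifiers `watchdogTimeout`, `directExecTimeout`, `workerExecTimeout` and `validateDirectTimeout` are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the commit messages; the identifier names are hypothetical.
const (
	watchdogTimeout   = 500 * time.Second  // how long the watchdog waits
	directExecTimeout = 400 * time.Second  // direct (blocking) command executions
	workerExecTimeout = 1000 * time.Second // executions in a separate worker thread
)

// validateDirectTimeout returns an error when a directly executed command
// could outlast the watchdog and therefore trigger a reboot.
func validateDirectTimeout(cmdTimeout, watchdog time.Duration) error {
	if cmdTimeout >= watchdog {
		return fmt.Errorf("command timeout %v is not below watchdog timeout %v",
			cmdTimeout, watchdog)
	}
	return nil
}

func main() {
	// The 400-second direct timeout stays below the watchdog limit.
	fmt.Println(validateDirectTimeout(directExecTimeout, watchdogTimeout))
	// The 1000-second value is only safe when the caller is not blocked on it.
	fmt.Println(validateDirectTimeout(workerExecTimeout, watchdogTimeout))
}
```

Typed durations also make the units explicit, which is the clarity benefit the second commit mentions.
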
@rouming rouming force-pushed the 10.4-stable-timeout-calling-dhcpcd branch from be38e83 to 0bfcabc Compare October 6, 2023 14:23
@rouming
Contributor Author

rouming commented Oct 6, 2023

Difference to the previous version:

  • Add original sha to the cherry-picked commits

@eriknordmark
Contributor

Hmm - the jenkins ztests job is failing on this, and for some reason the Eden pipeline does not run on the release/patch branches. Needs to be investigated.

@rouming
Contributor Author

rouming commented Oct 9, 2023

@eriknordmark I do not see any failures. How can I check those?

@rouming
Contributor Author

rouming commented Oct 9, 2023

Jenkins fails because eve can't unlock the vault. Why eden has not been started - I have no idea, since I do not see any errors. I asked @yash-zededa, he can probably help.

What I did here was revert those 3 patches (basically making this PR bogus) to get a better understanding of whether anything passes.

@rouming rouming force-pushed the 10.4-stable-timeout-calling-dhcpcd branch from 54cb7f4 to e569873 Compare October 9, 2023 09:29
@yash-zededa yash-zededa self-requested a review October 9, 2023 10:55
@eriknordmark
Contributor

I manually kicked off a jenkins run for this (now empty) PR.

yash-zededa
yash-zededa previously approved these changes Oct 9, 2023
@eriknordmark
Contributor

Tried the image manually on sc-supermicro-zc2 and got the same failure (attestation completes successfully but the vault does not get unlocked). Will try 10.4.2-kvm-amd64 as well.

@eriknordmark
Contributor

I've tried this PR build and various 10.4.X builds.
The conclusion is that the xen variants of this PR and of 10.4.3-xen-amd64 fail, but 10.4.1-xen-amd64 and 10.4.2-xen-amd64 are OK. All kvm variants work, including the one from this PR.
So there must be some interaction between the PRs that went into https://github.com/lf-edge/eve/releases/tag/10.4.3 which somehow causes Xen to not unlock the vault.

@rouming
Contributor Author

rouming commented Oct 10, 2023

@eriknordmark so we have a xen regression between 10.4.2 and 10.4.3. I will take a look.

@rouming rouming force-pushed the 10.4-stable-timeout-calling-dhcpcd branch from e569873 to 0bfcabc Compare October 10, 2023 08:15
@rouming
Contributor Author

rouming commented Oct 10, 2023

Difference to the previous version:

  • Removed all reverted changes, so now the PR description corresponds to actual commits.

@eriknordmark
Contributor

We should merge this, but we do need to figure out why 10.4.3-xen-amd64 has issues with the vault.

@rouming rouming merged commit c7b73ea into lf-edge:10.4 Oct 10, 2023
20 of 21 checks passed