
[10.4 stable] Stop trying to move config from read-only config partition #3629

Merged: 1 commit, merged on Dec 1, 2023

@milan-zededa (Contributor) commented Nov 27, 2023

Backport of #3625 to 10.4

There is some prehistoric code in the onboot script that allows passing initial hardwaremodel, rebootConfig and restartcounter files via the /config partition. However, since this partition became read-only, EVE is no longer able to remove them once they are applied:

move /config/rebootConfig /persist/status
mv: cannot remove '/config/rebootConfig': Read-only file system

Although the file cannot be removed, mv still manages to copy its content to /persist (when moving across filesystems, mv first copies the file and only then tries to remove the source). This means that on every boot the current value in /persist is overwritten with the initial content from /config. This is especially dangerous for rebootConfig, because the device ends up in an endless reboot cycle: on every boot it replaces the last applied reboot counter with whatever initial value was set in /config, which results in a mismatch with the reboot counter in the device config received from the controller.
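To make the failure mode concrete, here is a minimal shell simulation of the boot sequence described above. It is a hypothetical sketch, not EVE's actual onboot script: temporary directories stand in for /config and /persist/status, the helper `move_from_readonly` is invented to emulate mv's cross-filesystem copy-then-unlink behavior on a read-only source, and the controller counter value (5) and the "apply the controller's counter and reboot on mismatch" step are simplified assumptions.

```shell
#!/bin/sh
# Hypothetical simulation of the boot-time bug; NOT EVE's real onboot script.
# Paths, values and the reboot logic are simplified assumptions.
set -u

work=$(mktemp -d)
config="$work/config"     # stands in for the read-only /config partition
persist="$work/persist"   # stands in for /persist/status
mkdir -p "$config" "$persist"

echo 1 > "$config/rebootConfig"   # stale initial value left in /config
controller_counter=5              # assumed counter from the controller's device config

# Emulates a cross-filesystem mv from a read-only source: the copy succeeds,
# but removing the source fails, so the source file survives every boot.
move_from_readonly() {
    cp "$1" "$2/" || return 1
    echo "mv: cannot remove '$1': Read-only file system" >&2
    return 1
}

reboots=0
boot=1
while [ "$boot" -le 3 ]; do
    # onboot step: "apply" the initial file from /config (clobbers /persist)
    if [ -f "$config/rebootConfig" ]; then
        move_from_readonly "$config/rebootConfig" "$persist" 2>/dev/null
    fi
    local_counter=$(cat "$persist/rebootConfig")
    if [ "$local_counter" -ne "$controller_counter" ]; then
        # Mismatch with the controller: record its counter, then "reboot".
        echo "$controller_counter" > "$persist/rebootConfig"
        reboots=$((reboots + 1))
    fi
    boot=$((boot + 1))
done

echo "reboots triggered in 3 boots: $reboots"
rm -rf "$work"
```

Running it prints `reboots triggered in 3 boots: 3`: because the stale copy in /config is never deleted, every boot overwrites the applied counter and the mismatch repeats indefinitely.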

I'm not aware of any practical value in setting initial values of these files through /config, therefore I think this functionality can be completely removed.

Signed-off-by: Milan Lenco [email protected]
(cherry picked from commit c56cdb9)

@eriknordmark (Contributor) left a comment

LGTM

@eriknordmark (Contributor) commented

@milan-zededa this has failed eden 3 times (with zfs and ext4 when using tpm). Looking at the last one in https://github.com/lf-edge/eve/actions/runs/7009464560 there is a zedrouter.touch watchdog in the log. Can you take a look?

@milan-zededa (Contributor, Author) commented


Yes, I will investigate.

@rouming (Contributor) commented Nov 29, 2023

@milan-zededa once merged, please don't forget to target the 11-stable branch as well. I did not do that in PR #3644 because of your previous comment (#3629 (comment)).

@milan-zededa (Contributor, Author) commented


It turned out that the root cause is this fatal error, which we see from time to time:

fatal: agent zedbox[1699]: couldn't initialize containerd (this should not happen): initContainerdClient: could not create containerd client. failed to dial "/run/containerd/containerd.sock": write unix @->/run/containerd/containerd.sock: use of closed network connection. Exiting.

I have not seen this with newer runners, and it seems to never happen on physical devices (it does not appear in production logs), but I will keep an eye on it.
In any case, the reboot is not related to this PR.

@eriknordmark merged commit 4467e49 into lf-edge:10.4-stable on Dec 1, 2023
15 of 19 checks passed

3 participants