Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update customize image test to fix ci #1478

Merged
merged 3 commits into from
Jun 6, 2024

Conversation

prestist
Copy link
Contributor

@prestist prestist commented May 28, 2024

Seeing a failure regarding greps; need to find which one is failing; changed assert to see what is happening.

#1474 (comment)

@prestist
Copy link
Contributor Author

prestist commented May 28, 2024

Cool new context; Comparing notes with the logs and the expected checks.
The * are the asserts which are reporting incorrectly inside the check_dest()

[2024-05-28T17:57:22.885Z] 314: check_dest
[2024-05-28T17:57:22.885Z] 182: assert @applied-live-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-ign@' not found in log
[2024-05-28T17:57:22.885Z] 183: assert @applied-live-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-live-2-ign@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-live-2-ign@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-live-2-ign@' not found in log
[2024-05-28T17:57:22.885Z] 184: assert @applied-dest-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-ign@ log
**[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'**
[2024-05-28T17:57:22.885Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-28T17:57:22.885Z] 185: assert @applied-dest-2-ign@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @applied-dest-2-ign@ log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''@applied-dest-2-ign@'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: '@applied-dest-2-ign@' found in log
[2024-05-28T17:57:22.885Z] 186: assert @preinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 187: assert @preinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @preinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@preinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@preinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 188: assert @postinst-1@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-1@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-1@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-1@' not found in log
[2024-05-28T17:57:22.885Z] 189: assert @postinst-2@
[2024-05-28T17:57:22.885Z] 148: grep -Fq @postinst-2@ log
[2024-05-28T17:57:22.885Z] 151: echo 'Assertion failed: '\''@postinst-2@'\'' not found in log'
[2024-05-28T17:57:22.885Z] Assertion failed: '@postinst-2@' not found in log
[2024-05-28T17:57:22.885Z] 190: assert 'Adding "coreos-installer test certificate" to list of CAs'
[2024-05-28T17:57:22.885Z] 148: grep -Fq 'Adding "coreos-installer test certificate" to list of CAs' log
[2024-05-28T17:57:22.885Z] 149: echo 'Assertion passed: '\''Adding "coreos-installer test certificate" to list of CAs'\'' found in log'
[2024-05-28T17:57:22.885Z] Assertion passed: 'Adding "coreos-installer test certificate" to list of CAs' found in log

Expected Checks

check_dest() {
    ! assert @applied-live-ign@
    ! assert @applied-live-2-ign@
    assert @applied-dest-ign@
    assert @applied-dest-2-ign@
    ! assert @preinst-1@
    ! assert @preinst-2@
    ! assert @postinst-1@
    ! assert @postinst-2@
    assert 'Adding "coreos-installer test certificate" to list of CAs'
}

@prestist
Copy link
Contributor Author

Okay, looking more closely

The dest.ign/dest.bu in the fixtures has some conditionals which are currently failing.

You can see that in the logs from #1474

2024-05-23T21:15:30.721Z] [ 3.332472] NetworkManager[631]: <warn> [1716498930.6791] keyfile: load: "/run/NetworkManager/system-connections/nmstate-json-eth1.nmconnection": failed to load connection: invalid connection: connection.interface-name: 'nmstate-json-eth1': interface name is longer than 15 characters

This needs to exist.. but does not so consequently we fail.

@prestist prestist force-pushed the debug-image-tests branch from 926a3f2 to 31d2e78 Compare May 28, 2024 19:11
@prestist
Copy link
Contributor Author

#1479

@HuijingHei
Copy link
Member

HuijingHei commented May 30, 2024

Get CI failed in check_dest, but no hint why dest-ignition-applied.service fails.

[2024-05-29T21:10:25.669Z] [FAILED] Failed to start dest-ignition-applied.service - Dest Ignition Applied.

[2024-05-29T21:10:25.669Z] See 'systemctl status dest-ignition-applied.service' for details.

...
[2024-05-29T21:10:26.686Z] Checking assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: assert @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 148: grep -Fq @applied-dest-ign@ log
[2024-05-29T21:10:26.686Z] 152: echo 'Assertion failed: '\''@applied-dest-ign@'\'' not found in log'
[2024-05-29T21:10:26.686Z] Assertion failed: '@applied-dest-ign@' not found in log
[2024-05-29T21:10:26.686Z] 153: return 1
[2024-05-29T21:10:26.686Z] 229: echo 'Failed assertion: @applied-dest-ign@'
[2024-05-29T21:10:26.686Z] Failed assertion: @applied-dest-ign@
[2024-05-29T21:10:26.686Z] 229: return 1

@prestist prestist force-pushed the debug-image-tests branch from be04223 to 6471ec1 Compare May 30, 2024 20:30
@prestist
Copy link
Contributor Author

@HuijingHei Is there a way to repo this locally? I am not super familiar with these image tests.

@travier
Copy link
Member

travier commented May 31, 2024

Looks like it's running this script: https://github.com/coreos/coreos-installer/blob/main/.cci.jenkinsfile#L47

@HuijingHei
Copy link
Member

grep failed with ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg in dest-ignition-applied.service dest.bu, no serial line in grub.cfg, if remove this line, CI passed.

Get grub.cfg content with add ExecStart=/bin/cat /boot/grub2/grub.cfg before grep, confirm that does not have serial content.

The problem is need to confirm if we still need the serial grep.

# This file is copied from https://github.com/coreos/coreos-assembler/blob/main/src/grub.cfg
# Changes:
#   - Dropped Ignition glue, that can be injected into platform.cfg
# petitboot doesn't support -e and doesn't support an empty path part
if [ -d (md/md-boot)/grub2 ]; then
  # fcct currently creates /boot RAID with superblock 1.0, which allows
  # component partitions to be read directly as filesystems.  This is
  # necessary because transposefs doesn't yet rerun grub2-install on BIOS,
  # so GRUB still expects /boot to be a partition on the first disk.
  #
  # There are two consequences:
  # 1. On BIOS and UEFI, the search command might pick an individual RAID
  #    component, but we want it to use the full RAID in case there are bad
  #    sectors etc.  The undocumented --hint option is supposed to support
  #    this sort of override, but it doesn't seem to work, so we set $boot
  #    directly.
  # 2. On BIOS, the "normal" module has already been loaded from an
  #    individual RAID component, and $prefix still points there.  We want
  #    future module loads to come from the RAID, so we reset $prefix.
  #    (On UEFI, the stub grub.cfg has already set $prefix properly.)
  set boot=md/md-boot
  set prefix=($boot)/grub2
else
  if [ -f ${config_directory}/bootuuid.cfg ]; then
    source ${config_directory}/bootuuid.cfg
  fi
  if [ -n "${BOOT_UUID}" ]; then
    search --fs-uuid "${BOOT_UUID}" --set boot --no-floppy
  else
    search --label boot --set boot --no-floppy
  fi
fi
set root=$boot

if [ -f ${config_directory}/grubenv ]; then
  load_env -f ${config_directory}/grubenv
elif [ -s $prefix/grubenv ]; then
  load_env
fi

if [ -f $prefix/console.cfg ]; then
  # Source in any GRUB console settings if provided by the user/platform
  source $prefix/console.cfg
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

# Other package code will be injected from here
source $prefix/40_coreos-ignition.cfg
source $prefix/70_coreos-user.cfg
if [ x$feature_timeout_style = xy ] ; then
  set timeout_style=menu
  set timeout=1
# Fallback normal timeout code in case the timeout_style feature is
# unavailable.
else
  set timeout=1
fi

# Import user defined configuration
# tracker: https://github.com/coreos/fedora-coreos-tracker/issues/805
if [ -f $prefix/user.cfg ]; then
  source $prefix/user.cfg
fi

blscfg

@HuijingHei
Copy link
Member

HuijingHei commented May 31, 2024

@HuijingHei Is there a way to repo this locally?

See Timothée's comment, what I did for testing:

# build coreos-installer first

$ cosa shell
$ sudo rm /usr/bin/coreos-installer
$ sudo rsync -rlv coreos-installer/install/usr/ /usr/

mkdir fcos; cd fcos
cosa init https://github.com/coreos/fedora-coreos-config.git
cosa fetch
# copy built coreos-installer to overrides/rootfs
cosa build
cosa buildextend-metal && cosa buildextend-metal4k && cosa buildextend-live --fast

cosa compress --fast --artifact=metal4k

# comment other tests and keep customize.sh in images.sh
COREOS_INSTALLER_TEST_INSTALLED_BINARY=1 tests/images.sh /srv/fcos/builds/latest/x86_64 customize.sh

Edit

@prestist
Copy link
Contributor Author

@travier Thank you!
@HuijingHei wow good job! okay thank you; I will repro it locally. and I will see about the serial bit. I really want to know what the sudden change is; It does not look like this has been touched in 2 years ~

@HuijingHei
Copy link
Member

I really want to know what the sudden change is; It does not look like this has been touched in 2 years ~

Good question, not sure it is because the change of coreos/fedora-coreos-tracker#1671

@prestist
Copy link
Contributor Author

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

  1. remove the serial bit from the unit as @HuijingHei tested.

I still cannot tell why it has become suddenly broken; as all changes seem to be from months or more ago but it is what it is.

@HuijingHei
Copy link
Member

HuijingHei commented Jun 3, 2024

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

I do not think this will work.
We are building image using osbuild by default (see coreos/coreos-assembler#3800) now, with bootupd --with-static-configs to write file /boot/grub2/grub.cfg, and it includes source console.cfg (see coreos/bootupd#619), console.cfg was created by osbuild/osbuild#1589, which was from https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml#L149, but it was not changed for 2 years, I am not sure when the change is.

cat console.cfg

# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END

@prestist
Copy link
Contributor Author

prestist commented Jun 3, 2024

So debuging locally; I have been able to repo it and I ended up doing a cosa run qemu and messed with the image and creating my own unit with the same exec-start. It failed of course, I think we have two actionable steps.

  1. update the grub.cfg here with
serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

I do not think this will work. We are building image using osbuild by default (see coreos/coreos-assembler#3800) now, with bootupd --with-static-configs to write file /boot/grub2/grub.cfg, and it includes source console.cfg (see coreos/bootupd#619), console.cfg was created by osbuild/osbuild#1589, which was from https://github.com/coreos/fedora-coreos-config/blob/testing-devel/platforms.yaml#L149, but it was not changed for 2 years, I am not sure when the change is.

cat console.cfg

# Any non-default console settings will be inserted here.
# CONSOLE-SETTINGS-START
serial --speed=115200
terminal_input serial console
terminal_output serial console
# CONSOLE-SETTINGS-END

Wow, I had no idea about this. I need to read up on OSBuild. The error is consistent with serial not being in the grub config which lead me there, I need to research how the console.cfg gets layered into the grub.cfg

@prestist
Copy link
Contributor Author

prestist commented Jun 4, 2024

@HuijingHei so this makes sense, somthing must be responsable for running the grub2-mkconfig to create the new grub.cfg right?

Locally running fcos and cat grub.cfg | grep serial results in nil

From my understanding the grub.cfg should be updated with the composed information from the link you provided?

@HuijingHei
Copy link
Member

HuijingHei commented Jun 5, 2024

Not quite sure about grub2-mkconfig, do we use it during installation?

Inspired by @mike-nguyen, build with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx, the CI passed, get serial line in grub.cfg (with ExecStart=/bin/cat /boot/grub2/grub.cfg before grep), seems there might be something different with ISO with install ?

serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

when I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is

serial --speed=115200
terminal_input serial console
terminal_output serial console

@prestist
Copy link
Contributor Author

prestist commented Jun 5, 2024

Not quite sure about grub2-mkconfig, do we use it during installation?

It would have to be before we are booted.

[core@cosa-devsh grub2]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/usr/sbin/grub2-mkconfig: line 268: /boot/grub2/grub.cfg.new: Read-only file system

Inspired by @mike-nguyen, build with legacy cosa using COSA_USE_OSBUILD=0 cosa build-xxx, the CI passed, get serial line in grub.cfg (with ExecStart=/bin/cat /boot/grub2/grub.cfg before grep), seems there might be something different with ISO with install ?

Yeah I think so. I suspect that we have an issue going on there. : we could revert the change and go back to doing both until we know why ? It would get us passing without failing ci here.

serial --unit=0 --speed=115200 --word=8 --parity=no
terminal_input serial
terminal_output serial

when I check on qcow2.disk using cosa run, the serial line in /boot/grub2/grub.cfg is

serial --speed=115200
terminal_input serial console
terminal_output serial console

Thats from the old build method right? well we know the change now. The builder is not doing grub2-mkconfig.

@jlebon wdyt we should do? does osbuilder have the ability to mount and run grub2-mkconfig ?

@jlebon
Copy link
Member

jlebon commented Jun 5, 2024

The context here is that when building with OSBuild, how GRUB configs are set up is different. You can find more details in coreos/fedora-coreos-tracker#1671 and the links there, but the TL;DR is that (1) a stub grub.cfg is written by bootupd which in turn sources console.cfg and (2) osbuild creates console.cfg in the org.osbuild.coreos.platform stage. coreos-installer was accordingly taught to write to console.cfg if found when using the --console option: #1360, #1423.

So... I think all we need here is

diff --git a/fixtures/customize/dest.bu b/fixtures/customize/dest.bu
index 1e17035..219422a 100644
--- a/fixtures/customize/dest.bu
+++ b/fixtures/customize/dest.bu
@@ -23,7 +23,7 @@ systemd:
         [Service]
         Type=oneshot
         RemainAfterExit=true
-        ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg
+        ExecStart=/bin/grep -qz 'serial --unit=0 --speed=115200 --word=8 --parity=no.terminal_input serial.terminal_output serial.#' /boot/grub2/grub.cfg /boot/grub2/console.cfg
         ExecStart=/bin/echo @applied-dest-ign@
         StandardOutput=tty

(and regenerate dest.ign as well).

I'll let one of you verify that this fixes the tests and open a PR for it since you did most of the work to investigate this (nice work!). :)

@prestist
Copy link
Contributor Author

prestist commented Jun 5, 2024

@jlebon Yep it works locally going to make a commit and push. Thank you!

@prestist prestist force-pushed the debug-image-tests branch 2 times, most recently from 20f0915 to 7a9b187 Compare June 5, 2024 20:01
@prestist prestist added the skip-notes This PR does not need release notes label Jun 5, 2024
@prestist prestist changed the title Debug-image-tests Update customize image test to fix ci Jun 5, 2024
tests/images/customize.sh Show resolved Hide resolved
tests/images/customize.sh Outdated Show resolved Hide resolved
tests/images/customize.sh Outdated Show resolved Hide resolved
fixtures/customize/dest.bu Outdated Show resolved Hide resolved
@prestist prestist force-pushed the debug-image-tests branch 3 times, most recently from 1c9c027 to a9a6439 Compare June 5, 2024 20:55
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor tweaks, otherwise LGTM.
First commit could use a commit message (what warnings?).

tests/images/customize.sh Outdated Show resolved Hide resolved
tests/images/customize.sh Outdated Show resolved Hide resolved
prestist added 3 commits June 5, 2024 17:13
NetworkManager produces a warning if an interface name exceeds 15 chars,
resulting in errors such as: 'failed to load connection: invalid connection
'nmstate-yaml-eth2': interface name is longer than 15 characters'. To reduce
these warnings, update network interface names to ensure they are within this
character limit.
With the changeover to OSBuild, GRUB configs are handled slightly
diffrently. Update butane and ignition files to reflect that.

fixes coreos#1479
@prestist prestist force-pushed the debug-image-tests branch from a9a6439 to 9e2933d Compare June 5, 2024 21:24
@prestist prestist merged commit 189f342 into coreos:main Jun 6, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip-notes This PR does not need release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants