Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use inventory_hostname in ceph related roles and play #2164

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

cjeanner
Copy link
Collaborator

@cjeanner cjeanner commented Jul 25, 2024

Until now, the ceph playbook and associated roles were trying to be
smart, but it created many other issues, and continues to create weird
situations.

With this patch, we reverse the burden, and base everything on the
inventory.
Since the inventory is generated by the Framework, and the Framework
also sets the virtual machines hostname, we should be able to rely on
the inventory_hostname only.

This means the inventory must be consistent, if provided by external
means.

We therefore update/correct the hostname set on the various nodes to
match the inventory.

As a pull request owner and reviewers, I checked that:

  • Appropriate testing is done and actually running
  • Appropriate documentation exists and/or is up-to-date:
    • README in the role

Testing (not in commit message)

  • va-hci (locally tested)
  • delta-v6 (downstream TP)

Copy link
Contributor

openshift-ci bot commented Jul 25, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@cjeanner cjeanner force-pushed the ceph/inventory_hostname branch from 76c1593 to 652bd20 Compare July 25, 2024 15:01
@fultonj fultonj requested review from fmount and katarimanojk July 25, 2024 15:05
@fultonj
Copy link
Contributor

fultonj commented Jul 25, 2024

Thank you Cedric. This is a nice clean up. Let's make sure it doesn't break anything (e.g. IPv6) before merging.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/e52d0ade349f406ba7974d7ac8b052a0

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 09m 56s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 52s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 18m 43s
podified-multinode-hci-deployment-crc FAILURE in 57m 22s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 31m 33s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 11s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 47s
✔️ cifmw-molecule-cifmw_cephadm SUCCESS in 4m 25s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0b6ec7322e6a45f889c1bf84b2fecd30

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 44m 13s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 12m 59s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 20m 37s
podified-multinode-hci-deployment-crc FAILURE in 54m 53s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 28m 39s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 30s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 21s
✔️ cifmw-molecule-cifmw_cephadm SUCCESS in 5m 27s

@cjeanner
Copy link
Collaborator Author

It's failing in multinode-hci-deployment job:
ERROR: hostname "compute-0.ci-rdo.local" does not match expected hostname "compute-0"

When checking the source of cephadm, we can see the inventory doesn't reflect the actual hostname:
https://github.com/ceph/ceph/blob/7fe91d5d5842e04be3b4f514d6dd990c54b29c76/src/cephadm/cephadm.py#L8128-L8133

To correct that, we have 2 ways:

  • either ensure the inventory passed to ceph.yml is consistent (need to see how it's built, I suppose it's passing the zuul inventory as-is)
  • or modify the compute node hostname as a pre-task during the job preparation.

Having consistency between hostnames and inventory names sounds like the right thing imho. "just" need to make that consistency a realty.

@cjeanner cjeanner force-pushed the ceph/inventory_hostname branch 3 times, most recently from 73d10f9 to 67f904a Compare July 26, 2024 11:36
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/8634ce8acafe456da6cb97c49eb12740

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 49m 09s
podified-multinode-edpm-deployment-crc POST_FAILURE in 1h 18m 37s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 18m 46s
✔️ podified-multinode-hci-deployment-crc SUCCESS in 1h 42m 30s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 32m 01s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 05s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 24s
✔️ cifmw-molecule-cifmw_cephadm SUCCESS in 4m 22s

Until now, the ceph playbook and associated roles were trying to be
smart, but it created many other issues, and continues to create weird
situations.

With this patch, we reverse the burden, and base everything on the
inventory.
Since the inventory is generated by the Framework, and the Framework
also sets the virtual machines hostname, we should be able to rely on
the `inventory_hostname` only.

This means the inventory must be consistent, if provided by external
means.

We therefore update/correct the hostname set on the various nodes to
match the inventory.
@cjeanner cjeanner force-pushed the ceph/inventory_hostname branch from 12b4780 to d1f8d7b Compare August 6, 2024 08:42
@cjeanner cjeanner marked this pull request as ready for review August 6, 2024 14:15
Copy link
Contributor

@fultonj fultonj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Contributor

openshift-ci bot commented Aug 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fultonj

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 6, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 9a4a904 into main Aug 6, 2024
6 checks passed
@openshift-merge-bot openshift-merge-bot bot deleted the ceph/inventory_hostname branch August 6, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants