Failed to add OSD #161

Open
keuko opened this issue Dec 23, 2024 · 1 comment · May be fixed by #162

Comments

keuko commented Dec 23, 2024

Hi,

I don't know what the root cause is, but when I deploy to virtual machines backed by Cinder volumes, adding an OSD fails intermittently (quite often).

TASK [stackhpc.cephadm.cephadm : Add OSDs individually] ***********************************************************************************************************************************************************
failed: [ceph2] (item=/dev/sdb) => {"ansible_loop_var": "item", "changed": true, "cmd": ["cephadm", "shell", "--", "ceph", "orch", "daemon", "add", "osd", "ceph2:/dev/sdb"], "delta": "0:00:12.391850", "end": "2024-12-22 23:59:10.466541", "item": "/dev/sdb", "msg": "non-zero return code", "rc": 1, "start": "2024-12-22 23:58:58.074691", "stderr": "Using ceph image with id '2bc0b0f4375d' and tag 'v18' created on 2024-07-23 22:19:35 +0000 UTC\nquay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906\nError initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')", "stderr_lines": ["Using ceph image with id '2bc0b0f4375d' and tag 'v18' created on 2024-07-23 22:19:35 +0000 UTC", "quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906", "Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')"], "stdout": "", "stdout_lines": []}
failed: [ceph2] (item=/dev/sdc) => {"ansible_loop_var": "item", "changed": true, "cmd": ["cephadm", "shell", "--", "ceph", "orch", "daemon", "add", "osd", "ceph2:/dev/sdc"], "delta": "0:00:15.046005", "end": "2024-12-22 23:59:26.111456", "item": "/dev/sdc", "msg": "non-zero return code", "rc": 1, "start": "2024-12-22 23:59:11.065451", "stderr": "Using ceph image with id '2bc0b0f4375d' and tag 'v18' created on 2024-07-23 22:19:35 +0000 UTC\nquay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906\nError initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')", "stderr_lines": ["Using ceph image with id '2bc0b0f4375d' and tag 'v18' created on 2024-07-23 22:19:35 +0000 UTC", "quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906", "Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')"], "stdout": "", "stdout_lines": []}

However, it always succeeds on the second attempt. Could you please add a retry to the "Add OSDs individually" task?

keuko commented Dec 23, 2024

The failures are completely random. The disks were wiped with dd, so everything should be OK.

keuko closed this as completed Dec 23, 2024
keuko reopened this Dec 23, 2024
keuko added a commit to keuko/ansible-collection-cephadm that referenced this issue Dec 23, 2024
This change ensures the `Add OSDs individually`
task is retried up to 3 times with a 10-second delay
between attempts if the Ceph orchestrator command fails
(non-zero return code). This enhances task resilience
by allowing transient issues to resolve before marking
the operation as failed.

Resolves stackhpc#161
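
For illustration, the retry described in the commit message could look roughly like the task below. This is only a sketch, not the collection's actual source: the command is taken from the failing log output above, while `osd_devices`, `osd_add_result`, and the use of `inventory_hostname` as the Ceph host name are assumptions made for the example.

```yaml
# Sketch only: variable names and task layout are assumptions,
# not the actual stackhpc.cephadm task definition.
- name: Add OSDs individually
  ansible.builtin.command:
    cmd: "cephadm shell -- ceph orch daemon add osd {{ inventory_hostname }}:{{ item }}"
  loop: "{{ osd_devices }}"       # hypothetical list, e.g. ["/dev/sdb", "/dev/sdc"]
  register: osd_add_result        # hypothetical variable name
  until: osd_add_result.rc == 0   # stop retrying once the command returns zero
  retries: 3                      # retry count from the commit message
  delay: 10                       # seconds to wait between attempts
```

With `until` inside a loop, the condition is evaluated per item, so each device is retried independently and only marked as failed once its own retries are exhausted.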
keuko linked a pull request Dec 23, 2024 that will close this issue