Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xapi not coming back on new master after pool.designate_new_master, old master loses TLS verification #6056

Open
ydirson opened this issue Oct 14, 2024 · 1 comment

Comments

@ydirson
Copy link
Contributor

ydirson commented Oct 14, 2024

In this XCP-ng forum subthread, a user reports a pool breakage, which I was able to replicate as follows (not 100% repro, I got it on 2nd attempt after joining a 3rd host):

  • setup a pool with 3 hosts
  • run xe pool-designate-new-master against the same host-uuid several times rapidly enough to run after DESIGNATE_NEW_MASTER_IN_PROGRESS stops to be returned but apparently before xapi realizes that host is already the master
[15:19 xcp-ng-hqerhcgv ~]# for i in $(seq 3); do time xe pool-designate-new-master host-uuid=a46b65ee-6b9f-4b73-9792-7c968c0c7f20; done

real	0m24.878s
user	0m0.010s
sys	0m0.019s

real	0m4.389s
user	0m0.000s
sys	0m0.034s
Lost connection to the server.

real	0m2.931s
user	0m0.004s
sys	0m0.030s
[15:20 xcp-ng-hqerhcgv ~]# 

The Lost connection to the server. is likely key, as I saw it first when I got the issue.

Forum post has user's full logs, the following are selected from my reproduction.

daemon.log on the expected-new-master shows a failure to restart xapi successfully, looping a few times on:

Oct 14 15:20:59 xcp-ng-hqerhcgv xapi-init[1244028]: Stopping xapi: [  OK  ]
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: Unit xapi.service entered failed state.
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: xapi.service failed.
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: xapi.service holdoff time over, scheduling restart.
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: Cannot add dependency job for unit lvm2-activation.service, ignoring: Unit is masked.
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: Cannot add dependency job for unit lvm2-activation-early.service, ignoring: Unit is masked.
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: Starting XenAPI server (XAPI)...
Oct 14 15:20:59 xcp-ng-hqerhcgv systemd[1]: Started XenAPI server (XAPI).
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi-init[1244047]: Starting xapi:
Oct 14 15:21:00 xcp-ng-hqerhcgv systemd[1]: xapi.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 14 15:21:00 xcp-ng-hqerhcgv xapi-init[1244078]: Stopping xapi: [  OK  ]
...
Oct 14 15:21:04 xcp-ng-hqerhcgv systemd[1]: start request repeated too quickly for xapi.service
Oct 14 15:21:04 xcp-ng-hqerhcgv systemd[1]: Failed to start XenAPI server (XAPI).
Oct 14 15:21:04 xcp-ng-hqerhcgv systemd[1]: Unit xapi.service entered failed state.
Oct 14 15:21:04 xcp-ng-hqerhcgv systemd[1]: xapi.service failed.
Oct 14 15:21:18 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1240697 cleaning up
Oct 14 15:21:30 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1244047 cleaning up
Oct 14 15:21:31 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1244092 cleaning up
Oct 14 15:21:32 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1244140 cleaning up
Oct 14 15:21:33 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1244193 cleaning up
Oct 14 15:21:34 xcp-ng-hqerhcgv message-switch[641]: main: [ info|message-switch] Session xapi:1244247 cleaning up

with xensource.log showing:

Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||225 ||xapi_pool_transition] The old master has not restarted yet. Sleep for 3 seconds
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||115 HTTPS 10.1.132.1->:::80|host.request_backup D:1d75ded0bc49|xapi] About to flush database: /var/lib/xcp/state.db
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [ info||115 HTTPS 10.1.132.1->:::80|host.request_backup D:1d75ded0bc49|redo_debug] Flushing database to all active redo-logs
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||115 HTTPS 10.1.132.1->:::80|host.request_backup D:1d75ded0bc49|sql] XML backend [/var/lib/xcp/state.db] -- Write buffer flushed. Time: 0.093666
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] attempting manual two-phase commit of new master. My address = 10.1.132.2; 
peer addresses = [ 10.1.132.1; 10.1.132.3 ]
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Phase 1: proposing myself as new master
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Proposing myself as a new master to host address: 10.1.132.1
Oct 14 15:20:57 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.1:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.
pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243969): -> 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243969 (not cached) connected to 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.1:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.
pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243973): -> 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243973 (not cached) connected to 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.1:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243977): -> 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243977 (not cached) connected to 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Proposing myself as a new master to host address: 10.1.132.3
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.3:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243981): -> 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243981 (not cached) connected to 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.3:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243985): -> 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243985 (not cached) connected to 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.3:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243989): -> 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1243989 (not cached) connected to 10.1.132.3:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Proposing myself as a new master to host address: 10.1.132.2
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.2:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1243993): -> 10.1.132.2:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1244005 (not cached) connected to 10.1.132.2:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] No-one objected to the proposal. Any errors from here on will result in thi
s node entering the 'broken' state
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Phase 2: committing transaction
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Phase 2.1: telling everyone but me to commit
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xapi_pool_transition] Signalling commit to host address: 10.1.132.1
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.1:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1244011): -> 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1244011 (not cached) connected to 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] client cert verification 10.1.132.1:443: SNI=pool path=/etc/stunnel/xapi-pool-ca-bundle.pem
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] stunnel start
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|stunnel] Started a client (pid:1244015): -> 10.1.132.1:443
Oct 14 15:20:58 xcp-ng-hqerhcgv xapi: [debug||79 HTTPS 10.1.132.1->:::80|pool.designate_new_master R:f29c1d69d97b|xmlrpc_client] stunnel pid: 1244015 (not cached) connected to 10.1.132.1:443
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [ warn||180 |watching networks for NBD-related changes D:5228a098d303|xmlrpc_client] stunnel pid: 1242711 caught Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = DummyRef:|5228a098-d303-4215-d007-1262b74182b6|watching networks for NBD-related changes; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }")
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [ warn||179 |timeboxed_rpc D:12ea59e27c8b|xmlrpc_client] stunnel pid: 1241311 caught Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = DummyRef:|221fabee-3cce-498d-5d77-1a2a0ddc0e41|xapi events; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }")
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [debug||180 |watching networks for NBD-related changes D:5228a098d303|stunnel] check_reusable: caught exception Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = ; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }"); assuming not reusable
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [debug||180 |watching networks for NBD-related changes D:5228a098d303|stunnel] get_reusable_stunnel: Found non-reusable stunnel in the cache. disconnecting from 10.1.132.1:443
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] timeboxed_rpc D:12ea59e27c8b failed with exception Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = DummyRef:|221fabee-3cce-498d-5d77-1a2a0ddc0e41|xapi events; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }")
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] Raised Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = DummyRef:|221fabee-3cce-498d-5d77-1a2a0ddc0e41|xapi events; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }")
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 1/20 xapi Raised at file ocaml/libs/http-lib/http_client.ml, line 199
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 2/20 xapi Called from file ocaml/libs/http-lib/http_client.ml, line 219
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 3/20 xapi Called from file ocaml/libs/http-lib/xmlrpc_client.ml, line 384
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 4/20 xapi Called from file ocaml/libs/http-lib/xmlrpc_client.ml, line 365
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 5/20 xapi Called from file ocaml/libs/http-lib/xmlrpc_client.ml, line 371
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 6/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 7/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 8/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 9/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 10/20 xapi Called from file ocaml/libs/http-lib/xmlrpc_client.ml, line 194
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 11/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 12/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 13/20 xapi Called from file option.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 14/20 xapi Called from file ocaml/libs/http-lib/xmlrpc_client.ml, line 192
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 15/20 xapi Called from file ocaml/xapi/helpers.ml, line 449
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 16/20 xapi Called from file ocaml/xapi/server_helpers.ml, line 67
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 17/20 xapi Called from file ocaml/xapi/server_helpers.ml, line 94
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 18/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 19/20 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 20/20 xapi Called from file ocaml/libs/log/debug.ml, line 250
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace]
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] Raised Http_client.Http_request_rejected("{ frame = false; method = POST; uri = /; query = [  ]; content_length = [  ]; transfer encoding = ; version = 1.1; cookie = [ (value filtered) ]; task = ; subtask_of = DummyRef:|221fabee-3cce-498d-5d77-1a2a0ddc0e41|xapi events; content-type = ; host = ; user_agent = xen-api-libs/24.19.2; traceparent =  }")
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 1/6 xapi Raised at file ocaml/libs/log/debug.ml, line 267
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 2/6 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 3/6 xapi Called from file ocaml/libs/xapi-stdext/lib/xapi-stdext-pervasives/pervasiveext.ml, line 39
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 4/6 xapi Called from file ocaml/xapi-client/client.ml, line 14
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 5/6 xapi Called from file ocaml/xapi-client/client.ml, line 3804
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace] 6/6 xapi Called from file ocaml/xapi/xapi_xenops.ml, line 3290
Oct 14 15:20:59 xcp-ng-hqerhcgv xapi: [error||179 |xapi events D:221fabee3cce|backtrace]

Meanwhile, the old-master while restarting xapi reveals a strange situation, where it has lost its tls-enablement status:

Oct 14 15:20:45 xcpng83-bzkcpvhy xsh: [ warn||0 ||xsh] TLS verification is disabled on this host: /var/xapi/verify-certificates is absent
Oct 14 15:20:45 xcpng83-bzkcpvhy xsh: [ info||0 ||Stunnel_client] disabling default tls verification
Oct 14 15:20:45 xcpng83-bzkcpvhy xsh: [debug||0 ||stunnel] client cert verification 10.1.132.2:443: None
Oct 14 15:20:45 xcpng83-bzkcpvhy xsh: [debug||0 ||stunnel] Started a client (pid:1124004): -> 10.1.132.2:443

Digging the log reveals that this first occurred last friday, when I was (without success) trying to reproduce the issue with only 2 hosts in the pool.

@ydirson
Copy link
Contributor Author

ydirson commented Oct 16, 2024

Checked on a brand new 8.2.1 pool, similar but not identical behavior: from the master I can launch a quick sequence of designate-new-master without even seeing a second run being logged (where on 8.3 I was getting a second run):

[18:11 host2 ~]# grep -i designate.*phase /var/log/xensource.log 
Oct 16 18:09:29 host2 xapi: [debug||207 HTTPS 10.1.132.31->:::80|pool.designate_new_master R:6cc538e4abe2|xapi_pool_transition] attempting manual two-phase commit of new master. My address = 10.1.132.32; peer addresses = [ 10.1.132.33; 10.1.132.31 ]
Oct 16 18:09:29 host2 xapi: [debug||207 HTTPS 10.1.132.31->:::80|pool.designate_new_master R:6cc538e4abe2|xapi_pool_transition] Phase 1: proposing myself as new master
Oct 16 18:09:30 host2 xapi: [debug||207 HTTPS 10.1.132.31->:::80|pool.designate_new_master R:6cc538e4abe2|xapi_pool_transition] Phase 2: committing transaction
Oct 16 18:09:30 host2 xapi: [debug||207 HTTPS 10.1.132.31->:::80|pool.designate_new_master R:6cc538e4abe2|xapi_pool_transition] Phase 2.1: telling everyone but me to commit
Oct 16 18:09:42 host2 xapi: [debug||207 HTTPS 10.1.132.31->:::80|pool.designate_new_master R:6cc538e4abe2|xapi_pool_transition] Phase 2.2: setting flag to make us restart when the connection to the master dies
[18:12 host2 ~]# 

And running a new sequence from the same machine (now a slave) to switch it back to master, I get the same result:

[18:12 host1 ~]# time xe pool-designate-new-master host-uuid=7978ca38-0c46-47f2-845b-1f0038c69fd9 ; time xe pool-designate-new-master host-uuid=7978ca38-0c46-47f2-845b-1f0038c69fd9 ; time xe pool-designate-new-master host-uuid=7978ca38-0c46-47f2-845b-1f0038c69fd9

real	0m11.634s
user	0m0.004s
sys	0m0.028s
Lost connection to the server.

real	0m3.486s
user	0m0.012s
sys	0m0.035s
Lost connection to the server.

real	0m0.144s
user	0m0.001s
sys	0m0.028s
[18:12 host1 ~]# grep -i designate.*phase /var/log/xensource.log 
Oct 16 18:12:46 host1 xapi: [debug||61 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:5a0d04895229|xapi_pool_transition] attempting manual two-phase commit of new master. My address = 10.1.132.31; peer addresses = [ 10.1.132.32; 10.1.132.33 ]
Oct 16 18:12:46 host1 xapi: [debug||61 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:5a0d04895229|xapi_pool_transition] Phase 1: proposing myself as new master
Oct 16 18:12:47 host1 xapi: [debug||61 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:5a0d04895229|xapi_pool_transition] Phase 2: committing transaction
Oct 16 18:12:47 host1 xapi: [debug||61 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:5a0d04895229|xapi_pool_transition] Phase 2.1: telling everyone but me to commit
Oct 16 18:12:56 host1 xapi: [debug||61 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:5a0d04895229|xapi_pool_transition] Phase 2.2: setting flag to make us restart when the connection to the master dies
Oct 16 18:12:57 host1 xapi: [debug||122 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:aafe3f5d52a5|xapi_pool_transition] attempting manual two-phase commit of new master. My address = 10.1.132.31; peer addresses = [ 10.1.132.32; 10.1.132.33 ]
Oct 16 18:12:57 host1 xapi: [debug||122 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:aafe3f5d52a5|xapi_pool_transition] Phase 1: proposing myself as new master
Oct 16 18:12:59 host1 xapi: [debug||122 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:aafe3f5d52a5|xapi_pool_transition] Phase 2: committing transaction
Oct 16 18:12:59 host1 xapi: [debug||122 HTTPS 10.1.132.32->:::80|pool.designate_new_master R:aafe3f5d52a5|xapi_pool_transition] Phase 2.1: telling everyone but me to commit
[18:16 host1 ~]# pidof xapi
[18:16 host1 ~]# 

daemon.log:

Oct 16 18:13:06 host1 systemd[1]: Starting XenAPI server (XAPI)...
Oct 16 18:13:06 host1 systemd[1]: Started XenAPI server (XAPI).
Oct 16 18:13:06 host1 xapi-init[12845]: Starting xapi:
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.863Z||584|About to bind to /var/run/nonpersistent/forkexecd//fd_34399add-58e8-4ee5-9fe2-b253d584dc3e\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.863Z||584|bound, listening\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.874Z||12885|Child here!\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.890Z||12886|Grandchild here!\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.890Z||12886|Started: state.cmdargs = [/usr/bin/systemctl;restart;stunnel@xapi]\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.890Z||12886|Started: state.env = [PATH=/sbin:/usr/sbin:/bin:/usr/bin]\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.891Z||12886|Selecting in handle_comms_no_fd_sock2\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.891Z||12886|Done\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.891Z||12886|fd sock\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.891Z||12886|Selecting in handle_comms_with_fd_sock2\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.891Z||12886|Done\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|fd sock2\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Received fd named: 7db44ef4-5f1c-49cc-a1fa-21690803867a - duping to 1 (from 10)\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Selecting in handle_comms_with_fd_sock2\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Done\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|fd sock2\x0A
Oct 16 18:13:06 host1 systemd[1]: xapi.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Received fd named: 42d68842-3582-455e-9e2b-eaedd538ecea - duping to 2 (from 10)\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Selecting in handle_comms_with_fd_sock2\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|Done\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.892Z||12886|comms sock\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.893Z||12886|Exec\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.893Z||12886|Finished...\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.893Z||12886|Args after replacement = [/usr/bin/systemctl;restart;stunnel@xapi]\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.893Z||12886|I've received the following fds: [2;1]\x0A\x0A
Oct 16 18:13:06 host1 forkexecd: [ warn||0 ||forkexecd] 20241016T16:13:06.930Z||12886|Caught unexpected exception: End_of_file\x0A
Oct 16 18:13:06 host1 systemd[1]: Cannot add dependency job for unit lvm2-activation.service, ignoring: Unit is masked.
Oct 16 18:13:06 host1 systemd[1]: Cannot add dependency job for unit lvm2-activation-early.service, ignoring: Unit is masked.
Oct 16 18:13:06 host1 systemd[1]: Stopping TLS tunnel for xapi...
Oct 16 18:13:06 host1 systemd[1]: Starting TLS tunnel for xapi...
Oct 16 18:13:07 host1 systemd[1]: Started TLS tunnel for xapi.
Oct 16 18:13:07 host1 xapi-init[12891]: Stopping xapi: [  OK  ]
Oct 16 18:13:07 host1 systemd[1]: Unit xapi.service entered failed state.
Oct 16 18:13:07 host1 systemd[1]: xapi.service failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant