Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backport of Allocation API: fix "no path to region" errors for non-global regions into release/1.9.x #24682

Merged

Conversation

hc-github-team-nomad-core
Copy link
Contributor

Backport

This PR is auto-generated from #24644 to be assessed for backporting due to the inclusion of the label backport/1.9.x.

🚨

Warning automatic cherry-pick of commits failed. If the first commit failed,
you will see a blank no-op commit below. If at least one commit succeeded, you
will see the cherry-picked commits up to, not including, the commit where
the merge conflict occurred.

The person who merged in the original PR is:
@tgross
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/nomad/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


In #16872 we added support for unix domain sockets, but this required mutating the Config when parsing the address so as to remove the port number. In #23785 we fixed a bug where if the configuration was used across multiple clients (as in the autoscaler) that mutation would happen multiple times and the address would be incorrectly parsed.

When making alloc log, alloc fs, or alloc exec calls where we have line-of-sight to the client, we attempt to make a HTTP API call directly to the client node. So we create a new API client from the same configuration and then set the address. But in this case we copy the private url field and that causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal "global" (because of mTLS handling code introduced all the way back in 4d3b75d), unless the user has set the region specifically. This fails with an error "no path to region" when the cluster isn't non-global and requests are sent to a non-leader.

Arguably the "right" way of fixing this would be for ClientConfig not to change the API client's region to "global" in the first place, but as this is a public API and extremely longstanding behavior, it could potentially be a breaking change for some downstream consumers. Instead, we'll avoid copying the private url field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
Ref: https://hashicorp.atlassian.net/browse/NET-11858

Testing & Reproduction steps

To reproduce, stand up a cluster with region = "example" with at least one client, in an environment where you have line-of-sight to all nodes (ex. local development environment should work fine). Deploy a job to that client and run nomad alloc logs :alloc_id.


Overview of commits

Copy link

hashicorp-cla-app bot commented Dec 16, 2024

CLA assistant check
All committers have signed the CLA.

In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
@tgross tgross force-pushed the backport/b-region-fs-api/carefully-civil-crappie branch from bc1acaf to 12e148f Compare December 16, 2024 16:11
@tgross tgross marked this pull request as ready for review December 16, 2024 16:11
@tgross
Copy link
Member

tgross commented Dec 16, 2024

Fixed backport. Will need re-review.

tgross
tgross previously approved these changes Dec 16, 2024
@tgross tgross merged commit 439a656 into release/1.9.x Dec 16, 2024
27 checks passed
@tgross tgross deleted the backport/b-region-fs-api/carefully-civil-crappie branch December 16, 2024 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants