Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using routes.unmatched: heuristic interferes with Prometheus/Grafana regex interpretation #1410

Open
marevers opened this issue Nov 26, 2024 · 7 comments · May be fixed by #1418
Open

Using routes.unmatched: heuristic interferes with Prometheus/Grafana regex interpretation #1410

marevers opened this issue Nov 26, 2024 · 7 comments · May be fixed by #1418

Comments

@marevers
Copy link
Contributor

marevers commented Nov 26, 2024

If the following is enabled in the configuration:

routes:
  unmatched: heuristic

Beyla will automatically identify certain parts of the route (like numeric ids) and replace them with an asterisk in order to prevent high-cardinality metrics and to be able to analyze common routes even if the path contains a unique id. However, the specific use of an asterisk (*) can introduce some problems when using regex in the label selectors.

For example, if you want to match a certain list of paths, the asterisk is interpreted as regex which causes no matches.

http_server_request_duration_seconds_sum{http_route=~"/my-service/tenants/*/services/*/activate|/my-service/tenants/*/services/*/activate"}

Here I would expect metrics from these two specific HTTP routes, but there are no results, because:

  1. I need to use =~ to enable the usage of pipes.
  2. Using =~ enables regex as well, which interprets the asterisks rather than using them as a literal value.

This can be worked around, but is unhandy at times.

I would suggest making the replacement character (currently the asterisk) configurable. The asterisk can stay the default, but maybe it could be made configurable so the user can choose something else instead of an asterisk (either a single character or a string of characters).

@marctc
Copy link
Contributor

marctc commented Nov 26, 2024

Hey @marevers, thanks for reporting!

2. Using =~ enables regex as well, which interprets the asterisks rather than using them as a literal value.

How is this unhandy? isn't just scape the character?

@marevers
Copy link
Contributor Author

marevers commented Nov 26, 2024

I was now able to escape them with double \ e.g. /my-service/tenants/\\*/services/\\*/activate. However, escaping in itself is still required because of the asterisk. If it were any character not part of the regex token set, then that would not be necessary.

@marctc
Copy link
Contributor

marctc commented Nov 26, 2024

I see, definitively it's unpractical. I wouldn't mind make this configurable, but I think it's a bit overkill to add just a config option for that. Maybe we can come up with a better character/word than *.

@marevers
Copy link
Contributor Author

What about #? I think it can convey the same meaning and AFAIK it is not interpreted because it is not a token. It would be a breaking change, but probably only in few users' cases.

@grcevski
Copy link
Contributor

This is a good point, our use of asterisk came from looking at what the OpenTelemetry Java SDK did for unknown routes, but perhaps we can replace it in 2.0 which is a major new release. Adding '{}' might be an option too, but to make the code perform it was easier to use a single character. The loop in the heuristic processor is ugly because it's highly specialized, we were concerned with performance.

@marevers marevers linked a pull request Nov 28, 2024 that will close this issue
@mariomac
Copy link
Contributor

mariomac commented Nov 28, 2024

Dummy question: wouldn't escaping the asterisks work?

http_server_request_duration_seconds_sum{http_route=~"/my-service/tenants/\*/services/\*/activate|/my-service/tenants/\*/services/\*/activate"}

or maybe

http_server_request_duration_seconds_sum{http_route=~"/my-service/tenants/\\*/services/\\*/activate|/my-service/tenants/\\*/services/\\*/activate"}

@marevers
Copy link
Contributor Author

@mariomac it does, your option 2 works like I mentioned in my earlier comment. So it can indeed be worked around, but it would be better to use a character that is not interpreted by regex at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants