Support for checking error field on traceroute.Reply #29
Comments
This looks more like a bug in the Atlas code to me. Have you reported this to RIPE? For example:
{
  "hop": 2,
  "result": [
    {
      "error": "sendto failed: Network is unreachable"
    }
  ]
}
Should really be:
{
  "hop": 2,
  "error": "sendto failed: Network is unreachable"
}
And that would be correct according to the documentation for v4400. |
I was generally thinking the same thing. However, one complication is around how to handle an error that occurs in a later reply. If a hop already has one or two valid replies, then it may not make sense to mark the entire hop as having errored.
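To make the concern concrete, here is a rough sketch of counting what a hop contains instead of flagging the whole hop on the first error. The function name `summarize_hop` is hypothetical, and it assumes timeouts appear as entries with an "x" key and real replies carry a "from" key; neither assumption comes from this thread.

```python
def summarize_hop(hop: dict) -> dict:
    """Count valid replies, timeouts, and errors within one hop object."""
    errors = []
    # Documented shape: a hop-level "error" with no "result" array.
    if "error" in hop:
        errors.append(hop["error"])
    replies = 0
    timeouts = 0
    for entry in hop.get("result", []):
        if "error" in entry:    # undocumented per-reply shape seen in the data
            errors.append(entry["error"])
        elif "x" in entry:      # assumed timeout marker, e.g. {"x": "*"}
            timeouts += 1
        elif "from" in entry:   # assumed marker of a real reply
            replies += 1
    return {
        "replies": replies,
        "timeouts": timeouts,
        "errors": errors,
        # Only treat the whole hop as errored when nothing valid came back.
        "fully_errored": bool(errors) and replies == 0,
    }


# A hop with one valid reply followed by an error (illustrative values only).
mixed = {"hop": 2, "result": [
    {"from": "192.0.2.1", "rtt": 1.5},
    {"error": "sendto failed: Network is unreachable"},
]}
print(summarize_hop(mixed)["fully_errored"])
```

With this approach a hop that has at least one valid reply keeps its replies, and the error strings stay available for anyone who wants to inspect them.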
|
I have not yet raised an issue related to this field with RIPE Atlas. As it stands there is a decent lead time on a fix being created, implemented, and deployed to their probes. Even if they can finish deploying a patch by tomorrow, nearly all of the previous measurement data is still affected, so it would be helpful to have a way of identifying it until a more permanent solution is implemented. I am also somewhat unclear on whether this is actually a bug in the probe software or in the API documentation. If we assume it is intentional, then the 3 cases for Timeout, Error, and Reply would closely match the ping measurement results structure. |
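As a sketch of what those 3 cases could look like if the field is treated as intentional: the names `ReplyKind` and `classify_reply` are hypothetical, and the "x" key as the timeout marker is an assumption, not something confirmed in this thread.

```python
from enum import Enum


class ReplyKind(Enum):
    TIMEOUT = "timeout"
    ERROR = "error"
    REPLY = "reply"


def classify_reply(entry: dict) -> ReplyKind:
    """Map one entry of a hop's "result" array onto the three cases."""
    if "error" in entry:
        return ReplyKind.ERROR
    if "x" in entry:  # assumed timeout marker, e.g. {"x": "*"}
        return ReplyKind.TIMEOUT
    # Anything else is treated as a normal reply in this sketch.
    return ReplyKind.REPLY


print(classify_reply({"error": "sendto failed: Network is unreachable"}))
```

Checking "error" first means an entry is never mistaken for a normal reply just because it lacks a timeout marker.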
I'd like to not deviate from the documentation, please try and get them to update that first. |
@jmeggitt Any progress on updating the documentation? |
@jmeggitt ping? |
Sorry about that. I saw your previous ping, but got distracted with other work. I have notified them about the issue, but due to limited resources to address the issue I would not expect to see any updates to the documentation anytime soon. I have not been pushing the issue other than bringing it to their attention, so I imagine it is still in their backlog. This issue can be found on RIPE-NCC/ripe-atlas-probe-measurements#14. However, if you want to get in contact with them or ask about the status of the issue then you will likely have more success directly emailing them at [email protected] to create a ticket in their system. |
Issue
RIPE Atlas probes will occasionally emit hop replies containing an "error" field. This behavior is not documented for any firmware version on https://atlas.ripe.net/docs/apis/result-format/. After some investigation, some of these errors can be attributed to the following probe measurement code: https://github.com/RIPE-NCC/ripe-atlas-probe-measurements/blob/master/eperd/traceroute.c#L636-L643
According to git blame, this has been part of the probe behavior for over 10 years.
The ability to detect this field is necessary to verify whether past and current traceroute measurement data is affected.
Affected Measurement Examples
I dumped every measurement containing this field from one of the RIPE Atlas hourly traceroute dump files (traceroute-2022-10-14T0400.bz2). This data is stored as newline-delimited JSON and can be found at https://gist.github.com/jmeggitt/11fba9f7fa539e8a4fdae1e231ec8fa1. The field appears to be extremely rare: it occurred in only 1690 of the 8,912,306 traceroute measurements in that data file (0.019%).
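For anyone who wants to repeat this kind of scan, here is a minimal sketch (not the exact script used for the gist). The helper names `has_reply_error` and `count_affected` are hypothetical, and it assumes each line is one measurement whose top-level "result" is a list of hop objects.

```python
import json
from typing import Iterable, Tuple


def has_reply_error(measurement: dict) -> bool:
    """True if any reply entry inside any hop carries an "error" field."""
    for hop in measurement.get("result", []):
        for entry in hop.get("result", []):
            if isinstance(entry, dict) and "error" in entry:
                return True
    return False


def count_affected(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (affected, total) for newline-delimited JSON measurements."""
    affected = total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        if has_reply_error(json.loads(line)):
            affected += 1
    return affected, total


# Two made-up measurements, one of which contains the per-reply error field.
sample = [
    json.dumps({"result": [{"hop": 1, "result": [{"from": "192.0.2.1", "rtt": 1.5}]}]}),
    json.dumps({"result": [{"hop": 2, "result": [
        {"error": "sendto failed: Network is unreachable"}]}]}),
]
print(count_affected(sample))

# The percentage quoted above, recomputed from the two counts.
print(f"{100 * 1690 / 8912306:.3f}%")  # prints "0.019%"
```

Since `count_affected` accepts any iterable of lines, a dump file could be streamed through it with the standard library's `bz2.open(path, "rt")` rather than loaded into memory.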
Values
As far as I have seen, this field is always a string when it appears. This appears to be consistent with the probe code above.
Examples
Here are a couple of measurements, chosen arbitrarily, to show what the field looks like in the context of the data.