Support for checking error field on traceroute.Reply #29

jmeggitt · 2022-11-16T02:48:01Z

Issue

RIPE Atlas probes occasionally emit hop replies containing an "error" field. This behavior is not documented for any firmware version on https://atlas.ripe.net/docs/apis/result-format/. After some investigation, at least some of these errors can be attributed to the following probe measurement code:
https://github.com/RIPE-NCC/ripe-atlas-probe-measurements/blob/master/eperd/traceroute.c#L636-L643
According to git blame, this has been part of the probe behavior for over 10 years and still is.

The ability to detect this field is necessary to verify whether past and current traceroute measurement data is affected.

Affected Measurement Examples

I extracted every measurement containing this field from one of the RIPE Atlas hourly traceroute dump files (traceroute-2022-10-14T0400.bz2). The extracted data is stored as newline-delimited JSON and can be found at https://gist.github.com/jmeggitt/11fba9f7fa539e8a4fdae1e231ec8fa1. The field appears to be extremely rare: it occurred in only 1690 of the 8,912,306 traceroute measurements in that file (0.019%).

Values

As far as I have seen, this field is always a string when it appears, which is consistent with the probe code above.

count   value
      9 "bind failed: Address already in use"
     11 "bind failed: Address not available"
     47 "bind failed: Cannot assign requested address"
    334 "bind failed: Invalid argument"
    804 "sendto failed: Network is unreachable"
    104 "sendto failed: Network unreachable"
    364 "sendto failed: Operation not permitted"
     17 "sendto failed: Permission denied"
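
A tally like the one above could be reproduced with something along these lines. This is only a rough sketch in Python, not the script that produced the table: the function name `count_reply_errors` is made up for illustration, and it assumes each line is one JSON measurement whose top-level "result" is a list of hops, each with its own "result" list of replies. It counts replies rather than measurements, so the totals could differ if a measurement ever carries more than one error.

```python
import json
from collections import Counter


def count_reply_errors(lines):
    """Tally the "error" strings found in hop replies of NDJSON traceroute results.

    `lines` is any iterable of strings, one JSON measurement per line.
    Only reply-level errors (hop["result"][i]["error"]) are counted here.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        measurement = json.loads(line)
        for hop in measurement.get("result", []):
            for reply in hop.get("result", []):
                if "error" in reply:
                    counts[reply["error"]] += 1
    return counts
```

To run it against a compressed dump, the file could be opened in text mode with `bz2.open(path, "rt")` from the standard library and passed in directly, since the function only iterates over lines.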

Examples

Here are a few measurements I chose arbitrarily to show what the field looks like in the context of the data.

{
  "af": 6,
  "dst_addr": "2001:500:2::c",
  "dst_name": "2001:500:2::c",
  "endtime": 1665721544,
  "from": "2001:bc8:62c:2545::1",
  "fw": 5020,
  "lts": 11,
  "msm_id": 6011,
  "msm_name": "Traceroute",
  "mver": "2.2.0",
  "paris_id": 2,
  "prb_id": 1000410,
  "proto": "UDP",
  "result": [
    {
      "hop": 1,
      "result": [
        {
          "error": "sendto failed: Operation not permitted"
        }
      ]
    }
  ],
  "size": 40,
  "src_addr": "2001:bc8:62c:2545::1",
  "timestamp": 1665721544,
  "type": "traceroute"
}

{
  "af": 4,
  "dst_addr": "46.101.130.201",
  "dst_name": "46.101.130.201",
  "endtime": 1665722912,
  "from": "170.39.226.151",
  "fw": 5040,
  "group_id": 29556742,
  "lts": 1,
  "msm_id": 29556742,
  "msm_name": "Traceroute",
  "mver": "2.4.1",
  "paris_id": 14,
  "prb_id": 6927,
  "proto": "ICMP",
  "result": [
    {
      "hop": 1,
      "result": [
        {
          "x": "*"
        },
        {
          "x": "*"
        },
        {
          "x": "*"
        }
      ]
    },
    {
      "hop": 2,
      "result": [
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }
  ],
  "size": 48,
  "src_addr": "170.39.226.151",
  "timestamp": 1665722900,
  "type": "traceroute"
}

{
  "af": 6,
  "dst_addr": "2a00:74c0:0:2::20",
  "dst_name": "2a00:74c0:0:2::20",
  "endtime": 1665722173,
  "from": "2a05:f6c7:3853:0:eade:27ff:fe69:dd4e",
  "fw": 5070,
  "group_id": 25639804,
  "lts": 38,
  "msm_id": 25639804,
  "msm_name": "Traceroute",
  "mver": "2.6.1",
  "paris_id": 7,
  "prb_id": 22203,
  "proto": "ICMP",
  "result": [
    {"hop":1,"result":[{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.948,"size":96,"ttl":255},{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.828,"size":96,"ttl":255},{"from":"2a05:f6c7:3853:0:1e74:dff:fec3:e2f8","rtt":0.759,"size":96,"ttl":255}]},
    {"hop":2,"result":[{"from":"2a05:f6c0:1::18","rtt":3.658,"size":96,"ttl":63},{"from":"2a05:f6c0:1::18","rtt":4.791,"size":96,"ttl":63},{"from":"2a05:f6c0:1::18","rtt":3.415,"size":96,"ttl":63}]},
    {"hop":3,"result":[{"from":"2a05:f6c0:2:23::1","rtt":5.209,"size":96,"ttl":62},{"from":"2a05:f6c0:2:23::1","rtt":4.848,"size":96,"ttl":62},{"from":"2a05:f6c0:2:23::1","rtt":5.137,"size":96,"ttl":62}]},
    {"hop":4,"result":[{"from":"2001:6c8:81:100::1a1","rtt":4.867,"size":96,"ttl":252},{"from":"2001:6c8:81:100::1a1","rtt":18.54,"size":96,"ttl":252},{"from":"2001:6c8:81:100::1a1","rtt":4.106,"size":96,"ttl":252}]}, 
    {"hop":5,"result":[{"from":"2001:6c8:40::1e","rtt":4.761,"size":96,"ttl":250},{"from":"2001:6c8:40::1e","rtt":4.698,"size":96,"ttl":250},{"from":"2001:6c8:40::1e","rtt":5.275,"size":96,"ttl":250}]},
    {
      "hop": 6,
      "result": [
        {
          "x": "*"
        },
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }
  ],
  "size": 48,
  "src_addr": "2a05:f6c7:3853:0:eade:27ff:fe69:dd4e",
  "timestamp": 1665722169,
  "type": "traceroute"
}


jelu · 2022-11-16T10:06:43Z

This looks more like a bug in the Atlas code to me. Have you reported this to RIPE?

For example:

    {
      "hop": 2,
      "result": [
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }

Should really be:

    {
      "hop": 2,
      "error": "sendto failed: Network is unreachable"
    }

That would be correct according to the documentation for v4400.
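
Until the placement is settled, a consumer could accept both shapes. The following is a minimal Python sketch under that assumption; `hop_errors` is a hypothetical helper for illustration, not part of this library or of any RIPE Atlas tooling.

```python
def hop_errors(hop):
    """Collect error strings from a hop, wherever the probe placed them.

    Handles the hop-level form ({"hop": 2, "error": "..."}) as well as
    the reply-level form ({"hop": 2, "result": [{"error": "..."}]}).
    """
    errors = []
    if "error" in hop:
        errors.append(hop["error"])
    for reply in hop.get("result", []):
        if "error" in reply:
            errors.append(reply["error"])
    return errors
```

A hop with no error in either position yields an empty list, so callers can treat the result as a simple truth test.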

jmeggitt · 2022-11-16T11:29:56Z

I was generally thinking the same thing. However, one complication is how to handle an error that occurs in a later reply. If a hop already has one or two valid replies, then it may not make sense to mark the entire hop as having errored.

    {
      "hop": 6,
      "result": [
        {
          "x": "*"
        },
        {
          "error": "sendto failed: Network is unreachable"
        }
      ]
    }

jmeggitt · 2022-11-16T11:32:59Z

I have not yet raised an issue related to this field with RIPE Atlas. As it stands, there is a decent lead time on a fix being created, implemented, and deployed to their probes. Even if they could finish deploying a patch by tomorrow, nearly all of the previous measurement data would still be affected, so it would be helpful to have a way of identifying it until a more permanent solution is implemented.

I am also somewhat unclear on whether this is actually a bug in the probe software or an omission in the API documentation. If we assume it is intentional, then the three cases for Timeout, Error, and Reply would closely match the structure of ping measurement results.
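
To make the three cases concrete, here is a rough Python sketch of how a single reply object could be classified. The class and function names are purely illustrative and are not this library's API; it assumes a reply with "error" is an Error, a reply with "x" is a Timeout, and anything else is an ordinary Reply whose fields may be absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeout:
    """The probe received no response ({"x": "*"})."""


@dataclass
class Error:
    """The probe failed to send or bind ({"error": "..."})."""
    message: str


@dataclass
class Reply:
    """An ordinary response; fields are optional in case a probe omits them."""
    source: Optional[str]
    rtt: Optional[float]
    ttl: Optional[int]
    size: Optional[int]


def parse_reply(raw):
    """Classify one reply dict from a hop's "result" list."""
    if "error" in raw:
        return Error(raw["error"])
    if "x" in raw:
        return Timeout()
    return Reply(raw.get("from"), raw.get("rtt"), raw.get("ttl"), raw.get("size"))
```

With this shape, a hop like hop 6 in the example above parses to one Timeout followed by one Error, so the earlier reply is kept rather than the whole hop being marked as failed.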

jelu · 2022-11-16T11:39:14Z

I'd like to not deviate from the documentation, please try and get them to update that first.

jelu · 2023-01-09T13:27:08Z

@jmeggitt Any progress on updating the documentation?

jelu · 2023-01-16T14:10:31Z

@jmeggitt ping?

jmeggitt · 2023-01-16T15:30:09Z

Sorry about that. I saw your previous ping, but got distracted with other work. I have notified them about the issue, but given their limited resources I would not expect to see any updates to the documentation anytime soon. I have not been pushing the issue beyond bringing it to their attention, so I imagine it is still in their backlog.

This issue can be found on RIPE-NCC/ripe-atlas-probe-measurements#14. However, if you want to get in contact with them or ask about the status of the issue then you will likely have more success directly emailing them at [email protected] to create a ticket in their system.
