rtcm_client fails silently if corrections server pauses #28
Comments
BTW, this happened with two different mountpoints on rtk2go.com. So it's probably not just Joe Blow's base station - it's a broader resilience issue. |
Reading the rtk2go website, it says it'll email you if something happens. Did you receive any emails? This NTRIP client works with the AUSCORS network in Australia and with the French/European NTRIP servers. The error message seems to indicate that you are losing network connectivity? |
Another option is to compile the code after uncommenting the line curl_easy_setopt(handle, CURLOPT_VERBOSE, 1L); In essence, libcurl is being used, so any errors should then be logged. You might also try setting the log level to debug in the launch file. |
@PaulBouchier could you please try setting this in the code, together with the verbose option: https://curl.se/libcurl/c/CURLOPT_MAXAGE_CONN.html The suggestion is based on reading this: https://stackoverflow.com/questions/70657452/libcurl-checking-the-server-shutdown-connection |
Hi - thanks for your prompt and effective response. Working through this, finally got it working like it was yesterday, only using a freshly cloned & built ublox_dgnss package. Answering a couple of questions, with additional testing per your request:
|
It resulted in the client printing the following error every few seconds to the terminal:
I don't know if that's significant, but I haven't been able to get this freshly-cloned ntrip_client to make diff_corr go true. That's different from yesterday with the dgnss ntrip_client from ros-humble-ublox-dgnss, which would make diff_corr go true, then it would go false after a few minutes. |
(ros-humble-ntrip-client was working perfectly before I ran the freshly-cloned version of dgnss ntrip_client.)
|
@PaulBouchier I've been told by someone else that rtk2go.com is not reliable. There are a few other CURLOPT_ settings to do with connections that might help. However, you are going to have to try them out and see if they make a difference. If you can let me know which ones, then I'll look into setting up parameters for the next version. I may not have published a humble version of the driver for a while. The only other change I can think of is that curl_easy_setopt(handle, CURLOPT_USERAGENT, "NTRIP ros2/ublox_dgnss"); was added, because otherwise it doesn't work with the French NTRIP server. Maybe try commenting that out. If it does make a difference, again let me know, and I'll add a parameter so that it's set conditionally and you can disable it in the launch file. To set the log level, have a look here: https://robotics.stackexchange.com/questions/99332/specify-log-level-on-ros2-node-in-launch-file |
There might also be another issue with using rtk2go - we may need to set headers to tell it to use Ntrip 2 ?? https://gitlab.com/gpsd/gpsd/-/issues/191 |
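For anyone wanting to experiment with the NTRIP 2 idea outside the node, here is a minimal Python sketch of what such a request could look like on the wire. It is not this package's code (which uses libcurl): the function name build_ntrip_request and its parameters are hypothetical, and whether rtk2go actually needs the Ntrip-Version header is exactly the open question above.

```python
import base64


def build_ntrip_request(host, port, mountpoint, username, password,
                        user_agent="NTRIP ros2/ublox_dgnss", ntrip_v2=True):
    """Return the raw HTTP request bytes for an NTRIP caster stream.

    Hypothetical helper for experiments; not part of ublox_dgnss.
    """
    # NTRIP casters use HTTP Basic authentication.
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    # NTRIP 2 rides on HTTP/1.1; revision 1 clients typically send HTTP/1.0.
    http_version = "1.1" if ntrip_v2 else "1.0"
    lines = [
        f"GET /{mountpoint} HTTP/{http_version}",
        f"Host: {host}:{port}",
        f"User-Agent: {user_agent}",
        f"Authorization: Basic {credentials}",
    ]
    if ntrip_v2:
        # This header is what tells the caster to speak NTRIP version 2.
        lines.append("Ntrip-Version: Ntrip/2.0")
    lines.append("Connection: close")
    # A blank line terminates the HTTP header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

Sending these bytes over a plain TCP socket to the caster, with and without ntrip_v2, would show fairly quickly whether the header changes the server's behaviour.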
Thanks for thinking about this, Nick. I think, considering I have a November 23 deadline, I'm going to put this investigation on hold until after then, and use your ublox-dgnss satellite node with the ros-humble-ntrip-client, which seems to work. Thanks for the pointer to setting the logging level. I had found that, but hoped you'd been able to use it specifically in the containers you run the nodes in. The example in the link you gave looks like:
(with changes as noted in the thread). But your containerized launch file looks a lot different:
Maybe the --ros-args should go in the ComposableNodeContainer() block? Yes, rtk2go.com is unreliable, but it's all we've got in Texas. Thanks again for looking at this. I think we should leave this ticket open for now, and I can put some more logging in after my deadline and see if I can figure out what's going on. What I don't understand is, diff_corr stays false in the freshly-cloned SW, but ntrip_client is sending corrections data on /ntrip_client/rtcm. Very strange. Maybe a bag file would help. |
One other thing - and this is not a criticism - I greatly appreciate all the work you did to not just create the node for your own purposes, but then to publish it as a package and answer questions from people like me. I understand why you chose to use the sensor data QoS profile: it's what OSRF recommends, and for good reason - best effort with minimal delay is the right choice for sensor data. But it comes with downsides that I'm sure you're aware of - can't use rqt_topic, and I'm not yet sure about bag files. Essentially, the ROS2 ecosystem has failed to adequately accommodate best-effort QoS. Maybe, as an enhancement, there ought to be a repeater node that produces "shadow copies" of the topics on a RELIABLE QoS topic, for tool support. Or maybe some of the topics that are of a debug nature, like UBX_RXM_RTCM, should be published with a RELIABLE QoS. Something to think about. If you want, I'll make an enhancement request ticket along these lines. |
@PaulBouchier I'm not really sure what's going on in your environment. You can always use the curl command line to figure out the settings for libcurl. Reading rtk2go, it suggests that the default of 5 connections for libcurl might be an issue. Whatever is going on, it is peculiar to rtk2go and how those servers are set up. I've posted a few hints in the previous posts to figure it out - if the other script works, that's good. Yes, I might change some of the default libcurl connection settings so that if network connectivity is lost, it'll time out and reconnect, instead of sitting there silently waiting for the next stream record. On the QoS side, I've kept the recommended profile but acknowledge the challenges, and yes, for ros2 bag you need to override the QoS: https://docs.ros.org/en/jazzy/How-To-Guides/Overriding-QoS-Policies-For-Recording-And-Playback.html I suspect you can do the same in the launch files too. However, I might need to change how I create the message publishers: https://robotics.stackexchange.com/questions/105844/is-there-a-workaround-to-set-qos-at-launch-time I just haven't had a need to do this (except overriding the QoS for ros2 bags). |
@PaulBouchier if you look at the latest code that I've pushed to github, you will see that I've added parameters for log_level and maxage_conn If you want to debug try
Note that when log_level is anything other than INFO, it will now also show the verbose libcurl output. I've set the maxage_conn default to 30 seconds, where the standard according to the libcurl docs is 118 seconds. Had it sitting here watching it publish the ntrip rtcm messages with There is not much more that can be done with the code at this point in time besides tweaking some of the other variables. I'll leave that to you if it adds value to your situation. You're welcome to send through a PR with any other changes. Am closing this issue, as I believe you can now tweak how often a stale connection will attempt to reconnect if you lose network connectivity with rtk2go. |
@PaulBouchier added a new issue #29 for overriding QoS profiles - modified the code to add a QoS override for the publisher and subscribers, but haven't tested creating a launch file that overrides them. Also, thinking about rtk2go, have you investigated Point Perfect and the SPARTN protocol? ... initial changes have been created such that you can see if SPARTN is working (look at the UBX_RXM_COR message - it has both NTRIP and SPARTN)... however, more work needs to be done on SPARTNKEYS, as well as on receiving SPARTN messages similar to the NTRIP client and pushing them to the UBLOX_DGNSS and out to the device. SPARTN messages can be received, I believe, either via L-band or via the internet. Point Perfect with SPARTN would make using rtk2go obsolete. |
Thank you very much for your great support, and the parameter-add for
logging level. I forked your repo and am using it in my development so I
can better look into any concerns. And thanks for the info about Jazzy
supporting qos modifications - that's good to know and is something to
weigh in considering when to advance my ROS version.
I had not investigated Point Perfect - thanks for the pointer. At $42/mo
they're certainly reasonably priced for a commercial application. I'm a
hobbyist and that's more than I want to spend that way, and rtk2go works
well enough. If I find anything about why ntrip_client doesn't work with
rtk2go I'll let you know. However, one of my Dallas Personal Robotics Group
friends reported that ros-humble-ntrip-client will get you put in the
rtk2go sandbox too, if left on overnight, because it requests at 10Hz and
they only want max 1 Hz requests.
Best regards
Paul Bouchier
|
When using libcurl for NTRIP, it's a continuous stream of messages that never ends. I had to add a way to check whether it should exit the thread, so after every 10, it exits and then reconnects. Those are the new connection attempts. The error just before is expected - it's the only way to tell libcurl to return control. So I'm not sure about the Hz, as it's driven by the NTRIP caster itself. I suspect it's the reconnections - you might get some leverage by changing it to 20. It just means that if you try to terminate the node, it might need to wait longer before the thread stops. It took a little bit of effort to get this working at first. I still think most people doing it in C/C++ use libcurl. You could just use the command-line version of libcurl and create a shell script around it that logs appropriate messages if it fails, to test out different settings. I'm surprised the Texas government hasn't set a service up?? |
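The pattern described above (an endless stream, a callback that deliberately aborts the transfer so the thread regains control, then a reconnect) can be sketched in plain Python. This is only an illustration under assumptions: the names stream_once, run_client and chunks_per_connection are made up, "10" is read here as ten received chunks per connection, and the False return value stands in for whatever makes libcurl give up the transfer in the real C++ code.

```python
import threading


def stream_once(chunks, on_chunk):
    """Feed chunks to on_chunk until it returns False (abort) or the source ends."""
    for chunk in chunks:
        if not on_chunk(chunk):
            # The deliberate abort: in the real client this surfaces as an
            # "expected" transfer error.
            return "aborted"
    return "eof"


def run_client(connect, publish, stop_event, chunks_per_connection=10):
    """Reconnect in a loop until stop_event is set; return the connection count."""
    connections = 0
    while not stop_event.is_set():
        seen = 0

        def on_chunk(chunk):
            nonlocal seen
            publish(chunk)
            seen += 1
            # Keep streaming only while under the per-connection budget and
            # nobody has asked the thread to stop.
            return seen < chunks_per_connection and not stop_event.is_set()

        connections += 1
        stream_once(connect(), on_chunk)
    return connections
```

Raising chunks_per_connection reduces how often the caster sees a new connection, at the price of the stop flag being checked against a longer-lived transfer. A real client would also want a short pause before reconnecting when the stream ends immediately, which this sketch leaves out.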
Hmmm, thanks for the explanation. I don't know enough to respond, however
I'll go with what works. My DPRG friend is using
https://github.com/olvdhrm/RTK_GPS_NTRIP, which is the Kumar Robotics F9P
driver, the LORD/Microstrain ntrip_client (which is what he reported as
getting him put in the sandbox after an overnight run, and also what I'm
using with your ublox_dgnss F9P driver), and one other package.
The Texas Department of Transportation has a corrections network that
covers the state, but they won't provide public access. 😫. Sucks for a
state with a population the same as Australia.
If you feel like a little light entertainment for a couple of minutes,
here's a summary video of my robot's 2nd place run in last year's DPRG
RoboColumbus contest. 😅
https://youtu.be/JCzpXuPcQ2M
The November 23 deadline is for this year's contest. I'm going back to New
Zealand to see my family in a few days so I am extremely time-constrained
for getting it working better for this year's contest.
Cheers
Paul
|
@PaulBouchier thanks for sharing. Looks interesting. Good luck this year with the contest :) |
I am having trouble with version (0.5.2-1jammy.20240731.053557) of ros-humble-ublox-dgnss installed from apt on Ubuntu 22.04 on Raspberry Pi and operating in RTK-GPS mode with an rtk2go.com corrections server. Frequently, at random intervals, the /ubx_nav_hp_pos_llh message field h_acc starts increasing and goes up to 2 meters or so, and differential corrections are lost. Upon investigation, I found /ntrip_client/rtcm would stop sending corrections coincident with the increase in the h_acc field - there were just no more corrections emitted by the rtcm_client, but no error on the terminal in which the client was launched with ros2 launch ublox_dgnss ntrip_client.launch.py use_https:=false host:=rtk2go.com port:=2101 mountpoint:=VN1 username:=paul.bouchier-at-gmail-d-com password:=Unused. Everything looked normal from the terminal and logging point of view.
I installed ros-humble-rtcm-client to get a second view of what was happening, and it printed the following to the terminal, coincident with this package's rtcm_client going dead:
Output from ros-humble-rtcm-client shown.
This happened again about 5 minutes later, and I believe what's happening is the rtk2go.com ntrip server is pausing for 15-20 seconds every few minutes. This throws the ublox_dgnss rtcm_client for a loop and it doesn't attempt to reconnect, so no corrections are ever sent again until rtcm_client is re-started.
By comparison, ros-humble-rtcm-client times out and retries until it succeeds when the server comes back 20 seconds or so later.
I re-mapped topics to feed the ros-humble-rtcm-client corrections into ublox-dgnss node, and then I got stable operation. I'm unstuck for now, but clearly this is an undesirable state of affairs. I think this rtcm_client needs to detect loss of corrections and retry until it succeeds and be more noisy about the failure.
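To make the request concrete, here is a rough Python sketch of the behaviour I have in mind: treat a stall as an error, log it loudly, and reconnect. It is not taken from either client. The names stream_with_retry, connect and read are hypothetical, connect() is assumed to return a read(timeout) callable that raises TimeoutError when no data arrives in time, and the default timeout and backoff values are guesses.

```python
import logging
import time

log = logging.getLogger("ntrip_watchdog")


def stream_with_retry(connect, publish, stall_timeout=10.0, backoff=1.0,
                      max_attempts=None, sleep=time.sleep):
    """Publish corrections; on a stall or error, warn and reconnect.

    Returns the number of connection attempts made (only reachable when
    max_attempts is set; otherwise it retries forever).
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            read = connect()  # assumed: returns read(timeout) -> bytes
            while True:
                data = read(stall_timeout)
                if not data:
                    # An empty read means the caster closed the stream.
                    raise ConnectionError("caster closed the stream")
                publish(data)
        except OSError as exc:
            # TimeoutError and ConnectionError are both OSError subclasses.
            log.warning("corrections lost (%s); attempt %d, retrying in %.1fs",
                        exc, attempts, backoff)
            sleep(backoff)
    return attempts
```

The important part is that a pause longer than stall_timeout produces a visible warning and a retry, instead of the client waiting silently for the next record.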