Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

odom node crashes with buffer overrun #75

Open
tnr00071 opened this issue Oct 10, 2024 · 5 comments
Open

odom node crashes with buffer overrun #75

tnr00071 opened this issue Oct 10, 2024 · 5 comments

Comments

@tnr00071
Copy link

tnr00071 commented Oct 10, 2024

terminate called after throwing an instance of 'ros::serialization::StreamOverrunException'
  what():  Buffer Overrun
[robot/dlio_odom-1] process has died [pid 548014, exit code -6, cmd /local/gitlab/direct_lidar_inertial_odometry/build/Release/devel/lib/direct_lidar_inertial_odometry/dlio_odom_node ~pointcloud:=/ouster/points2 ~imu:=/ouster/imu ~odom:=dlio/odom_node/odom ~pose:=dlio/odom_node/pose ~path:=dlio/odom_node/path ~kf_pose:=dlio/odom_node/keyframes ~kf_cloud:=dlio/odom_node/pointcloud/keyframe ~deskewed:=dlio/odom_node/pointcloud/deskewed __name:=dlio_odom __log:=/home/tray/.ros/log/17805ca8-8655-11ef-a824-0f5710f791f5/robot-dlio_odom-1.log].
log file: /home/tray/.ros/log/17805ca8-8655-11ef-a824-0f5710f791f5/robot-dlio_o

Been also trying to run with gdb, to get the exact line that throw the error, but its extremely slow. Will keep trying to debug. Currently running on ros noetic on ubuntu 20.04

@kennyjchen
Copy link
Contributor

Hi @tnr00071 -- Try running gdb in RelWithDebInfo, i.e. put set( CMAKE_BUILD_TYPE RelWithDebInfo ... FORCE ) in the CMakeLists.txt. If you compile purely with debugging symbols, it'll slow down a lot. Otherwise, this might be related to the (excessive) terminal output. Try suppressing that and see if you still get this error.

@tnr00071
Copy link
Author

tnr00071 commented Oct 14, 2024

Hey @kennyjchen

Thanks for the suggestion. I ran gdb with RelWithDebInfo and suppressed the output to stdout.

This is the backtrace from gdb:

[New Thread 0x7ffee4e77700 (LWP 209010)]
terminate called after throwing an instance of 'ros::serialization::StreamOverrunException'
  what():  Buffer Overrun
[New Thread 0x7ffef4ff9700 (LWP 209011)]
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 22565 "dlio_odom_node" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffeecff9700 (LWP 209009)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff727f859 in __GI_abort () at abort.c:79
#2  0x00007ffff769b8d1 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff76a737c in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff76a73e7 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff76a7699 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff794f212 in ros::serialization::throwStreamOverrun() () from /opt/ros/noetic/lib/libroscpp_serialization.so
#7  0x00005555555a0335 in ros::serialization::Stream::advance (len=8, this=<synthetic pointer>) at /opt/ros/noetic/include/ros/serialization.h:689
#8  ros::serialization::Serializer<double>::write<ros::serialization::OStream> (v=-4616.4892578125, stream=<synthetic pointer>...) at /opt/ros/noetic/include/ros/serialization.h:211
#9  ros::serialization::serialize<double, ros::serialization::OStream> (t=@0x7ffed508e650: -4616.4892578125, stream=<synthetic pointer>...)
    at /opt/ros/noetic/include/ros/serialization.h:155
#10 ros::serialization::OStream::next<double> (t=@0x7ffed508e650: -4616.4892578125, this=<synthetic pointer>) at /opt/ros/noetic/include/ros/serialization.h:755
#11 ros::serialization::Serializer<geometry_msgs::Point_<std::allocator<void> > >::allInOne<ros::serialization::OStream, geometry_msgs::Point_<std::allocator<void> > const&> (
    m=..., stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/Point.h:183
#12 ros::serialization::Serializer<geometry_msgs::Point_<std::allocator<void> > >::write<ros::serialization::OStream, geometry_msgs::Point_<std::allocator<void> > > (t=..., 
    stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/Point.h:188
#13 ros::serialization::serialize<geometry_msgs::Point_<std::allocator<void> >, ros::serialization::OStream> (t=..., stream=<synthetic pointer>...)
    at /opt/ros/noetic/include/ros/serialization.h:155
#14 ros::serialization::OStream::next<geometry_msgs::Point_<std::allocator<void> > > (t=..., this=<synthetic pointer>) at /opt/ros/noetic/include/ros/serialization.h:755
#15 ros::serialization::Serializer<geometry_msgs::Pose_<std::allocator<void> > >::allInOne<ros::serialization::OStream, geometry_msgs::Pose_<std::allocator<void> > const&> (m=..., 
    stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/Pose.h:194
#16 ros::serialization::Serializer<geometry_msgs::Pose_<std::allocator<void> > >::write<ros::serialization::OStream, geometry_msgs::Pose_<std::allocator<void> > > (t=..., 
    stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/Pose.h:198
#17 ros::serialization::serialize<geometry_msgs::Pose_<std::allocator<void> >, ros::serialization::OStream> (t=..., stream=<synthetic pointer>...)
    at /opt/ros/noetic/include/ros/serialization.h:155
#18 ros::serialization::OStream::next<geometry_msgs::Pose_<std::allocator<void> > > (t=..., this=<synthetic pointer>) at /opt/ros/noetic/include/ros/serialization.h:755
#19 ros::serialization::VectorSerializer<geometry_msgs::Pose_<std::allocator<void> >, std::allocator<geometry_msgs::Pose_<std::allocator<void> > >, void>::write<ros::serialization::OStream> (v=..., stream=<synthetic pointer>...) at /opt/ros/noetic/include/ros/serialization.h:447
#20 ros::serialization::serialize<geometry_msgs::Pose_<std::allocator<void> >, std::allocator<geometry_msgs::Pose_<std::allocator<void> > >, ros::serialization::OStream> (t=..., 
    stream=<synthetic pointer>...) at /opt/ros/noetic/include/ros/serialization.h:484
#21 ros::serialization::OStream::next<std::vector<geometry_msgs::Pose_<std::allocator<void> >, std::allocator<geometry_msgs::Pose_<std::allocator<void> > > > > (t=..., 
    this=<synthetic pointer>) at /opt/ros/noetic/include/ros/serialization.h:755
#22 ros::serialization::Serializer<geometry_msgs::PoseArray_<std::allocator<void> > >::allInOne<ros::serialization::OStream, geometry_msgs::PoseArray_<std::allocator<void> > const&>
    (m=..., stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/PoseArray.h:219
#23 ros::serialization::Serializer<geometry_msgs::PoseArray_<std::allocator<void> > >::write<ros::serialization::OStream, geometry_msgs::PoseArray_<std::allocator<void> > > (t=..., 
--Type <RET> for more, q to quit, c to continue without paging--
    stream=<synthetic pointer>...) at /opt/ros/noetic/include/geometry_msgs/PoseArray.h:222
#24 ros::serialization::serialize<geometry_msgs::PoseArray_<std::allocator<void> >, ros::serialization::OStream> (t=..., stream=<synthetic pointer>...)
    at /opt/ros/noetic/include/ros/serialization.h:155
#25 ros::serialization::serializeMessage<geometry_msgs::PoseArray_<std::allocator<void> > > (message=...) at /opt/ros/noetic/include/ros/serialization.h:822
#26 0x00005555555955f6 in boost::_bi::list1<boost::reference_wrapper<geometry_msgs::PoseArray_<std::allocator<void> > const> >::operator()<ros::SerializedMessage, ros::SerializedMessage (*)(geometry_msgs::PoseArray_<std::allocator<void> > const&), boost::_bi::list0> (f=<optimized out>, a=<synthetic pointer>..., this=<optimized out>)
    at /usr/include/boost/bind/bind.hpp:1291
#27 boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(geometry_msgs::PoseArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<geometry_msgs::PoseArray_<std::allocator<void> > const> > >::operator() (this=<optimized out>) at /usr/include/boost/bind/bind.hpp:1294
#28 boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<ros::SerializedMessage, ros::SerializedMessage (*)(geometry_msgs::PoseArray_<std::allocator<void> > const&), boost::_bi::list1<boost::reference_wrapper<geometry_msgs::PoseArray_<std::allocator<void> > const> > >, ros::SerializedMessage>::invoke (function_obj_ptr=...)
    at /usr/include/boost/function/function_template.hpp:137
#29 0x00007ffff7a9fb08 in ros::TopicManager::publish(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<ros::SerializedMessage ()> const&, ros::SerializedMessage&) () from /opt/ros/noetic/lib/libroscpp.so
#30 0x00007ffff7a938c6 in ros::Publisher::publish(boost::function<ros::SerializedMessage ()> const&, ros::SerializedMessage&) const () from /opt/ros/noetic/lib/libroscpp.so
#31 0x0000555555584370 in ros::Publisher::publish<geometry_msgs::PoseArray_<std::allocator<void> > > (message=..., this=0x7fffffffb5b8) at /usr/include/c++/9/new:174
#32 dlio::OdomNode::publishKeyframe (this=0x7fffffffb4a0, kf={...}, timestamp=...) at /local/gitlab/direct_lidar_inertial_odometry/src/dlio/odom.cc:472
#33 0x000055555559c4d8 in std::__invoke_impl<void, void (dlio::OdomNode::*)(std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time), dlio::OdomNode*, std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time> (__t=<optimized out>, __f=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:474
#34 std::__invoke<void (dlio::OdomNode::*)(std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time), dlio::OdomNode*, std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#35 std::thread::_Invoker<std::tuple<void (dlio::OdomNode::*)(std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time), dlio::OdomNode*, std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time> >::_M_invoke<0ul, 1ul, 2ul, 3ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#36 std::thread::_Invoker<std::tuple<void (dlio::OdomNode::*)(std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time), dlio::OdomNode*, std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#37 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (dlio::OdomNode::*)(std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time), dlio::OdomNode*, std::pair<std::pair<Eigen::Matrix<float, 3, 1, 0, 3, 1>, Eigen::Quaternion<float, 0> >, boost::shared_ptr<pcl::PointCloud<dlio::Point> const> >, ros::Time> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#38 0x00007ffff76d3df4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#39 0x00007ffff7f57609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#40 0x00007ffff737c353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 

It appears the issue happens at line 427 of odom.cc: this->kf_pose_pub.publish(this->kf_pose_ros);

I commented this line out and the issue stopped happening. Not sure what causes this line throws the exception ... maybe not locked properly.

Could you also update the README.md to include a description of all of the different published topics? Would be nice to know the difference between kf_pose and pose, deskewed and kf_cloud etc.

@tnr00071
Copy link
Author

@kennyjchen anymore thoughts?

@kennyjchen
Copy link
Contributor

Hmm, how long after running DLIO is this occurring? It's possible that, after a long time, there are too many keyframes to publish and it subsequently crashes when trying to serialize it (hence the error). For the most part, a lot of the topics being published are for visualization purposes, which can be commented out if you're in a production environment.

I'll try to find some time in the future to update the README to include a description of the published topics, as you suggested. I appreciate the feedback!

@tnr00071
Copy link
Author

tnr00071 commented Oct 22, 2024

Around an hour long data set with an OS1-128 for context. Thanks for the reply!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants