Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGABRT in IceTransport::TimeoutCallback with libnice #1102

Closed
DarthBo opened this issue Jan 29, 2024 · 6 comments · Fixed by #1103
Closed

SIGABRT in IceTransport::TimeoutCallback with libnice #1102

DarthBo opened this issue Jan 29, 2024 · 6 comments · Fixed by #1103

Comments

@DarthBo
Copy link
Contributor

DarthBo commented Jan 29, 2024

I've seen some occasional crashes in the field that usually seem to come from IceTransport::TimeoutCallback.

I've had a look at the implementation of the libnice backend, I'm assuming it's a lifetime issue with the timer stuff, but I'm afraid my lack of experience with glib makes it hard for me to figure out what's going wrong. I also don't have a way to reproduce it yet, so I'm kind of hoping you'll take one look at the relevant code and say "aha, whoops" 😅

libdatachannel version: 0.19.5
libdatachannel build flags: NO_WEBSOCKET, NO_MEDIA, USE_NICE
libnice version: 0.1.21
libglib version: 2.72.3

last log prints:

14:52:44:731 [Debug] Peer Connection state changed: connecting
14:52:45:086 [Debug] Peer Connection state changed: connected
14:52:46:983 [Debug] Peer Connection state changed: disconnected
14:52:46:984 [Debug] Peer Connection state changed: closed
14:52:48:299 [Warning] rtc::impl::IceTransport::processTimeout@745: ICE timeout 

stacktrace:

Fatal Error: SIGABRT / 0xfffffffa / 0x190

0   libc.so.6                       0xffffb9b73a60      __pthread_kill_implementation
1   libc.so.6                       0xffffb9b3b3dc      __GI_raise
2   libc.so.6                       0xffffb9b2b224      __GI_abort
3   libc.so.6                       0xffffb9b35bc0      __assert_fail_base
4   libc.so.6                       0xffffb9b35c24      __GI___assert_fail
5   libc.so.6                       0xffffb9b74d34      ___pthread_mutex_lock
6   libdatachannel.so.0.19          0xffffba99c988      [inlined] __gthread_mutex_lock (gthr-default.h:749)
7   libdatachannel.so.0.19          0xffffba99c988      [inlined] __gthread_recursive_mutex_lock (gthr-default.h:811)
8   libdatachannel.so.0.19          0xffffba99c988      [inlined] std::recursive_mutex::lock (mutex:108)
9   libdatachannel.so.0.19          0xffffba99c988      [inlined] std::lock_guard<T>::lock_guard (std_mutex.h:229)
10  libdatachannel.so.0.19          0xffffba99c988      [inlined] rtc::synchronized_callback<T>::operator() (utils.hpp:80)
11  libdatachannel.so.0.19          0xffffba99c988      rtc::impl::Transport::changeState (transport.cpp:64)
12  libdatachannel.so.0.19          0xffffba95a130      rtc::impl::IceTransport::TimeoutCallback (icetransport.cpp:844)
13  libglib-2.0.so.0                0xffffbb5441fc      g_timeout_dispatch
14  libglib-2.0.so.0                0xffffbb543814      g_main_context_dispatch
15  libglib-2.0.so.0                0xffffbb543a58      g_main_context_iterate.constprop.0
16  libglib-2.0.so.0                0xffffbb543e80      g_main_loop_run
17  libstdc++.so.6                  0xffffb9ddfc4c      execute_native_thread_routine
18  libc.so.6                       0xffffb9b72170      start_thread
19  libc.so.6                       0xffffb9bc7f18      thread_start
@paullouisageneau
Copy link
Owner

I think there is a possible race condition as the destructor for IceTransport first disarm the timer then destroys the agent. However, it would be possible that the agent changes states between the two and rearms the timer, resulting in IceTransport::TimeoutCallback being called later for a destroyed object.

Could you please check whether #1103 fixes the issue?

@DarthBo
Copy link
Contributor Author

DarthBo commented Feb 2, 2024

Oof, I definitely never would have thought of that. Thanks for looking into it!

I'll pull the patch and see if we can reproduce it, will report back :)

@paullouisageneau
Copy link
Owner

Any news about this?

@DarthBo
Copy link
Contributor Author

DarthBo commented Feb 23, 2024

Oh sorry, I didn't realise it has already been 3 weeks 😱

I haven't seen the issue since applying the patch, but I never really bumped into it before myself either. I was hoping to push a build to our beta testers (who have bumped into it) last week, but unrelated issues delayed that release.

It might be another couple of weeks before I can safely say it's fixed 😅

@DarthBo
Copy link
Contributor Author

DarthBo commented Mar 11, 2024

I haven't seen a single libnice/datachannel related crash so far, so it certainly looks good :)

@paullouisageneau
Copy link
Owner

Great, thank you for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants