Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing at exit #251

Closed
springmeyer opened this issue May 5, 2014 · 28 comments
Closed

Crashing at exit #251

springmeyer opened this issue May 5, 2014 · 28 comments

Comments

@springmeyer
Copy link
Member

Been seeing some odd crashes that show that node-mapnik may crash as the Node.js process exits. The condition occurs because node-mapnik is still rendering in a thread while the main event loop closes and the process unloads memory (and mapnik datasource plugins).

A first I thought this was a fluke but users have been reporting it occasionally in the past month. So perhaps something changed in Mapnik 2.3.x or node-mapnik to make this possible.

@springmeyer
Copy link
Member Author

@springmeyer
Copy link
Member Author

crash report https://gist.github.com/tmcw/11187569 - which was worked around by not using the OGR plugin. So it seems like at least in this case (tilelive-overlay) the OGR plugin was causing a crash at exit. Switching out for the CSV plugin avoided the problem (mapbox/geojson-mapnikify@b65856b).

@springmeyer
Copy link
Member Author

Another case which only happens when the GDAL plugin is involved, so this is starting to look like some kind of threading conflict between gdal and libuv.

@springmeyer
Copy link
Member Author

looks like we need to be calling

    GDALDestroyDriverManager();
    OGRCleanupAll();

as per http://hub.qgis.org/issues/8932

@springmeyer
Copy link
Member Author

6b2b2a9 avoids the crash but is not feasible because node-mapnik, and libmapnik itself do not and cannot link to libgdal directly because libgdal is only linked to the dynamically loaded Mapnik gdal.input plugin.

Trying to register and atexit in the plugins themselves does not avoid the crash: https://gist.github.com/springmeyer/51e39e5a78c603824c35

Here is the full crash:

Process:         node [97044]
Path:            /Users/USER/*/node
Identifier:      node
Version:         ???
Code Type:       X86-64 (Native)
Parent Process:  node [97042]
Responsible:     Terminal [46679]
User ID:         501

Date/Time:       2014-05-16 15:46:56.726 -0700
OS Version:      Mac OS X 10.9.2 (13C1021)
Report Version:  11
Anonymous UUID:  B9F868B4-F6CC-BF2C-C8C7-25252C6F4292

Sleep/Wake UUID: DCF4EA15-6ABA-4248-A2A4-838EBAB91472

Crashed Thread:  4

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000111806340

VM Regions Near 0x111806340:
    __LINKEDIT             000000010b0c3000-00000001114d4000 [100.1M] r--/rwx SM=COW  /Users/USER/*/*.dylib
--> 
    STACK GUARD            0000000113276000-0000000113277000 [    4K] ---/rwx SM=NUL  stack guard for thread 5

Thread 0:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x00000001031c9a3a __semwait_signal + 10
1   libsystem_pthread.dylib         0x00000001032c57f3 pthread_join + 433
2   node                            0x0000000100131bb2 uv_thread_join + 14
3   node                            0x0000000100132055 cleanup + 55
4   dyld                            0x00007fff6c475ef5 ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&) + 215
5   dyld                            0x00007fff6c466279 dyld::runTerminators(void*) + 69
6   libsystem_c.dylib               0x00000001031087a1 __cxa_finalize + 177
7   libsystem_c.dylib               0x0000000103108a4c exit + 22
8   node                            0x000000010000745e node::Exit(v8::Arguments const&) + 53
9   node                            0x00000001001757f2 v8::internal::Builtin_HandleApiCall(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>, v8::internal::Isolate*) + 482
10  ???                             0x000033211c806362 0 + 56217305113442
11  ???                             0x000033211ca0d644 0 + 56217307240004
12  ???                             0x000033211ca0ca58 0 + 56217307236952
13  ???                             0x000033211c8122de 0 + 56217305162462
14  ???                             0x000033211c8ae96f 0 + 56217305803119
15  ???                             0x000033211ca07b88 0 + 56217307216776
16  ???                             0x000033211c80ccce 0 + 56217305140430
17  ???                             0x000033211ca07a66 0 + 56217307216486
18  ???                             0x000033211c8d9875 0 + 56217305978997
19  ???                             0x000033211c8d96f4 0 + 56217305978612
20  ???                             0x000033211c8d9177 0 + 56217305977207
21  ???                             0x000033211c80d507 0 + 56217305142535
22  ???                             0x000033211c806116 0 + 56217305112854
23  node                            0x00000001001a3a67 v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool*) + 439
24  node                            0x000000010014f678 v8::Function::Call(v8::Handle<v8::Object>, int, v8::Handle<v8::Value>*) + 440
25  node                            0x0000000100006932 node::MakeCallback(v8::Handle<v8::Object>, v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) + 177
26  node                            0x0000000100006a25 node::MakeCallback(v8::Handle<v8::Object>, v8::Handle<v8::String>, int, v8::Handle<v8::Value>*) + 89
27  node                            0x000000010000ab87 node::CheckImmediate(uv_check_s*, int) + 110
28  node                            0x000000010012cc32 uv__run_check + 35
29  node                            0x000000010012990b uv_run + 233
30  node                            0x000000010000a87f node::Start(int, char**) + 331
31  node                            0x0000000100001af4 start + 52

Thread 1:
0   libsystem_kernel.dylib          0x00000001031c5a1a mach_msg_trap + 10
1   libsystem_kernel.dylib          0x00000001031c4d18 mach_msg + 64
2   com.apple.CoreFoundation        0x000000010107bfc5 __CFRunLoopServiceMachPort + 181
3   com.apple.CoreFoundation        0x000000010107b5e9 __CFRunLoopRun + 1161
4   com.apple.CoreFoundation        0x000000010107af25 CFRunLoopRunSpecific + 309
5   com.apple.CoreFoundation        0x0000000101130811 CFRunLoopRun + 97
6   node                            0x000000010013471a uv__cf_loop_runner + 67
7   node                            0x0000000100128fdb uv__thread_start + 25
8   libsystem_pthread.dylib         0x00000001032c1899 _pthread_body + 138
9   libsystem_pthread.dylib         0x00000001032c172a _pthread_start + 137
10  libsystem_pthread.dylib         0x00000001032c5fc9 thread_start + 13

Thread 2:: Dispatch queue: com.apple.libdispatch-manager
0   libsystem_kernel.dylib          0x00000001031ca662 kevent64 + 10
1   libdispatch.dylib               0x000000010302143d _dispatch_mgr_invoke + 239
2   libdispatch.dylib               0x0000000103021152 _dispatch_mgr_thread + 52

Thread 3:: SamplerThread
0   libsystem_kernel.dylib          0x00000001031c9a3a __semwait_signal + 10
1   libsystem_c.dylib               0x0000000103126dc0 nanosleep + 200
2   libsystem_c.dylib               0x0000000103126cb2 usleep + 54
3   node                            0x00000001003e5e3d v8::internal::SamplerThread::Run() + 125
4   node                            0x00000001003e55ec v8::internal::ThreadEntry(void*) + 60
5   libsystem_pthread.dylib         0x00000001032c1899 _pthread_body + 138
6   libsystem_pthread.dylib         0x00000001032c172a _pthread_start + 137
7   libsystem_pthread.dylib         0x00000001032c5fc9 thread_start + 13

Thread 4 Crashed:
0   ???                             0x0000000111806340 0 + 4588593984
1   libsystem_pthread.dylib         0x00000001032c554f _pthread_tsd_cleanup + 86
2   libsystem_pthread.dylib         0x00000001032c2479 _pthread_exit + 111
3   libsystem_pthread.dylib         0x00000001032c18a4 _pthread_body + 149
4   libsystem_pthread.dylib         0x00000001032c172a _pthread_start + 137
5   libsystem_pthread.dylib         0x00000001032c5fc9 thread_start + 13

Thread 5:
0   libsystem_kernel.dylib          0x00000001031c9746 __psynch_mutexwait + 10
1   libsystem_pthread.dylib         0x00000001032c4779 _pthread_mutex_lock + 372
2   libsystem_pthread.dylib         0x00000001032c3cce _pthread_cond_wait + 874
3   node                            0x0000000100131e55 uv_cond_wait + 9
4   node                            0x0000000100132648 worker + 207
5   node                            0x0000000100128fdb uv__thread_start + 25
6   libsystem_pthread.dylib         0x00000001032c1899 _pthread_body + 138
7   libsystem_pthread.dylib         0x00000001032c172a _pthread_start + 137
8   libsystem_pthread.dylib         0x00000001032c5fc9 thread_start + 13

Thread 6:
0   libsystem_kernel.dylib          0x00000001031c5a1a mach_msg_trap + 10
1   libsystem_kernel.dylib          0x00000001031c4d18 mach_msg + 64
2   libsystem_kernel.dylib          0x00000001031c03f4 semaphore_create + 99
3   libsystem_platform.dylib        0x00000001032aa3f8 _os_semaphore_create + 56
4   libsystem_pthread.dylib         0x00000001032c24b0 _pthread_exit + 166
5   libsystem_pthread.dylib         0x00000001032c18a4 _pthread_body + 149
6   libsystem_pthread.dylib         0x00000001032c172a _pthread_start + 137
7   libsystem_pthread.dylib         0x00000001032c5fc9 thread_start + 13

Thread 4 crashed with X86 Thread State (64-bit):
  rax: 0x0000000111806340  rbx: 0x0000000000000107  rcx: 0x0000000000000107  rdx: 0x0000000108fe2000
  rdi: 0x0000000113d237a0  rsi: 0x0000000000000000  rbp: 0x0000000108fe1ea0  rsp: 0x0000000108fe1e78
   r8: 0x0000000106f0b6c0   r9: 0x0000000106f0cb80  r10: 0x00000000e60971e5  r11: 0x00000000980855d1
  r12: 0x0000000108fe2000  r13: 0x0000000000000000  r14: 0x00000001032c9c00  r15: 0x0000000000000000
  rip: 0x0000000111806340  rfl: 0x0000000000010287  cr2: 0x0000000111806340

Logical CPU:     3
Error Code:      0x00000014
Trap Number:     14

springmeyer pushed a commit that referenced this issue May 16, 2014
springmeyer pushed a commit that referenced this issue May 16, 2014
@springmeyer
Copy link
Member Author

just found that building gdal with --with-threads=no also works around this crash (on OS X at least).

@springmeyer
Copy link
Member Author

my punishment for:

Once I rebuilt gdal --with-threads all my troubles went away.

http://osgeo-org.1560.x6.nabble.com/gdal-dev-Reason-not-to-enable-gdal-with-threads-by-default-td3747947.html

@springmeyer
Copy link
Member Author

another data point: only happens with node v0.10.x and not v0.8.x (based on one travis build: https://travis-ci.org/mapnik/node-mapnik/builds/25319869)

@GretaCB
Copy link
Contributor

GretaCB commented May 19, 2014

@springmeyer My most recent seg fault crash reports for my ogr datasources:
https://gist.github.com/GretaCB/8ef22213cce510e37798

Currently attempting creating Datasources as CSV's.

@springmeyer
Copy link
Member Author

noting that --with-threads was first determined to be needed as part of mapnik/mapnik#597, but this was a while ago - maybe we can get away with it.

@springmeyer
Copy link
Member Author

core dump on ubuntu 14.04 looks like:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00002b85705151d0 in ?? ()
(gdb) thread apply all bt

Thread 6 (Thread 0x2b85714e0700 (LWP 30410)):
#0  0x00002b85705151d0 in ?? ()
#1  0x00002b8569826f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#2  0x00002b8569827195 in start_thread (arg=0x2b85714e0700) at pthread_create.c:325
#3  0x00002b8569b3830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x2b85712df700 (LWP 30409)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000993bb9 in uv_cond_wait (cond=cond@entry=0xe92160 <cond>, mutex=mutex@entry=0xe92120 <mutex>) at ../deps/uv/src/unix/thread.c:322
#2  0x0000000000993d3b in worker (arg=arg@entry=0x0) at ../deps/uv/src/unix/threadpool.c:56
#3  0x00000000009895f9 in uv__thread_start (ctx_v=<optimized out>) at ../deps/uv/src/uv-common.c:322
#4  0x00002b8569827182 in start_thread (arg=0x2b85712df700) at pthread_create.c:312
#5  0x00002b8569b3830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x2b85687f7700 (LWP 30408)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x00000000008291e5 in v8::internal::LinuxSemaphore::Wait() ()
#2  0x0000000000741444 in v8::internal::RuntimeProfiler::WaitForSomeIsolateToEnterJS() ()
#3  0x000000000082a812 in v8::internal::SignalSender::Run() ()
#4  0x00000000008291c7 in v8::internal::ThreadEntry(void*) ()
#5  0x00002b8569827182 in start_thread (arg=0x2b85687f7700) at pthread_create.c:312
#6  0x00002b8569b3830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x2b85687e3c40 (LWP 30407)):
#0  0x00002b856982866b in pthread_join (threadid=47852129482496, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00000000009937be in uv_thread_join (tid=<optimized out>) at ../deps/uv/src/unix/thread.c:37
#2  0x0000000000588822 in cleanup () at ../deps/uv/src/unix/threadpool.c:143
#3  0x00002b85687c573a in _dl_fini () at dl-fini.c:252
#4  0x00002b8569a79509 in __run_exit_handlers (status=1, listp=0x2b8569dfc6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#5  0x00002b8569a79555 in __GI_exit (status=<optimized out>) at exit.c:104
#6  0x0000000000830ffc in node::Exit(v8::Arguments const&) ()
#7  0x00000000005c1b1c in v8::internal::Builtin_HandleApiCall(v8::internal::(anonymous namespace)::BuiltinArguments<(v8::internal::BuiltinExtraArguments)1>, v8::internal::Isolate*)
    ()
#8  0x000031eb2000618e in ?? ()
#9  0x000031eb20013058 in ?? ()
#10 0x000031eb200060e1 in ?? ()
#11 0x00007fff6b069270 in ?? ()
#12 0x00007fff6b0692c0 in ?? ()
#13 0x000031eb2027a384 in ?? ()
#14 0x0000352b70857f81 in ?? ()
#15 0x0000000100000000 in ?? ()
#16 0x0000118f30f75b71 in ?? ()
#17 0x0000352b7086dbf9 in ?? ()
---Type <return> to continue, or q <return> to quit---
#18 0x0000118f30f75b01 in ?? ()
#19 0x00007fff6b0692f8 in ?? ()
#20 0x000031eb202797d8 in ?? ()
#21 0x0000000100000000 in ?? ()
#22 0x0000352b708573a1 in ?? ()
#23 0x0000352b7086dbf9 in ?? ()
#24 0x0000300b07a37859 in ?? ()
#25 0x0000300b07a377c9 in ?? ()
#26 0x00007fff6b069338 in ?? ()
#27 0x000031eb200284fe in ?? ()
#28 0x0000300b07a36421 in ?? ()
#29 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x2b85716e1700 (LWP 30411)):
#0  pthread_cond_signal@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S:91
#1  0x0000000000993b79 in uv_cond_signal (cond=cond@entry=0xe92160 <cond>) at ../deps/uv/src/unix/thread.c:312
#2  0x0000000000993e01 in worker (arg=arg@entry=0x0) at ../deps/uv/src/unix/threadpool.c:61
#3  0x00000000009895f9 in uv__thread_start (ctx_v=<optimized out>) at ../deps/uv/src/uv-common.c:322
#4  0x00002b8569827182 in start_thread (arg=0x2b85716e1700) at pthread_create.c:312
#5  0x00002b8569b3830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x2b85718e2700 (LWP 30412)):
#0  0x00002b85705151d0 in ?? ()
#1  0x00002b8569826f82 in __nptl_deallocate_tsd () at pthread_create.c:158
#2  0x00002b8569827195 in start_thread (arg=0x2b85718e2700) at pthread_create.c:325
#3  0x00002b8569b3830d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@springmeyer
Copy link
Member Author

rebuild in debug, but basically the same backtrace: https://gist.github.com/springmeyer/ca6e2afec1b4d2ce29d7

@rcoup
Copy link
Contributor

rcoup commented May 22, 2014

Does dropping GDAL/OGR plugins away and doing a minimal GDAL build help at all? Guess it's hard to know which :)

@springmeyer
Copy link
Member Author

Here are workarounds I've tried that do not work / still crash

To recap, the only viable workaround so far is building gdal --with-threads=no. The other workaround which is not viable is calling GDALDestroyDriverManager from node.js C++ atexit (this is not viable because node-mapnik does not and cannot link to libgdal - to test I forced it but this will not work for general users).

@rcoup
Copy link
Contributor

rcoup commented May 23, 2014

can GDALDestroyDriverManager() be called from PluginInfo::exit() or somewhere via some sort of datasource_cache::unload() method, then call that from node::AtExit() via node-mapnik?

@springmeyer
Copy link
Member Author

good idea - @artemp had a similar idea today, which is to create a singleton in mapnik to be able to call GDALRegisterALL only once rather than per datasource constructor. This singleton could also be used to call GDALDestroyDriverManager() when exiting.

@springmeyer
Copy link
Member Author

Note: [email protected] went out with binaries using gdal build with --with-threads=no and its now proven that this is still not safe for highly threaded render jobs: https://github.com/mapbox/tilemill/issues/2338, tracking at #265.

@springmeyer
Copy link
Member Author

another finding: Building mapnik with ./configure PLUGIN_LINKING=static also is able to workaround the atexit crash.

@springmeyer
Copy link
Member Author

another workaround: dlopen('gdal.input', RTLD_NODELETE) and more resources:

@springmeyer
Copy link
Member Author

working on testcase that removes Node.js and Mapnik at https://github.com/springmeyer/gdal-atexit-crash

diorahman pushed a commit to rockybars/node-mapnik that referenced this issue Jun 12, 2014
diorahman pushed a commit to rockybars/node-mapnik that referenced this issue Jun 12, 2014
diorahman pushed a commit to rockybars/node-mapnik that referenced this issue Jun 12, 2014
diorahman pushed a commit to rockybars/node-mapnik that referenced this issue Jun 12, 2014
@springmeyer
Copy link
Member Author

solved and should no longer happen in [email protected]. Solution was a combination of http://trac.osgeo.org/gdal/ticket/5509#comment:11 (which requires GDAL >= 2.0 since it is not backported to GDAL 1.11.x) and starting to bundle a libgdal shared library instead of statically linking libgdal.a.

@GretaCB
Copy link
Contributor

GretaCB commented Jun 13, 2014

Woooo! Nice @springmeyer !!!

@springmeyer
Copy link
Member Author

re-opening, still seeing problems

@springmeyer springmeyer reopened this Jul 3, 2014
@springmeyer
Copy link
Member Author

latest crash is when using KML: https://gist.github.com/springmeyer/9aa01910103230209950

@springmeyer
Copy link
Member Author

closing, this turns out to be a tilelive-bridge / mocha forcibly exiting bug: mapbox/tilelive-bridge#19

@springmeyer
Copy link
Member Author

Noting that the resolution to this problem was to move to packaging libgdal as a dynamic library. However the move to mason (#738) moved back to a static library for gdal.

@springmeyer
Copy link
Member Author

Also noting that #328 has more info on related issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants