Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible crash on async explicit dispose in 1.4.0 #244

Open
ewaza opened this issue Sep 20, 2021 · 3 comments
Open

Possible crash on async explicit dispose in 1.4.0 #244

ewaza opened this issue Sep 20, 2021 · 3 comments

Comments

@ewaza
Copy link

ewaza commented Sep 20, 2021

Hi all,

I've ran into an issue on a production system after upgrading to version 1.4.0 from 1.3.0. I haven't been able to exactly pinpoint the cause, because there seems to be lot of changes in the reference checking code.

I do have an example to reproduce the scenario. gst1-java-core-crash-issue-example.zip. When running the Junit test it will crash sooner or later with: GLib-GObject:ERROR:../../../gobject/gobject.c:3208:toggle_refs_notify: assertion failed: (tstack.n_toggle_refs == 1

I've tested on multiple systems with the same result. When using version 1.3.0 the test runs without any problems till the end.

@neilcsmith-net
Copy link
Member

neilcsmith-net commented Sep 21, 2021

Thanks for the report. Firstly, the provided example is a little broken - the ordering of the two tasks is not guaranteed.

However, there is a potential problem. If you move the explicit disposal into Gst::invokeLater then it might be OK - seems to be OK here. We do strongly recommend using the Gst executor for all interaction with the library - massive numbers of threads can definitely be an issue. Although this issue might appear with just one - I think it's caused by explicit disposal while a native object is active on another thread. It's possible we might be able to provide a bit more safety here.

The fixes between 1.3.0 and 1.4.0 are probably the trigger but not the cause - the old implementation was much more dependent on the garbage collector to clear up native references behind the scenes, particularly with messages.

Do you still need explicit disposal? Do you actually need to create so many pipelines in your production system?

@ewaza
Copy link
Author

ewaza commented Oct 4, 2021

Thank you very much for your extensive response and sorry for my late reply.

You're absolutely right about my example being a bit broken. The discardPipeline.drainPermits() call should happen outside of the tasks and before scheduling them. The system I tried to mimic actually does this right. Also fixing my example did not change the behavior.

However as you mentioned, stopping to use explicit disposals or using them with Gst::invokeLater seems to prevent crashes. So I will definitely see if this could work reliably on our system.

Regarding the number of pipelines. Our production system has parallel real-time requirements which make having a number of concurrent active pipelines unavoidable. However the number is way lower than in my example. That is why we experienced the issue maybe once every week.

Instead of doing a create/dispose for each session I could use a (couple of) pool(s) of pipelines which will be recycled. Is this something you recommend for more stability? In the long run we should probably split the work over multiple OS processes and do some recycling there.

@neilcsmith-net
Copy link
Member

Another potential workaround - at least your test passes with it here - is to use

pipeline.getBus().setSyncHandler(msg -> BusSyncReply.PASS);

That causes the Message objects to be created eagerly on the streaming thread again - basically 1.3.0 behaviour. This causes a lots of messages to hold references until the GC runs though, which is why the changes were made. Allowing the GC to clean up the Pipeline instead if that works is still probably better.

Regarding the number of pipelines. Our production system has parallel real-time requirements which make having a number of concurrent active pipelines unavoidable. ... Instead of doing a create/dispose for each session I could use a (couple of) pool(s) of pipelines which will be recycled. Is this something you recommend for more stability?

Multiple concurrent pipelines shouldn't be a problem. I tried a version of your test that uses the Gst executor to configure, start and dispose all pipelines and that seemed OK. Remember that all the pipelines have their own threads anyway, so you don't need to parallelize the setup and control steps. I doubt pooling is worth the hassle, but really depends on the system.

Incidentally, I can offer support via https://www.codelerity.com in terms of looking at specifics of client systems if that's of interest in future. Without going in depth, help is always going to be somewhat generic.

I'll keep this issue open for now - trying to think if there's a way we can stop the crash from happening while keeping lazy creation of the native wrappers for message and source element.

@neilcsmith-net neilcsmith-net changed the title upgrade to version 1.4.0 causes seemingly random crashes with minimal reproducible example. Possible crash on async explicit dispose in 1.4.0 Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants