Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multiple worlds #18

Open
osrf-migration opened this issue Mar 19, 2019 · 15 comments
Open

Support for multiple worlds #18

osrf-migration opened this issue Mar 19, 2019 · 15 comments
Labels
bug Something isn't working

Comments

@osrf-migration
Copy link

Original report (archived issue) by Louise Poubel (Bitbucket: chapulina, GitHub: chapulina).


Description

Since the very beginning we've had support for multiple <world> tags within a single SDF file. The current behaviour is that a simulation runner is started for each of them. However, as the code evolved, that use case became less well supported.

Steps to Reproduce

At the moment, the following crashes with an ODE assertion:

ign-gazebo -f test/worlds/multiple_worlds.sdf

ODE INTERNAL ERROR 1: assertion "!colliders_initialized" failed in dInitColliders() [collision_kernel.cpp:168]

ODE INTERNAL ERROR 1: assertion "!colliders_initialized" failed in dInitColliders() [collision_kernel.cpp:168]

Expected behavior:

At a minimum, no crash. At a maximum, multiple worlds spun up in parallel and running in lockstep, the UI displaying all of them side-by-side.

Actual behavior:

Server crashes, UI freezes.

Reproduces how often:

100% of the time

Versions

ign-gazebo1

Additional Information

I'm not sure if it's best at the moment to:

  • print a warning and only load the first world
  • catch the physics problem before the crash and only disable the bit that is having trouble running in the same process (some singleton within ODE?)
  • leave it as it is but discourage people from trying to use it. It should be possible to avoid the crash, like using a different physics engine or even no physics.
@osrf-migration
Copy link
Author

Original comment by Michael Grey (Bitbucket: mxgrey, GitHub: mxgrey).


This might be something that needs to be addressed in the ignition-physics-dartsim plugin.

Is this happening due to multi-threading, or will it still happen even with only a single thread? If it's a multi-threading race condition issue, then maybe we just need to put a mutex in a sensible place in the ignition-physics-dartsim plugin.

@osrf-migration
Copy link
Author

Original comment by Addisu Z. Taddese (Bitbucket: azeey, GitHub: azeey).


I believe this is a race condition on global variables in ODE. I was experimenting with a solution here (914d44a), but ignition-physics-dartsim is probably a more appropriate place to put the mutex.

@osrf-migration
Copy link
Author

Original comment by Louise Poubel (Bitbucket: chapulina, GitHub: chapulina).


By the way, the same problem happens if we try to spin up 2 secondary servers running physics simulation in the same test, to test distributed simulation.

@osrf-migration
Copy link
Author

Original comment by Diego Ferigo (Bitbucket: dgferigo).


Is there any update on this problem? I started testing multi-world simulations and I stumbled upon this problem right away.

If this feature is not yet completed, it would be nice at least having a warning printed when the world file is loaded and multiple worlds are found.

@osrf-migration
Copy link
Author

Original comment by Diego Ferigo (Bitbucket: dgferigo).


Thanks @azeey for the fix you proposed. I know it’s a hacky workaround, though it works reliably. Do you see any better solution that does not involve static mutexes?

@osrf-migration
Copy link
Author

Original comment by Diego Ferigo (Bitbucket: dgferigo).


Quick update. With fixed-base simple models like pendulum and cartpole the mutex workaround is effective as I reported in my last comment.

However, with complex floating-base models like a humanoid robot with meshes both for visual and collisions, a multiworld setup still segfaults.

I managed to “isolate” the problem, if we can say it. In PhysicsPrivate::Step, locking the same mutex in this scope prevents the segfault. Though, this is not a solution since it means that the real step of the world is not concurrently executed by the threads that handle the different worlds (one for each SimulationRunner). In other words, this mutex prevents stepping the worlds in parallel. I think that parallel execution within a single process is the main feature that triggers downstream users to use a multiworld setup.

Here below the stack trace (text here):


@osrf-migration
Copy link
Author

Original comment by Addisu Z. Taddese (Bitbucket: azeey, GitHub: azeey).


Looks like there is a global cache used by the collision detector in ODE. One thing to try would be to build ode with --enable-ou,

@osrf-migration
Copy link
Author

Original comment by Diego Ferigo (Bitbucket: dgferigo).


Thanks for the suggestion! From my side recompiling ODE would mean that all the chain ODE → DART → ign-physics should be compiled from sources. In this moment I doubt I can dedicate enough time to set everything up and start debugging the problem (also considering that ODE code is not very straighforward to read) 😕

@traversaro
Copy link
Contributor

traversaro commented Apr 19, 2024

Original comment by Addisu Z. Taddese (Bitbucket: azeey, GitHub: azeey).

Looks like there is a global cache used by the collision detector in ODE. One thing to try would be to build ode with --enable-ou,

We recently experienced this again with @xela-95 in robotology/gz-sim-yarp-plugins#153 . A workaround is to change DART's collision detector not to use ODE . At least on conda-forge, it is relatively easy to set ENABLE_OU option in CMake, I checked and it does not affect the public ABI of ode, while on apt is more complex and can't be done for stable distros. However, I am not sure if this is helpful in the case two servers are stepped by the same thread (unless a new thread is always created by the server, in that case indeed that would solve the problem).

@traversaro
Copy link
Contributor

traversaro commented Apr 19, 2024

One thing that we could think of at some point in the future (clearly not in an already released version of Gazebo) is to switch the default collision detector in DART from ODE to something else, I am not sure if ODE was chosen just for historical reasons or indeed the is the best performing collision detector for Gazebo's needs.

@azeey
Copy link
Contributor

azeey commented Apr 19, 2024

I am not sure if ODE was chosen just for historical reasons or indeed the is the best performing collision detector for Gazebo's needs.

If I remember correctly, it was chosen because it worked much better than DART or FCL as collision detectors. I remember FCL converted primitive shapes to meshes, which caused unexpected behaviors. And I think DART as a collision engine is missing some features. It's been a while though, so it might be worth revisiting. Also, bullet might be a viable option, but upstream dartsim needs some changes to make mesh-to-mesh collisions work.

@traversaro
Copy link
Contributor

remember FCL converted primitive shapes to meshes, which caused unexpected behaviors.

That seems to be still the case, see the corresponding DART issue: dartsim/dart#19 (that seems to be blocked by something, but I am not sure which FCL issue, perhaps flexible-collision-library/fcl#106 ? ).

@traversaro
Copy link
Contributor

traversaro commented Apr 19, 2024

Yes, indeed looking at bit at issues it seems that each collision detector has some issues, so there is no clearly better DART collision detector, xref: #1306 . Also in https://github.com/resibots/robot_dart it seems that the collision detector used changes from example to example.

@traversaro
Copy link
Contributor

traversaro commented Aug 21, 2024

I investigate a bit the problem as it occurs in robotology/gz-sim-yarp-plugins#201 for some specific test. I was able to fix the crash with error:

ODE INTERNAL ERROR 1: assertion "!colliders_initialized" failed in dInitColliders() [collision_kernel.cpp:168]

with the following DART patch:

diff --git a/dart/collision/ode/OdeCollisionDetector.cpp b/dart/collision/ode/OdeCollisionDetector.cpp
index abfd232..ca45ccc 100644
--- a/dart/collision/ode/OdeCollisionDetector.cpp
+++ b/dart/collision/ode/OdeCollisionDetector.cpp
@@ -49,6 +49,8 @@
 #include "dart/dynamics/SoftMeshShape.hpp"
 #include "dart/dynamics/SphereShape.hpp"

+#include <iostream>
+
 namespace dart {
 namespace collision {

@@ -216,9 +218,14 @@ double OdeCollisionDetector::distance(
 OdeCollisionDetector::OdeCollisionDetector()
 {
   // Initialize ODE. dInitODE is deprecated.
-  const auto initialized = dInitODE2(0);
-  assert(initialized);
-  DART_UNUSED(initialized);
+  static std::mutex odeInitMutex;
+
+  {
+    std::unique_lock<std::mutex> lock(odeInitMutex);
+    const auto initialized = dInitODE2(0);
+    assert(initialized);
+    DART_UNUSED(initialized);
+  }

   dAllocateODEDataForThread(dAllocateMaskAll);

that is probably similar to the gz-sim solution proposed by @azeey in 914d44a . This solves the test failures, but those tests use simple models, while apparently this solution still have problems for complex model (like humanoid robots). Probably we can try to extend robotology/gz-sim-yarp-plugins#201 to also include more complex models.

fyi @xela-95

@traversaro
Copy link
Contributor

I noticed something unfortunate related to:

ODE INTERNAL ERROR 1: assertion "!colliders_initialized" failed in dInitColliders() [collision_kernel.cpp:168]

flaky errors. I tried to work around it by setting another collision manager, but sometimes the test continue to fail. I then realized that the problem is that even if you set another collider, for each world an instance of OdeCollisionDetector is anyway created in ConstructEmptyWorld (see https://github.com/gazebosim/gz-physics/blob/a068d3f730b8eb2886eff72a8828930489789bc9/dartsim/src/EntityManagementFeatures.cc#L728), that is called by ConstructSDFWorld (https://github.com/gazebosim/gz-physics/blob/a068d3f730b8eb2886eff72a8828930489789bc9/dartsim/src/SDFFeatures.cc#L492). The collision detector is then overwritten in https://github.com/gazebosim/gz-physics/blob/a068d3f730b8eb2886eff72a8828930489789bc9/dartsim/src/WorldFeatures.cc#L41-L74, but this is too late to avoid that dInitODE2 is called two times in two different threads, resulting in the race condition that can result in:

ODE INTERNAL ERROR 1: assertion "!colliders_initialized" failed in dInitColliders() [collision_kernel.cpp:168]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants