UNIT_agent_simulation_builder_test failed in gcc but not with clang. #751

agalbachicar · 2021-02-10T16:37:56Z

Found the error in CI --> maliput/maliput_infrastructure#169

The extract of the log follows:

build/delphyne/Testing/20210210-1401/Test.xml: 89 tests, 0 errors, 1 failure, 0 skipped
- UNIT_agent_simulation_builder_test
  <<< failure message
    -- run_test.py: invoking following command in '/__w/dsim-repos-index/dsim-repos-index/maliput_ws/src/delphyne':
     - /__w/dsim-repos-index/dsim-repos-index/maliput_ws/build/delphyne/test/regression/cpp/UNIT_agent_simulation_builder_test --gtest_output=xml:/__w/dsim-repos-index/dsim-repos-index/maliput_ws/build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml
    Running main() from /opt/ros/dashing/src/gtest_vendor/src/gtest_main.cc
    [==========] Running 14 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 14 tests from AgentSimulationTest
    [ RUN      ] AgentSimulationTest.TestGetVisualScene
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:210][NON-XML-CHAR-0x1B][0m Non-unique names detected in XML children of model with name[prius_1].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:261][NON-XML-CHAR-0x1B][0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.TestGetVisualScene (167 ms)
    [ RUN      ] AgentSimulationTest.BasicTest
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:210][NON-XML-CHAR-0x1B][0m Non-unique names detected in XML children of model with name[prius_1].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:261][NON-XML-CHAR-0x1B][0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:210][NON-XML-CHAR-0x1B][0m Non-unique names detected in XML children of model with name[prius_1].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:261][NON-XML-CHAR-0x1B][0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.BasicTest (157 ms)
    [ RUN      ] AgentSimulationTest.TestPriusSimpleCar
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:210][NON-XML-CHAR-0x1B][0m Non-unique names detected in XML children of model with name[prius_1].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:261][NON-XML-CHAR-0x1B][0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.TestPriusSimpleCar (186 ms)
    [ RUN      ] AgentSimulationTest.TestPriusUnicycleCar
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:210][NON-XML-CHAR-0x1B][0m Non-unique names detected in XML children of model with name[prius_1].
    [NON-XML-CHAR-0x1B][1;33mWarning [Model.cc:261][NON-XML-CHAR-0x1B][0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    -- run_test.py: return code -11
    -- run_test.py: generate result file '/__w/dsim-repos-index/dsim-repos-index/maliput_ws/build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml' with failed test
    -- run_test.py: verify result file '/__w/dsim-repos-index/dsim-repos-index/maliput_ws/build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml'
  >>>
build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml: 1 test, 0 errors, 1 failure, 0 skipped
- delphyne UNIT_agent_simulation_builder_test.gtest.missing_result
  <<< failure message
    The test did not generate a result file:
    
    Running main() from /opt/ros/dashing/src/gtest_vendor/src/gtest_main.cc
    [==========] Running 14 tests from 1 test case.
    [----------] Global test environment set-up.
    [----------] 14 tests from AgentSimulationTest
    [ RUN      ] AgentSimulationTest.TestGetVisualScene
    ?[1;33mWarning [Model.cc:210]?[0m Non-unique names detected in XML children of model with name[prius_1].
    ?[1;33mWarning [Model.cc:261]?[0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.TestGetVisualScene (167 ms)
    [ RUN      ] AgentSimulationTest.BasicTest
    ?[1;33mWarning [Model.cc:210]?[0m Non-unique names detected in XML children of model with name[prius_1].
    ?[1;33mWarning [Model.cc:261]?[0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    ?[1;33mWarning [Model.cc:210]?[0m Non-unique names detected in XML children of model with name[prius_1].
    ?[1;33mWarning [Model.cc:261]?[0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.BasicTest (157 ms)
    [ RUN      ] AgentSimulationTest.TestPriusSimpleCar
    ?[1;33mWarning [Model.cc:210]?[0m Non-unique names detected in XML children of model with name[prius_1].
    ?[1;33mWarning [Model.cc:261]?[0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
    [       OK ] AgentSimulationTest.TestPriusSimpleCar (186 ms)
    [ RUN      ] AgentSimulationTest.TestPriusUnicycleCar
    ?[1;33mWarning [Model.cc:210]?[0m Non-unique names detected in XML children of model with name[prius_1].
    ?[1;33mWarning [Model.cc:261]?[0m Joint with name [body] in model with name [prius_1] has a name collision, changing joint name to [body_joint].
  >>>

It is strange but the test passes with clang.

francocipollone · 2021-02-24T21:18:24Z

This test seems to be failing not because of the implementation of the tests in the agent_simulation_builder_test.cc file but because of some bad parsing in the message that prints the warnings.(it is printed by a third library )

I couldn't reproduce this error and at the moment we run out of minutes in CI so I couldn't get to the bottom of the issue however I believe that if we resolve those warnings this error won't be shown anymore and to solve that we only have to change one line in the simple_prius.sdf file. (Check change)

WDYT @agalbachicar if I add that minor modification and we check later on if it is needed to continue debugging or not?

agalbachicar · 2021-02-25T12:08:56Z

WDYT @agalbachicar if I add that minor modification and we check later on if it is needed to continue debugging or not?

The change seems good to me. I doubt that is the only reason why the test failed. Let's start with that.

francocipollone · 2021-02-25T13:46:22Z

WDYT @agalbachicar if I add that minor modification and we check later on if it is needed to continue debugging or not?

The change seems good to me. I doubt that is the only reason why the test failed. Let's start with that.

Yeah, for what the issue description says technically the return code of the run_test.py command is -11 which in unix mean SEGFAULT. At the moment I couldn't reproduce it locally. I am still trying though.

francocipollone · 2021-02-25T22:03:03Z

I was barely able to reproduce it, just a few times. (reducing the cpu usage of the container and doing a bit of magic).

I can tell two things so far:

The body of the test is correctly executed. I've added prints at the end of the test and there is no problems up to there.
run_test.py (from /opt/ros/dashing/share/ament_cmake_test/cmake): I added some prints there just to check if it is an issue located in this script when executing the test. But no, the return code -11 is returned by the test(gtest)

So I should investigate a bit and understand why the gtest tool is returning -11 when test is finished with no fails.
An error when destroying the objects after the test ends?. Odd.

Extra:
I've checked the test_result (build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml) after having the error and it shows:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="delphyne" tests="1" failures="1" time="0" errors="0" skipped="0">
  <testcase classname="delphyne" name="UNIT_agent_simulation_builder_test.gtest.missing_result" time="0">
    <failure message="The test did not generate a result file:&#10;&#10;Running main() from /opt/ros/dashing/src/gtest_vendor/src/gtest_main.cc&#10;[==========] Running 5 tests from 1 test case.&#10;[----------] Global test environment set-up.&#10;[----------] 5 tests from AgentSimulationTest&#10;[ RUN      ] AgentSimulationTest.TestGetVisualScene&#10;[       OK ] AgentSimulationTest.TestGetVisualScene (799 ms)&#10;[ RUN      ] AgentSimulationTest.BasicTest&#10;[       OK ] AgentSimulationTest.BasicTest (1301 ms)&#10;[ RUN      ] AgentSimulationTest.TestPriusSimpleCar&#10;-1&#10;0&#10;1&#10;2&#10;3&#10;4&#10;5&#10;6&#10;7&#10;8&#10;9&#10;10&#10;11&#10;12&#10;13&#10;14&#10;15&#10;16&#10;17&#10;18&#10;19&#10;20&#10;21&#10;22&#10;23&#10;24&#10;"/>
  </testcase>
</testsuite>

Which is in essence is the same failure message shown in the issue's description.

agalbachicar · 2021-02-26T12:08:31Z

I was barely able to reproduce it, just a few times. (reducing the cpu usage of the container and doing a bit of magic).

Consider commenting out tests to do a one vs all evaluation and find the test that has the segfault. Another alternative is to split tests across multiple buildable units.

francocipollone · 2021-02-26T21:37:42Z

I was able to reproduce the error by doing the following:

I am running 3 containers at the same time.
Each container was run with the --cpus 0.1 command in order to use 10% of only one cpu.
The test is executed in a loop 100 times.
Just for not wasting time in other tests I commented out other tests and focused it on TestPriusSimpleCar and TestPriusUnicycleCar which they both had presented the aforementioned error.

By doing this I was able to enforce the error between 10 and 20 times (out of 300 runs.)

Once I could reproduce it in a "controlled" way I commented out the lines related to ignition
In particular I removed

test::IgnMonitor<ignition::msgs::AgentState_V> ign_monitor(kStateTopicName);

and all the lines related to the ign messages.
---> No error was thrown.

The next test I did was to just left the creation of the ignMonitor object:

test::IgnMonitor<ignition::msgs::AgentState_V> ign_monitor(kStateTopicName);

in the TestPriusSimpleCar test.
---> run_test.py: return code -11

I guess that when the ign_montior object's resources are meant to be deallocated the error comes up.

So now the problem is way more narrowed down than before.

agalbachicar · 2021-02-26T21:46:32Z

Excellent investigation @francocipollone !

francocipollone · 2021-03-02T12:40:22Z

Update:

Context: IgnMonitor class subscribes to a topic and attaches a callback method to count and save every new message.
The problem comes up when the resources are being deallocated,
It happens that, IgnMonitor's object is destroyed and after this, the callback method is called which leads to a SEGFAULT.

I tried destroying the IgnMonitor object at the end of the execution and adding a node_.Unsubscribe() in the destructor method
but the results are the same, the callback is still being called after the class is destroyed(and the node is unsubscribed)

I'll continue working on a solution

francocipollone · 2021-03-05T19:32:46Z

Solved. Further discussion if any should continue in #760

agalbachicar mentioned this issue Feb 10, 2021

Removes occurrences of libqt5multimedia because it is properly handled by rosdep maliput/maliput_infrastructure#169

Merged

francocipollone mentioned this issue Feb 25, 2021

Fixes joint name in simple_prius.sdf file. #755

Closed

This was referenced Mar 4, 2021

Fixes IgnMonitor bug. #759

Merged

Restore IgnMonitor behavior after bug in Ignition::transport is fixed #760

Open

Flaky ign_publisher_system_test #713

Closed

francocipollone closed this as completed Mar 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UNIT_agent_simulation_builder_test failed in gcc but not with clang. #751

UNIT_agent_simulation_builder_test failed in gcc but not with clang. #751

agalbachicar commented Feb 10, 2021 •

edited

Loading

francocipollone commented Feb 24, 2021 •

edited

Loading

agalbachicar commented Feb 25, 2021

francocipollone commented Feb 25, 2021

francocipollone commented Feb 25, 2021

agalbachicar commented Feb 26, 2021

francocipollone commented Feb 26, 2021 •

edited

Loading

agalbachicar commented Feb 26, 2021

francocipollone commented Mar 2, 2021 •

edited

Loading

francocipollone commented Mar 5, 2021

UNIT_agent_simulation_builder_test failed in gcc but not with clang. #751

UNIT_agent_simulation_builder_test failed in gcc but not with clang. #751

Comments

agalbachicar commented Feb 10, 2021 • edited Loading

francocipollone commented Feb 24, 2021 • edited Loading

agalbachicar commented Feb 25, 2021

francocipollone commented Feb 25, 2021

francocipollone commented Feb 25, 2021

agalbachicar commented Feb 26, 2021

francocipollone commented Feb 26, 2021 • edited Loading

agalbachicar commented Feb 26, 2021

francocipollone commented Mar 2, 2021 • edited Loading

francocipollone commented Mar 5, 2021

agalbachicar commented Feb 10, 2021 •

edited

Loading

francocipollone commented Feb 24, 2021 •

edited

Loading

francocipollone commented Feb 26, 2021 •

edited

Loading

francocipollone commented Mar 2, 2021 •

edited

Loading