Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce MaybeSharedPtr for managing user facing objects #514

Merged
merged 20 commits into from
Dec 4, 2023

Conversation

tmadlener
Copy link
Collaborator

@tmadlener tmadlener commented Nov 10, 2023

BEGINRELEASENOTES

  • Introduce the MaybeSharedPtr to manage the Obj* in the user facing handle classes.
  • Remove the ObjBase base class and make the ObjectID a member of the Obj classes.
  • Make the user facing handle class constructors from an Obj* private as users will not have access to raw Obj* in any case.
    • Introduce a static makeEmpty method for the generated classes in order to create an empty handle, which is also used internally to handle unpersisted relations.
  • Enable more existing test cases in sanitizer workflows now that it has become possible to do so.

ENDRELEASENOTES

This is a resurrected version of #220 with some minor modifications. I will summarize things again here, so that things are in one place and easier to follow. Similar to that one this should address the heap-use-after-free issues

The main thing this PR introduces is the MaybeSharedPtr, a smart pointer, that can relinquish ownership of the managed object while still being safely destructible even if the originally managed object has been destroyed. It achieves this by having a simple control block that tracks the lifetime of the MaybeSharedPtr and optionally that of the managed object. Ownership of the managed pointer can only move from the MaybeSharedPtr to something else, but never the other way around. It avoids the heap-use-after-free issues by keeping the control block alive until all references to it are destroyed. Previously this control block lived inside the managed object, so that the potentially separate lifetimes caused issues.

In order to keep backwards compatibility it is used for all user facing objects. However, the internal control block will only be instantiated and initialized if the user facing object is created "freestanding", i.e. not obtained from a collection via create or via one of the accessor methods. Even though the control block is unused in these cases, it still occupies space and all user facing handles have now double the previous size, i.e. 2 pointers. This should still be small enough to allow for simple pass-by-value function calls efficiently. Overall the behavior is the same as previous but the heap-use-after-free issues should be gone, since the control block will always live long enough to be safely destructed. The documentation has been updated to clearly state the expectations on the validity of handles that have been added to a collection. This is purely a documentation change, nothing in the behavior has change in this respect.

As a small benefit the public constructors starting from an Obj* have been removed, since they are not usable in any case as there is no (easy) way in which users could obtain such a pointer. Instead a new makeEmpty function is introduced to make it possible to create "empty handles" for handling unpersisted relations (without having to expose a public constructor from an Obj*). These object will always return false to a call to isAvailable().

As a smaller detail this also removes the podio::ObjBase and moves the ObjectID into the Obj as an additional member.

Using makeEmpty to detect uninitialized handles

In some cases it makes sense to declare a default constructed immutable object (handle) and then assign to that in the course of an algorithm, e.g.:

MCParticle particle;
if (condition) { 
  particle = /* result of e.g. some lookup */
}

In case condition is never true, particle will contain a default initialized handle (i.e. most likely 0s only), which may be entirely useless. However, without explicitly checking whether it is default initialized it is impossible to tell whether this is in a useful state or not. It is possible to use the makeEmpty function to initialize this handle to something that will crash if used without reassigning via:

auto particle = MCParticle::makeEmpty();
if (condition) {
  particle = /* result of e.g. some lookup */
}
if (particle.isAvailable()) {
  // particle has been assigned a proper value
}

Now, if condition is never true, accessing particle will crash at runtime. However, it is also possible to check whether an assignment has happened by calling isAvailable() first.

  • Release notes
  • Documentation

@tmadlener
Copy link
Collaborator Author

@veprbl does this look like something that could work for EIC / Jana2? I have added the minimal reprodcuer from the original issue as a test-case that previously triggered under builds with AddressSanitizer enabled, but now no longer does.

Do the list of potential issues so far look like something that would make it hard / impossible for you to adapt this?

@tmadlener
Copy link
Collaborator Author

This will break quite a few things in CEPCSW, mostly around maps or some lookup algorithms. In at least a few cases it's not entirely clear whether things will still work if objects are initialized as empty, or whether something downstream expects at least a usable object. Potentially we should consider introducing a makeDefault factory method as well that returns a (default) initialized object, something like

static makeDefault() {
  const static auto defaultObj = Obj{};
  return {defaultObj};
}

@veprbl
Copy link
Contributor

veprbl commented Nov 18, 2023

I've tried adapting our code to this branch, and the resulting diff is quite small, however, the makeEmpty() doesn't allow it to be forward compatible. And, I found that even though std::map::operator[], like you say, doesn't compile, the at() method still works. I do get a random crash in runtime:

       `- JSignalHandler::handle_sigsegv(int, siginfo_t*, void*) (0x7ffff7f6b7a1)
        `- libc.so.6 (0x7ffff563dbf0)
         `- edm4eic::Cluster::getHits(unsigned long) const (0x7ffff4f144cf)
          `- eicrecon::ImagingClusterReco::fit_track(std::vector<edm4eic::Cluster, std::allocator<edm4eic::Cluster> > const&) const (0x7ffff3223571)

Haven't debugged it, but our access pattern looks quite reasonable:
https://github.com/eic/EICrecon/blob/fdc79ac42bf27b9000e2a9c752b5b9cfc6a85ba4/src/algorithms/calorimetry/ImagingClusterReco.h#L271

@tmadlener
Copy link
Collaborator Author

Thanks for checking this.

It looks a bit like your crash is because your Cluster comes from a makeEmpty and hasn't been assigned a "proper" cluster yet. I am mainly guessing this from the stacktrace and the fact, that Cluster::getHits(unsigned). It is generated from these bits here:

{{ relation.full_type }} {{ class_type }}::{{ relation.getter_name(get_syntax) }}(std::size_t index) const {
if ({{ relation.name }}_size() > index) {
return m_obj->m_{{ relation.name }}->at(m_obj->data.{{ relation.name }}_begin + index);
}
throw std::out_of_range("index out of bounds for existing references");
}

Effectively the implementation will try to access the internal m_obj which is a nullptr if initialized via makeEmpty and will hence crash.

I am not sure what is the best solution in this case here. We could also bring back the default constructor. That would fix this, and other, crashes. On the other hand it might hide subtle issues where people access objects that are default initialized, expecting something more useful.

I think we should schedule a brief meeting to discuss which approach works best for you in this case. That might be quicker and easier to do than over github issues.

@tmadlener tmadlener force-pushed the double-free-frame-repro branch 2 times, most recently from 3291496 to a4a9032 Compare November 21, 2023 10:24
@tmadlener
Copy link
Collaborator Author

I have now also made the default handles ref-counted / managed via a MaybeSharedPtr (as also suggested in #492). This doubles their size to two pointers. However, if they are in any form obtained from a collection the control block will not be initialized (it will remain nullptr). This means that almost all checks for the control block will be a simple check against nullptr. The main reasons for doing this are:

  • Keep the existing behavior (but remove the use-after-free)
  • Make dangling references a bit harder to create

This will now work as expected (this has become a test case):

std::vector<ExampleHit> hits;
{
  auto hit = MutableExampleHit{};
  hits.push_back(hit);
}
auto e = hits[0].energy();

The only thing that is still creating dangling references is something along the lines of

auto hit = ExampleHit::makeEmpty();
{
  auto hits = ExampleHitCollection();
  auto mHit = hits.create();
  hit = mHit;
}
auto e = hit.energy(); // <--- heap-use-after-free

However, since this is pre-existing behavior (although not documented well enough probably), I would argue that this is OK.

@tmadlener tmadlener changed the title [WIP] Better memory model for user facing objects Introduce MaybeSharedPtr for managing user facing objects Nov 21, 2023
@tmadlener tmadlener force-pushed the double-free-frame-repro branch from f1908d3 to b935bbc Compare November 21, 2023 13:32
@tmadlener tmadlener requested a review from hegner November 21, 2023 13:32
@tmadlener
Copy link
Collaborator Author

I also brought back the pre-existing constructors for the default, immutable handles. With this in place this PR should now be completely transparent in fixing the reported use-after-free issues in destructors. Any other use-after-free from using "dangling references" are still there. These are not easily fixable in the current model, but fixing that requires some really breaking changes. I have, however, tried to at least document that these dangling references / invalid handles can happen if a handle outlives its collection.

tmadlener added a commit to tmadlener/K4FWCore that referenced this pull request Nov 21, 2023
tmadlener added a commit to tmadlener/K4FWCore that referenced this pull request Nov 21, 2023
tmadlener added a commit to key4hep/k4FWCore that referenced this pull request Nov 21, 2023
tmadlener added a commit to tmadlener/CEPCSW that referenced this pull request Nov 22, 2023
That has been removed in AIDASoft/podio#514 and the makeEmpty factory
method has been introduced instead.
python/templates/MutableObject.h.jinja2 Outdated Show resolved Hide resolved
python/templates/Obj.cc.jinja2 Outdated Show resolved Hide resolved
@tmadlener
Copy link
Collaborator Author

I did some brief benchmarking using the k4EDM4hep2LcioConv converter as that seemed like an easy and reasonable choice as it handles creation of new data as well as reading them back (and touching all members / relations) in the tests. I did not measure any real difference there. So I think from that point of view this approach is fine.

@veprbl this should now be fully backward compatible if you don't have handle constructors from Obj* (switching to makeEmpty would be the solution there). Apologies for the intermediate confusion with the removed handle constructors and more breakage. If you have any easy to run benchmarks / profiling, I would also be interested in whether you see the same (unchanged) behavior as I did.

@tmadlener
Copy link
Collaborator Author

I also just realized that this fixes #431 because it introduces a dedicated overload for Collection::push_back for Mutable types. Hence, there is no longer a need for an implicit conversion on the python side.

@veprbl
Copy link
Contributor

veprbl commented Nov 30, 2023

Hi @tmadlener. I've run some reconstruction benchmarks with EICrecon, and I see no difference in timing with PODIO from this branch vs the release version.

@tmadlener
Copy link
Collaborator Author

Thanks a lot for testing. Also good that you see similar behavior than me. In that case I would vote for merging this and revisit this in case performance (ever) becomes an issue around this.

@tmadlener tmadlener force-pushed the double-free-frame-repro branch from 4dce8f8 to 1823684 Compare December 1, 2023 18:36
- Mutable objects either start reference counted or managed by a
collection depending on how they are created.
- Default objects can only be created by collections and are hence
always managed by the collection.
- Make constructors from Obj* (or MaybeSharedPtr<Obj>) private and
introduce a public static Object::makeEmpty function to create empty
handles that are necessary to model relations that have not been
persisted.
Since the default classes have no really usable public constructor any
longer, exposing the Mutable classes for python makes more sense.
@tmadlener tmadlener force-pushed the double-free-frame-repro branch from 1823684 to 6ce119c Compare December 4, 2023 13:32
@tmadlener tmadlener merged commit e5a13c2 into AIDASoft:master Dec 4, 2023
17 checks passed
@tmadlener tmadlener deleted the double-free-frame-repro branch December 4, 2023 14:08
wdconinc added a commit to wdconinc/JANA2 that referenced this pull request Feb 23, 2024
Due to AIDASoft/podio#564 we get compilation errors:
```
  >> 322    /home/wdconinc/git/JANA2/src/examples/PodioExample/PodioExample.cc:46:51: error: call of overloaded 'ExampleHit(<brace-enclosed initializer list>)' is ambiguous
     323       46 |     hits2.push_back(ExampleHit({42, 5, -5, 5, 7.6}));
     324          |                                                   ^
     325    In file included from /home/wdconinc/git/JANA2/src/examples/PodioExample/datamodel/MutableExampleHit.h:8,
     326                     from /home/wdconinc/git/JANA2/src/examples/PodioExample/PodioExample.cc:7:
     327    /home/wdconinc/git/JANA2/src/examples/PodioExample/./datamodel/ExampleHit.h:59:3: note: candidate: 'ExampleHit::ExampleHit(const MutableExampleHit&)'
     328       59 |   ExampleHit(const MutableExampleHit& other);
     329          |   ^~~~~~~~~~
     330    /home/wdconinc/git/JANA2/src/examples/PodioExample/./datamodel/ExampleHit.h:47:3: note: candidate: 'ExampleHit::ExampleHit(const ExampleHit&)'
     331       47 |   ExampleHit(const ExampleHit& other) = default;
     332          |   ^~~~~~~~~~
```
This commit remove the ambiguity that is present by passing an initializer list to a constructor.

After this commit, JANA2 compiles both before and after podio PR 564 (and also with v00-17-03 which is before AIDASoft/podio#514 which led to JeffersonLab#269). There is no minimum podio version specified in CMakeLists.txt to test with.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants