Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rawhits removed from TrackerHits #382

Open
andresailer opened this issue Nov 21, 2024 · 5 comments
Open

Rawhits removed from TrackerHits #382

andresailer opened this issue Nov 21, 2024 · 5 comments

Comments

@andresailer
Copy link
Collaborator

          > * Remove the edm4hep::ObjectID from the components #260

This was not quite the descriptive PR title that I thought it was, and it is turning out to be quite annoying to deal with for us due to the removal of the vector member of raw hits in TrackerHit. Same for ACTS, which is relying on the raw hits. Edit: ACTS abuses this somewhat, but we are using this in the intended way: to refer to raw hits.

Originally posted by @wdconinc in #261 (comment)

@tmadlener
Copy link
Contributor

Hi @wdconinc, thanks for the report. That was definitely an unintended consequence of that change. The main goal was to get rid of the duplication of podio::ObjectID and edm4hep::ObjectID.

A few questions to see how we can salvage this best:

  • What are the raw hits in your case? And how many different types are involved?
  • How does your workflow look at the moment? I.e. how do you get from the ObjectID to the actual thing in the end? Did you write some utility for that?
  • Since the edm4hep::Track now uses the edm4hep::TrackerHit interface for its hits relation, would you need access to the raw hits from the interface, or are you OK with down-casting to the actual type and getting access to the raw hits from there?
  • (Could the raw hits be combined into an interface themselves? Or do you think that would be more cumbersome than the current workflow?)

@wdconinc
Copy link
Contributor

Hi @wdconinc, thanks for the report. That was definitely an unintended consequence of that change. The main goal was to get rid of the duplication of podio::ObjectID and edm4hep::ObjectID.

Yeah, and that main goal is what I got from the PR title, and I agree its duplication that should be removed. I was surprised when it turned out to include other changes that I didn't expect.

A few questions to see how we can salvage this best:

  • What are the raw hits in your case? And how many different types are involved?

We are using an EDM4eic RawTrackerHit since there is no similar type available in EDM4hep. https://github.com/eic/EDM4eic/blob/09d18ef63f8b18ecc94ce995d812ab90d597acc0/edm4eic.yaml#L351. We have an EDM4eic TrackerHit too which we use in some subsystes but not all, and we would prefer to move all to EDM4hep instead of the other way round.

So, one type, but outside EDM4hep. Main goal to have this is to have a full chain of required relations from reconstructed to raw detected quantities (and only optional associations/links to truth info).

  • How does your workflow look at the moment? I.e. how do you get from the ObjectID to the actual thing in the end? Did you write some utility for that?

At this point we store the raw hits and they are accessible and used by analysis scripts based on what is them stored in the root file (primarily things like how many raw hits are unused, or used twice). Since it is clear which collection is referred to (only one raw hits collection for each TrackerHitCollection), only the index is used, just as with how other vector members are accessed in root/uproot.

  • Since the edm4hep::Track now uses the edm4hep::TrackerHit interface for its hits relation, would you need access to the raw hits from the interface, or are you OK with down-casting to the actual type and getting access to the raw hits from there?

We would be ok with accessing this through an the interface instead of the type. Also could be an interface itself, i.e. RawTrackerHit being an interface for various ways of storing raw hit data. I think in either case this will result in a new hard requirement on a next version of EDM4hep.

  • (Could the raw hits be combined into an interface themselves? Or do you think that would be more cumbersome than the current workflow?)

The complication here is that we would need to have the equivalent of our EDM4eic RawTrackerHit as an option for the raw hit data type, since once the data type is defined in EDM4hep, I don't think we can add more types to it in EDM4eic, right?

@tmadlener
Copy link
Contributor

We might have gone for the easiest way with removing the rawHits since we didn't see any use on our end, and discussed with CEPC that they would be OK with fixing their side. We should have also consulted you guys on this. Apologies. (As a side note, I just realized that the schema diagram that we have in EDM4hep is not correct, as it still has the connection to the raw hits from the TrackerHits.)

I am currently not entirely sure how to best fix this without delaying EDM4hep v1.0 even further. Let me confirm that the schema evolution mechanism of podio can actually handle an addition of a VectorMember (which it definitely should in principle and by design).

From a usability in frameworks point of view: I am fairly certain that in the k4FWCore implementation that we currently have, we cannot simply ask a/the data service for a collection by its ID rather than its name. In "plain podio" it's possible to do at the moment (by first getting the name from the frame by ID and then getting the collection from there). I am not sure how Jana2 handles this. Obviously if we have an interface for raw hits that would be a lot easier because than the relation handling is done completely under the hood and "will just work (TM)".

We are using an EDM4eic RawTrackerHit since there is no similar type available in EDM4hep.

I think the only real contender for a similar thing in EDM4hep would be the RawTimeSeries. The only real thing they have in common is the cellID (and potentially the charge if we gloss over the details of which unit that might be stored in). I am mainly looking into similarities here because that would in the end define the interface (if we decide to go for one).

since once the data type is defined in EDM4hep, I don't think we can add more types to it in EDM4eic, right?

Yeah, for interfaces, we need all participating types to be known when the interface is defined, see also this discussion / comment in podio

@tmadlener
Copy link
Contributor

We have AIDASoft/podio#714 in flght and it should be merged in time for the next tag of podio (hopefully this week). That would at least allow you to define an interface in EDM4eic that uses types from both EDM4hep and EDM4eic. That doesn't fully solve the issue under discussion here, but it might make the amount of redefinition slightly less.

We had a brief discussion in the EDM4hep meeting today and at least for v01-00 we don't really see a way to bring back the rawHits as VectorMembers of a podio::ObjectID. We could bring them back after that, once we also have developed the necessary automatic schema evolution code generation in podio.

Another alternative for similar functionality would be to use templated links for connecting the tracker hits with the raw hits. Since you only have one raw hit type at the moment, that might be a possibility to keep at least some of the indexing approach alive. However, I don't know enough about your particular details to know for sure.

This was referenced Dec 3, 2024
@tmadlener
Copy link
Contributor

TL;DR: Would you be persuadable to use podio::Links for keeping track of raw hits that are attached to tracker hits for EIC? We came to the conclusion that that is most likely the current best solution to the problem.


We had quite some discussion last week about this issue and various ways of how to solve the general case of "attaching raw (hit) information to higher level datatypes". The main issue we had in mind in that discussion is that most likely the higher level datatype will already be in EDM4hep, while some of the raw information will be very detector specific. Hence, we will most likely not be able to know about all of them and they will usually live in some extension (at least for a considerable amount of time, until they have been refined enough to be upstreamed).

The technical solutions to this that we discussed were:

  • Member or VectorMember of podio::ObjectID
  • Using podio::Links
  • Introducing a (potentially extremely generic std::any-like) interface

Proposed solution: podio::Links

Our proposed solution would be to use podio::Links as they offer at least some type safety. They can be introduced in the extension directly without touching the upstream model, and they are usable already now. Some of the cons that we discussed were that the size on disk is larger than the other options since we duplicate some of the information and that the order of different raw hits will be hard to maintain if there are several different raw hit types. However, we could not come up with any scenario where different types would be involved for raw hits.

Arguments for / against other options

In the end all of the three mechanisms described above can be used to achieve the goal. Additionally, the differences between them are in some cases rather marginal, while in others there are a few conceptual differences as well. I will just put the arguments that we had during the discussion here to see if we missed something or if you had other preferences.

VectorMember of podio::ObjectID

This is equally flexible as the solution with the podio::Links, i.e. we don't need to touch the upstream model to relate to a new raw hit. However, this requires quite intimate knowledge of the involved types and dedicated type casting. Additionally, the I/O handling is a bit cumbersome, especially in our k4FWCore usage, where we currently don't have retrieval by collectionID available (although the podio::Frame supports that). What speaks for this solution is that it maintains order even if different raw hit types are involved (as maintaining type information would now be entirely user responsibility).

Introducing a generic interface

This approach would hide some of the manual I/O management compared to the podio::ObjectID approach. However, it would still require explicit type knowledge to cast back to the actual type for usage. It would also maintain order for several involved types of raw hits. The major downside is that currently it's not possible to have types from the extension participate in an interface defined in the upstream model, which would be a strict requirement. There was some discussion as to whether it would be technically possible to lift this requirement (i.e. opening interfaces to all datatypes that fulfill some API, compared to the currently required type list), but we decided to not pursue that direction at the moment (given other constraints).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants