-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Davidl root reader t directory #579
base: master
Are you sure you want to change the base?
Davidl root reader t directory #579
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this. It's a slightly different approach than I would have attempted for this, but I think it's also quite elegant as it re-uses quite a bit of existing machinery. Also summoning @hegner for a few more high level comments.
One question: How do you build the TMemFile
s on the writing side? (Or plan to build)
My main concern for this approach is that this ties the streaming of Frames rather tightly to ROOTs TTree backend. I would have tried to serialize on the FrameData level, which should be possible without any dependency on a specific backend. However, since you have a proof-of-principle already for the full chain, your approach is potentially a bit easier to implement (at least on the podio side), since a lot of the heavy lifting is done else where. You probably have a better feeling for how involved the overall approach is.
Another somewhat smaller concern is performance. When I first implemented Frame I/O I didn't pay too much attention to the things that happen "only once", e.g. when opening a file, or when first setting up a category for reading the first entry. Hence, I crammed quite a bit of (potentially) unnecessary work into that since the overhead will be distributed in any case. I am not entirely sure how things look if this "overhead" happens repeatedly for just a few events. In case all of the events that you stream have the same metadata, and the same categories there might be a way to cache a bit more of this information between calls to openTDirectory
in case performance becomes an issue. Until that is the case, I would prefer the current version, since has fewer assumptions and is more readable.
I can't claim to fully understand the testing configuration, but hopefully this is in line with what you would like.
Yeah, it's not the simplest setup, but I haven't yet found the time to clean it up a bit better, now that the legacy EventStore
is gone which drove some of it's implementation. I think overall the current approach is OK, I would maybe try to reduce some code duplication, but we can also do that at the end.
@@ -189,54 +197,87 @@ std::vector<std::string> getAvailableCategories(TChain* metaChain) { | |||
return brNames; | |||
} | |||
|
|||
/// @brief Read version and data model from the m_metaTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring needs to be in the header file in our doxygen configuration.
|
||
#include "TFile.h" | ||
|
||
int read_frames(podio::ROOTReader& reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (on a quick first glance) seems to more or less be a copy of the templated version in read_test.h
. Maybe that can be adapted to split off a version with an already initialized reader (initialized == file / TDirectory opened).
You can see a pair of programs that do the serialization/deserialization and zmq messaging here. These are independent of the PODIO library. They are specific to the ROOT file format of the ePIC simulated data files in that it assumes 4 TTrees with the event data being kept in "events". https://github.com/JeffersonLab/SRO-RTDP/tree/davidl_podio2tcp/src/utilities/cpp/podio2tcp How this is implemented on the ePIC side is via a JANA
I agree 100% that a better implementation will be to do this at the Frame level. This approach was quicker for me to implement and didn't require my digging too deep into the PODIO source. It will serve my short term goals until someone more familiar with PODIO can do it the right way. I also think the
Yes, performance is not great, but it is currently obscured by the horribly slow ePIC reconstruction code. When reading from a file, it takes 2-5 seconds to reconstruct an event. Preliminary testing with pre-formed send buffers yields sending takes only about 0.02 seconds/event. When it comes to reading in the meta-data, one knob we have to turn is how many events are sent at once. I can just as easily send 100 events or more at a time which does reduce that overhead. I'll tune that as needed when I start scaling up for more performance testing. |
Thanks for the links. After a quick look and IIUC the main use case here is to read from stream in Jana2, but the source of the stream would then be outside of Jana2? I am mainly asking, because I was wondering whether you also planned to implement (or actually need) the creation of the |
Correct, there is no need for a JANA2 process on the sending side. I do not anticipate using It is true that at the moment and for the next couple of years we will likely only have simulated data and it will be in the form of podio files on disk. Hence the need to generate stream(s) from that so the SRO infrastructure can start development. |
I haven't heard anything on this for 2 months. Are there any plans to merge it or any hold-ups preventing it? |
@faustus123 - apologies for having it dormant for so long. We've been doing some refactoring which we just managed to finish. If you can update your PR I can deal with it this week. |
Uggh. I just tried to rebase to bring my fork up to date and a lot of changes have happened in the ROOTReader files in the meantime. It is going to take a while to try and resolve this. It would probably be more efficient if I do this after you have finished with whatever work you are doing there. Please let me know when you are finished with the current round of work and I will take a look at it then. |
The changes that we want to go in first are the ones in #625. Looking at the changes in this PR, I am wondering whether you cherry-picking might be easier than rebasing? |
BEGINRELEASENOTES
Add ROOTReader::openTDirectory() to allow external management of TDirectory (e.g. TMemFile) to support streaming.
ENDRELEASENOTES
This is motivated by issue #565 . It makes it possible to write a stream source that sends data packets in the form of serialized
TMemFile
objects using ROOT's built-in serialization/deserialization functionality. EachTMemFile
may contain a small number of events in the "events" tree and the complete metadata trees allowing the other standardROOTReader
methods to work. I have a working test case that reads events from an existing PODIO ROOT file and sends them this way via zmq messages. This couples with the ePIC recon code and works so there is a proof-of-principle that this works as needed.A couple of notes on what was done:
A
TTree *m_metaTree
member was added to take the place of theTChain
in most of the calls for which aTChain
is not actually needed. When aTChain
is actually used, this just points to theTChain
.The
TChain
was changed from being kept in aunique_ptr
to being a direct member of the class. I suspect theunique_ptr
trick was used becauseTChain
has its copy and assignment constructors unavailable as private members of theTChain
class and that causes some issues with storing these instd::map
. I ran into other complications with trying to implementm_metaTree
to support both cases (an externally suppliedTTree
or an internalTChain
) I resolved it by explicitly deleting the copy and assignment constructors forROOTReader
which avoids the problem of ownership of theTChain
. Removing theunique_ptr
mechanism was then possible which simplified things in the class. There may have been other reasons for usingunique_ptr
which I don't know about.A new test was added
read_frame_root_tdirectory
which is almost a complete copy ofread_frame_root_multiple
. I can't claim to fully understand the testing configuration, but hopefully this is in line with what you would like.