Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running multithreaded: Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore #173

Merged
merged 127 commits into from
May 21, 2024

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented Jan 18, 2024

BEGINRELEASENOTES

  • Add a new way of using functionals in k4FWCore such that it's possible to run using the Gaudi HiveWhiteBoard (multithreaded) that includes:
  • A new service IOSvc to take of the input and output, together with a Reader and Writer algorithm, to be used instead of PodioInput and PodioOutput
  • Support for reading and writing an arbitrary (specified at runtime) number of collections, using std::map<std::string, const Coll&> for reading and std::map<std::string, Coll>, i.e. for reading we get const references to the collections and for writing we put the collections using std::move typically in the map
  • Custom implementations of the Consumer, Producer and Transformer to make all of this possible and the interface simple (for example, if we have an algorithm that outputs a std::map<std::string, edm4hep::MCParticleCollection> we only have to set the names in python and all the collections will be in the store available for the next algorithms
  • Some utilities on the python side: automatic insertion of the Reader and Writer algorithms when an input or output files are specified, automatic creation of output folders (see Create output directories #170)
  • Tests for several different situations: functionals reading and outputting to files, using only memory, running multithreaded, functionals with an arbitrary number of collections as input and/or output

ENDRELEASENOTES

Ready for review! During the coming days I'll test a few things: if there are memory leaks, metadata handling (most of it copied from what we were doing with PodioOutput), running multithreaded

Left out of this PR: writing multiple times in the same chain (multiple output files). That will be in a different PR, this one is already big enough.

In addition, when AIDASoft/podio#522 is merged I'll make the changes so that it's easy to swap between the normal ROOT, either in this PR or in a different one.

@jmcarcell jmcarcell changed the title WIP: Add an IOSvc, an algorithm for reading and writing WIP: Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore Jan 26, 2024
@jmcarcell jmcarcell changed the title WIP: Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore Feb 16, 2024
@jmcarcell jmcarcell changed the title Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore Support running multithreaded: Add an IOSvc, an algorithm for reading and writing for functionals in k4FWCore Feb 16, 2024
@jmcarcell jmcarcell force-pushed the no-datasvc branch 2 times, most recently from 30a5e80 to 3054187 Compare February 29, 2024 09:09
@andresailer andresailer self-requested a review March 5, 2024 08:50
Copy link
Contributor

@m-fila m-fila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on the service part

For the algorithm part. Previously the client algorithms were supposed to be derived from GaudiAlgorithms with custom Traits, now they are meant to be derived with provided k4FWCore::Consumer etc. Does it mean later all other flavors of functional algorithms (e.g. filters) should be added upon request? Or is there some incompatibility built-in?

k4FWCore/components/IIOSvc.h Show resolved Hide resolved
k4FWCore/components/IIOSvc.h Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.h Outdated Show resolved Hide resolved
Comment on lines 70 to 64
Gaudi::Property<std::vector<std::string>> m_readingFileNames{this, "input", {}, "List of files to read"};
Gaudi::Property<std::string> m_writingFileName{this, "output", {}, "List of files to read"};
Gaudi::Property<std::vector<std::string>> m_outputCommands{
this, "outputCommands", {"keep *"}, "A set of commands to declare which collections to keep or drop."};
Gaudi::Property<std::string> m_inputType{this, "ioType", "ROOT", "Type of input file (ROOT, RNTuple)"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in Gaudi the convention seems to be property names starting with uppercase (e.g. "Input")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we have been using lower caps name for a while in PodioInput and PodioOutput: https://github.com/key4hep/k4FWCore/blob/main/k4FWCore/components/PodioInput.h#L51 and https://github.com/key4hep/k4FWCore/blob/main/k4FWCore/components/PodioOutput.h#L59; I don't have an opinion nor care much about this

k4FWCore/components/IOSvc.h Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.h Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.h Outdated Show resolved Hide resolved
Copy link
Contributor

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had a brief look so far. I think my main concern still is the fact that we re-implement Gaudi functionality here and that we have to maintain that. Maybe we can try to figure out if it's possible to upstream some of the necessary things in order to make this work?

k4FWCore/components/IIOSvc.h Show resolved Hide resolved
k4FWCore/components/Reader.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having read a bit more, I am wondering whether this still supports algorithms that produce / consume other things than podio::CollectionBase derived things. E.g. if some (internal) algorithm just produces some values or a vector of something that is then consumed by another one.

k4FWCore/include/k4FWCore/FunctionalUtils.h Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/FunctionalUtils.h Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/FunctionalUtils.h Outdated Show resolved Hide resolved
@m-fila
Copy link
Contributor

m-fila commented Mar 8, 2024

Would it be possible to have Reader and Writer dynamically declare their data dependency and participate in a data flow (using property system instead of writing directly to the data store, Gaudi's CPUCruncher has it)? With that there shouldn't be a need for custom ApplicationMgr and additional wrapping in sequencers in order to ensure they are the first and last algorithm called.

Taken from FunctionalMTFile:

AvalancheSchedu...   INFO Data Dependencies for Algorithms:
  Reader
      none
  Transformer
    o INPUT  'MCParticles'
    o OUTPUT 'NewMCParticles'
  Writer
      none

The avalanche scheduler recognizes only the transformer to have data dependencies

====================================
Data origins and destinations:
====================================
Transformer
V
o 0x4045d28
V
====================================

Comment on lines 40 to 42
if attr == 'output':
if os.path.dirname(value) and not os.path.exists(os.path.dirname(value)):
os.makedirs(os.path.dirname(value))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to create a python wrapper instead of putting this in a C++ class directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also addressed in #170 already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#170 does that when using PodioOutput. Then whether to have it in C++ or python, well in python it's a one liner, compared to #170 for example and it's something that runs at most once. It could also be part of the podio backends

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern was that the python wrapper will have different behavior and it's not related to make it work or be nicer in python. I think the intent was to have python as configuration layer but keep the logic in C++

#170 does also error handling

@jmcarcell
Copy link
Member Author

I left some comments on the service part

For the algorithm part. Previously the client algorithms were supposed to be derived from GaudiAlgorithms with custom Traits, now they are meant to be derived with provided k4FWCore::Consumer etc. Does it mean later all other flavors of functional algorithms (e.g. filters) should be added upon request? Or is there some incompatibility built-in?

Yeah they have to be added on demand. With the Producer, Consumer and Transformer in this PR it's most use cases, we're only missing filtering and maybe some utility functionals

k4FWCore/components/Writer.cpp Outdated Show resolved Hide resolved
Comment on lines 43 to 44
setProperty("Cardinality", 1).ignore();
}

mutable Gaudi::Property<std::vector<std::string>> m_OutputNames{this, "CollectionNames", {}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the non-reentrant algorithm should override the bool isReEntrant() const method. Parent returns true.

Suggested change
setProperty("Cardinality", 1).ignore();
}
mutable Gaudi::Property<std::vector<std::string>> m_OutputNames{this, "CollectionNames", {}};
setProperty("Cardinality", 1).ignore();
}
bool isReEntrant() const override { return false; }
mutable Gaudi::Property<std::vector<std::string>> m_OutputNames{this, "CollectionNames", {}};

k4FWCore/components/Reader.cpp Outdated Show resolved Hide resolved
Comment on lines 191 to 187
// This is the case when no reading is being done
// needs to be fixed? (new without delete)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the operator() declaration is rather long and I feel it could benefit from refactoring to several smaller methods. The memory managment could be done on the top level and smaller methods receive a raw pointer (non-owning semantics)

As is the operator has several returns so a manual memory managment will be tedious and error prone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I don't see how to refactor, the steps are quite short but the error checking makes it very verbose. Memory management doesn't need to be done there, as the wrapper that is created with new is deleted by the store later (I checked if it's deleted then it's a double delete). So in the end it's 3 loops, once the first time to get which collections are available, another one to remove from the store the ones that belong to a frame and a final one to put into the frame whichever collections are going to be written.

@m-fila
Copy link
Contributor

m-fila commented Mar 11, 2024

The algorithms using runtime specification of collections with std::map seem to be incompatible with AvalancheScheduler.
Similarly to Read and Write algorithms they don't declare the runtime data dependencies, so the scheduler can't figure out the order in which they should be executed.

For instance, I changed ExampleFunctionalConsumerRuntimeCollections.py (attached) to use AvalancheScheduler and obtained an error:

36: Consumer            ERROR Consumer : Failed to retrieve object MCParticles2
36: Consumer            ERROR Maximum number of errors ( 'ErrorMax':1) reached.
36: AlgTask           WARNING Execution of algorithm Consumer failed
36: AvalancheSchedu...  ERROR *** Stall detected, event context: s: 0  e: 0

After inspecting the data dependencies:

Data dependencies:

INFO Data Dependencies for Algorithms:
  Producer0
    o OUTPUT 'MCParticles0'
  Producer1
    o OUTPUT 'MCParticles1'
  Producer2
    o OUTPUT 'MCParticles2'
  Consumer
    o INPUT  'InputCollection'

Data flow:

====================================
Data origins and destinations:
====================================
Producer0
V
o 0x25022f8
V
====================================
Producer1
V
o 0x25023d8
V
====================================
Producer2
V
o 0x2502488
V
====================================

The baseline version without AvalancheScheduler runs fine. The error disappears too in the version with AvalancheScheduler after wrapping the topAlg in Gaudi__Sequencer with Sequential=True . The data item that was previously missing now is present 'by accident', because the algorithms were put in correct order and executed sequentially.

I think it could be good to test multi-threading with multiple different algorithms instead of a two tests dedicated to multi-threading but the algorithm tests running with default scheduler

@tmadlener
Copy link
Contributor

A question that just come to me while looking at #172: Does this still work with non functional algorithms?

k4FWCore/include/k4FWCore/Transformer.h Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/Transformer.h Outdated Show resolved Hide resolved
@jmcarcell
Copy link
Member Author

From @tmadlener

A question that just come to me while looking at #172: Does this still work with non functional algorithms?

It doesn't because support for it hasn't been implemented (but could be). I have added a few asserts so that now one gets a proper error message when using input / output types that are not part of the allowed ones. This has been discussed several times in the EDM4hep meetings and I think the agreement lean towards not letting algorithms use classes outside of EDM4hep or podio. If I remember well, this is the same that we have with the current support for functionals in k4FWCore.

@tmadlener
Copy link
Contributor

I think the question about which types can be transported through the event store between algorithms is a separate concern to the question of whether we want to support non-functional algorithms. For the first one, I think having a limitation to only support EDM4hep / podio generated types is OK in the long run. I think we will probably need support for non-functional algorithms as well. At least at the moment we need to be able to run non-functional algorithms as well as to pass non-EDM4hep objects via the TES, due to the MarlinWrapper. There we need to pass an LCEvent (or a wrapper around that) along, and I don't think we can ever make it functional.

@tmadlener
Copy link
Contributor

Another question that I just came across is how do we foresee the metadata handling here? Would that still simply work with the MetaDataHandle? Do the current MetaDataHandles even work well with Functional algorithms?

k4FWCore/components/IOSvc.cpp Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Outdated Show resolved Hide resolved
k4FWCore/components/IOSvc.cpp Show resolved Hide resolved
StatusCode code;
if (m_hiveWhiteBoard) {
if (!incident.context().valid()) {
info() << "No context found in IOSvc" << endmsg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
info() << "No context found in IOSvc" << endmsg;
info() << "No context found in incident" << endmsg;

Or why IOSvc?

k4FWCore/components/Writer.cpp Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/Consumer.h Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/DataHandle.h Outdated Show resolved Hide resolved
k4FWCore/include/k4FWCore/Transformer.h Show resolved Hide resolved
python/k4FWCore/ApplicationMgr.py Show resolved Hide resolved
@jmcarcell
Copy link
Member Author

jmcarcell commented Apr 30, 2024

A few answers to some questions
From @m-fila

lgorithms using runtime specification of collections with std::map seem to be incompatible with AvalancheScheduler.
Similarly to Read and Write algorithms they don't declare the runtime data dependencies, so the scheduler can't figure out the order in which they should be executed.

This is now fixed and the data flow graph looks correct with std::map

From @tmadlener (sorry I think I replied to another question in the reply to this):

A question that just come to me while looking at #172: Does this still work with non functional algorithms?

Yes, there are two tests runFunctionalMix.py and runFunctionalMixIOSvc.py that run functional algorithm and "old" algorithms together using both PodioInput and PodioOutput and showing that it works

Another question that I just came across is how do we foresee the metadata handling here? Would that still simply work with the MetaDataHandle? Do the current MetaDataHandles even work well with Functional algorithms?

Not implemented for now. MetaDataHandle is coupled to PodioInput and PodioOutput so it won't work here.

@jmcarcell
Copy link
Member Author

Update: rechecked and tuned the tests that mix "old" and functional algorithms in this PR: the conclusion is that when using PodioInput and PodioOutput the collections produced by functional algorithms are not saved (logical, since PodioOutput has not been changed to do that). When using IOSvc all collections are saved, the ones produced by "old" algorithms and functional ones. When running there doesn't seem to be any other differences. To be tested with the Marlin wrapper

@andresailer andresailer requested a review from m-fila May 15, 2024 13:14
@m-fila
Copy link
Contributor

m-fila commented May 16, 2024

Test FunctionalMTFile reports an ERROR:

33: Transformer          INFO Transforming 2 particles
33: IOSvc                INFO No context found in IOSvc
33: ApplicationMgr       INFO Application Manager Stopped successfully
33: NTupleSvc           ERROR ERROR while saving NTuples
33: ApplicationMgr       INFO Application Manager Finalized successfully
33: ApplicationMgr       INFO Application Manager Terminated successfully with a user requested ScheduledStop
1/1 Test #33: FunctionalMTFile .................   Passed    4.02 sec

Test FunctionalFile doesn't report any problems. I'm not sure if this is a real error, the file is saved properly

Copy link
Contributor

@m-fila m-fila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any violations of data or control flow. A few warnings could be silenced by putting a correct value explicitly (comments below), but in general should be ready to go 👍

Having said that I found one thing that was a bit puzzling and different with respect to plain Gaudi and previous k4FWCore algorithms. This isn't a deal-breaker

With the current design all of algorithm's inputs and outputs must be declared as the lists, even if an algorithm operates on singular objects.
For instance, a producer that takes nothing and produces a single collection has to be constructed with a list of keys for outputs (KeyValues) instead of a single key for output (KeyValue) as in plain Gaudi. All the provided keys are used in data dependency resolution but only the first key will be populated and put to an event store.

struct ExampleFunctionalProducer final : k4FWCore::Producer<edm4hep::MCParticleCollection()> {
// The pair in KeyValue can be changed from python and it corresponds
// to the name of the output collection
ExampleFunctionalProducer(const std::string& name, ISvcLocator* svcLoc)
: Producer(name, svcLoc, {}, KeyValues("OutputCollection", {"MCParticles"})) {}
// This is the function that will be called to produce the data
edm4hep::MCParticleCollection operator()() const override {
auto coll = edm4hep::MCParticleCollection();
coll.create(1, 2, 3, 4.f, 5.f, 6.f);
coll.create(2, 3, 4, 5.f, 6.f, 7.f);
// We have to return whatever collection type we specified in the
// template argument
return coll;
}

This is also reflected in the steering files:

  • This would be a standard code for plain Gaudi producer but here will result in an error because OutputCollection must be a list
     ExampleFunctionalProducer("Producer", OutputCollection="Foo")
  • This is will execute correctly:
     ExampleFunctionalProducer("Producer", OutputCollection=["Foo"])
  • this will also execute correctly but produce only Foo. A scheduler will see the algorithm as producing both "Foo" and "Bar" collections. Ideally it shouldn't be even possible to put a second item:
    ExampleFunctionalProducer("Producer", OutputCollection=["Foo", "Bar"])

While this is could be solved with a convention, I'd say it can be misleading for algorithm developers. The documentation on functional algorithms is already a bit cryptic and now we have algorithms that have signatures for singular object in input/output but in property system declare operating on lists of objects

k4FWCore/components/Reader.cpp Outdated Show resolved Hide resolved
k4FWCore/components/Reader.cpp Outdated Show resolved Hide resolved
k4FWCore/components/Writer.cpp Outdated Show resolved Hide resolved
python/k4FWCore/ApplicationMgr.py Outdated Show resolved Hide resolved
python/k4FWCore/ApplicationMgr.py Outdated Show resolved Hide resolved
@jmcarcell
Copy link
Member Author

I couldn't find any violations of data or control flow. A few warnings could be silenced by putting a correct value explicitly (comments below), but in general should be ready to go 👍

Having said that I found one thing that was a bit puzzling and different with respect to plain Gaudi and previous k4FWCore algorithms. This isn't a deal-breaker

With the current design all of algorithm's inputs and outputs must be declared as the lists, even if an algorithm operates on singular objects. For instance, a producer that takes nothing and produces a single collection has to be constructed with a list of keys for outputs (KeyValues) instead of a single key for output (KeyValue) as in plain Gaudi. All the provided keys are used in data dependency resolution but only the first key will be populated and put to an event store.

struct ExampleFunctionalProducer final : k4FWCore::Producer<edm4hep::MCParticleCollection()> {
// The pair in KeyValue can be changed from python and it corresponds
// to the name of the output collection
ExampleFunctionalProducer(const std::string& name, ISvcLocator* svcLoc)
: Producer(name, svcLoc, {}, KeyValues("OutputCollection", {"MCParticles"})) {}
// This is the function that will be called to produce the data
edm4hep::MCParticleCollection operator()() const override {
auto coll = edm4hep::MCParticleCollection();
coll.create(1, 2, 3, 4.f, 5.f, 6.f);
coll.create(2, 3, 4, 5.f, 6.f, 7.f);
// We have to return whatever collection type we specified in the
// template argument
return coll;
}

This is also reflected in the steering files:

* This would be a standard code for plain Gaudi producer but here will result in an error because `OutputCollection` must be a list
  ```python
   ExampleFunctionalProducer("Producer", OutputCollection="Foo")
  ```

* This is will execute correctly:
  ```python
   ExampleFunctionalProducer("Producer", OutputCollection=["Foo"])
  ```

* this will also execute correctly but produce only `Foo`. A scheduler will see the algorithm as producing both "Foo" and "Bar" collections. Ideally it shouldn't be even possible to put a second item:
  ```python
  ExampleFunctionalProducer("Producer", OutputCollection=["Foo", "Bar"])
  ```

While this is could be solved with a convention, I'd say it can be misleading for algorithm developers. The documentation on functional algorithms is already a bit cryptic and now we have algorithms that have signatures for singular object in input/output but in property system declare operating on lists of objects

That is indeed something that isn't very nice about the current way. I'll have a look at how to change that.

As discussed in the past meetings, we're merging this now which has gotten already quite big and issues will be fixed in other PRs.

@jmcarcell jmcarcell merged commit 5dc2a63 into main May 21, 2024
5 of 11 checks passed
@jmcarcell jmcarcell deleted the no-datasvc branch May 21, 2024 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants