# Writing Gaudi Algorithms

## Gaudi

Gaudi is an event-processing framework. Algorithms can be defined by users and
Gaudi will take care of running them for each event. In addition, Gaudi provides
a set of services and tools, such as logging and support for running in a
multithreaded environment.

Gaudi is connected to Key4hep through
[k4FWCore](https://github.com/key4hep/k4FWCore), which provides the tools and
utilities needed to use EDM4hep collections (almost) seamlessly in Gaudi
algorithms. We recommend checking out the
[tests](https://github.com/key4hep/k4FWCore/tree/main/test/k4FWCoreTest) in the
k4FWCore repository, since they contain examples of algorithms (in particular of
algorithms using `Gaudi::Functional`).

## Gaudi::Functional
Using `Gaudi::Functional` is the recommended way of creating algorithms. The
design is simple and at the same time enforces several constraints at compile
time, allowing for a quicker development cycle. In particular, we will see that
our algorithms have no internal state, and as a result they can run in a
multithreaded environment (almost) trivially[^1].

[^1]: You may still find algorithms based on GaudiAlg, which is going to be removed in future versions of Gaudi. GaudiAlg has been superseded by `Gaudi::Algorithm`, although the recommended way is to use `Gaudi::Functional`.

## Setup
We will need Gaudi, k4FWCore and all their dependencies. Installing these
ourselves is not easy, but there are software stacks on cvmfs; see
{doc}`/setup-and-getting-started/README.md` to set up the Key4hep stack.

## Getting Started
The easiest way of getting a working repository is to copy the template
repository that we provide in Key4hep:

``` bash
git clone https://github.com/key4hep/k4-project-template
```

or, preferably, with SSH:

``` bash
git clone git@github.com:key4hep/k4-project-template
```

This template repository already contains the CMake code needed for our
algorithms to find Gaudi and k4FWCore and to link against them properly. In
addition, it has a few examples that, combined with the tests in k4FWCore, give
an overview of what is possible. The `k4-project-template` repository contains
a CMake configuration (as described in more detail in the previous tutorial), so
it can be built with:

```bash
cd k4-project-template
mkdir build install
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
make -j 4
make -j 4 install
```

To run the algorithms contained in this repository, you can use `k4run`, like:

```bash
k4run ../K4TestFWCore/options/createExampleEventData.py
```

## Walkthrough of Functional Algorithms

Functional algorithms in Gaudi are relatively straightforward to write. For each
algorithm we want, we create a class that inherits from one of the
`Gaudi::Functional` classes. The most important member function is
`operator()`, which is called for every event (and takes no inputs when we are
generating data). There are several base classes in Gaudi (see a more complete
list in https://lhcb.github.io/DevelopKit/03a-gaudi/):
- Consumer: one or more inputs, no outputs
- Producer: one or more outputs, no inputs
- Transformer: one or more inputs, one output
- MultiTransformer: one or more inputs, several outputs

The structure of our class (more precisely, a struct) will then be, in the
general case of a transformer:

``` cpp
#include <string>

#include "GaudiAlg/Transformer.h"
// Defines BaseClass_t
#include "k4FWCore/BaseClass.h"

// colltype_in and colltype_out are aliases for the input and output
// collection types (see below)
struct ExampleTransformer final
    : Gaudi::Functional::Transformer<colltype_out(const colltype_in&), BaseClass_t> {

  ExampleTransformer(const std::string& name, ISvcLocator* svcLoc);
  colltype_out operator()(const colltype_in& input) const override;
};

// Register the algorithm with Gaudi so that it can be used from the steering file
DECLARE_COMPONENT(ExampleTransformer)
```
Some key points:
- The magic that makes our algorithm work with EDM4hep collections happens by
  including `BaseClass.h` and passing `BaseClass_t` as one of the template
  arguments to the Gaudi class we are inheriting from.
- `operator()` is const, which means that it can't modify class members. This is
  intentional: an algorithm without internal state is much easier to run with
  multiple threads.

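To see what the `const` buys us, here is a standalone C++ sketch that does not use Gaudi; `ToyShiftTransformer` is a hypothetical name made up for illustration. Because its call operator cannot modify any member, every event gives the same output no matter in which order (or, in Gaudi, on which thread) it is processed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy transformer, NOT a Gaudi class: it only mimics the shape
// of a functional algorithm whose call operator is const.
struct ToyShiftTransformer {
  int shift = 10;  // configuration, fixed once the object is constructed

  // const: the body may read `shift` but cannot modify any member
  std::vector<int> operator()(const std::vector<int>& input) const {
    std::vector<int> output;
    output.reserve(input.size());
    for (int value : input) {
      output.push_back(value + shift);
    }
    // ++shift;  // would not compile: members are read-only in a const function
    return output;
  }
};
```

Calling it on the same events forwards and backwards gives identical results, which is the property that makes processing events in parallel safe.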
Let's start with the first template argument. It is the signature of a function
that takes one or more inputs and returns one or more outputs. For example, we
could have these two lines before the class definition:
``` cpp
#include "edm4hep/MCParticleCollection.h"

using colltype_in = edm4hep::MCParticleCollection;
using colltype_out = edm4hep::MCParticleCollection;
```

With these aliases we get a transformer that takes one `MCParticleCollection` as
input and returns another one. If we have multiple inputs, we keep adding
arguments to the function signature, and if we don't have any we leave the
argument list empty. Outputs are slightly more complicated: if there is more
than one output we have to return a `std::tuple<OutputClass1, OutputClass2>`,
and if there aren't any outputs we simply return `void`.

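The rules above can be summarised in a small standalone sketch that maps a function signature to the kind of algorithm it describes. It does not use Gaudi: `ToyCollection` and `AlgorithmKind` are hypothetical names, and in a real algorithm the signature is simply passed as the first template argument of the Gaudi base class.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an EDM4hep collection (not the real edm4hep type)
struct ToyCollection {
  std::vector<int> pdg;
};

// True only for std::tuple<...>, the return type used for several outputs
template <typename T>
struct is_tuple : std::false_type {};
template <typename... Ts>
struct is_tuple<std::tuple<Ts...>> : std::true_type {};

// Only function types Out(In...) are handled; the primary template is undefined
template <typename Signature>
struct AlgorithmKind;

template <typename Out, typename... In>
struct AlgorithmKind<Out(In...)> {
  // Names the Gaudi::Functional base class matching this signature,
  // following the rules described in the text
  static std::string name() {
    constexpr bool hasInputs = sizeof...(In) > 0;
    if constexpr (std::is_void_v<Out>) {
      return hasInputs ? "Consumer" : "unsupported (no inputs and no outputs)";
    } else if constexpr (!hasInputs) {
      return "Producer";
    } else if constexpr (is_tuple<Out>::value) {
      return "MultiTransformer";
    } else {
      return "Transformer";
    }
  }
};
```

For example, `AlgorithmKind<void(const ToyCollection&)>::name()` gives `"Consumer"` and `AlgorithmKind<std::tuple<ToyCollection, ToyCollection>(const ToyCollection&)>::name()` gives `"MultiTransformer"`.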
Then we reach the constructor. We always initialize the class we are inheriting
from (in this example a `Transformer`) through its constructor, and then we
initialize a set of `KeyValue`s. These `KeyValue`s define the names of our
inputs and outputs, so that they can be found by other algorithms, read from a
file or saved to a file.

``` cpp
ExampleTransformer(const std::string& name, ISvcLocator* svcLoc)
    : Transformer(name, svcLoc,
                  KeyValue("InputCollection", "MCParticles"),
                  KeyValue("OutputCollection", "NewMCParticles")) {
  // possibly do something
}
```
Here we define the name of the property for our input collection in the
steering file (`InputCollection`) and give it a default value. We do the same
for the output collection. The order matters: first the inputs and then the
outputs, each in the same order as in the function signature. When there are
several inputs or several outputs (the latter bundled together in a
`std::tuple`), the corresponding `KeyValue`s have to be grouped by enclosing
them in braces, like this:
``` cpp
ExampleMultiTransformer(const std::string& name, ISvcLocator* svcLoc)
    : MultiTransformer(name, svcLoc,
                       KeyValue("InputCollection", "MCParticles"),
                       {
                           KeyValue("OutputCollection1", "NewMCParticles"),
                           KeyValue("OutputCollection2", "SimTrackerHits"),
                           KeyValue("OutputCollection3", "UsefulCollection"),
                       }) {
  // possibly do something
}
```

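The braces are plain C++: several `KeyValue`s are grouped into a single argument (as far as we can tell, Gaudi takes them as a fixed-size `std::array`). The standalone sketch below mimics this with hypothetical `KeyValue`, `EventStore` and `ToyMultiTransformer` types, which are not the Gaudi ones, and also shows that the configured names are what ties algorithms and collections together.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT Gaudi's KeyValue: it pairs the property name used
// in the steering file with the default collection name
struct KeyValue {
  std::string property;
  std::string location;
  KeyValue(std::string prop, std::string loc)
      : property(std::move(prop)), location(std::move(loc)) {}
};

// Toy event store: every collection is looked up by its name
using EventStore = std::map<std::string, std::vector<int>>;

// Toy algorithm with one input and three outputs. The outputs are taken as a
// std::array, so at the call site they have to be grouped in braces.
struct ToyMultiTransformer {
  KeyValue input;
  std::array<KeyValue, 3> outputs;

  ToyMultiTransformer(KeyValue in, std::array<KeyValue, 3> outs)
      : input(std::move(in)), outputs(std::move(outs)) {}

  // Reads the input under its configured name and writes a copy of it under
  // each configured output name
  void execute(EventStore& store) const {
    const auto& in = store.at(input.location);
    for (const auto& out : outputs) {
      store[out.location] = in;
    }
  }
};
```

Constructing it looks just like the Gaudi example above: the single input is passed on its own, and the three output `KeyValue`s are wrapped in one pair of braces.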
Then in `operator()` we can do whatever we want with our collections:
``` cpp
colltype_out operator()(const colltype_in& input) const override {
  auto coll_out = edm4hep::MCParticleCollection();
  for (const auto& particle : input) {
    // Create a new particle from each input particle, shifting its properties by 10
    auto new_particle = edm4hep::MutableMCParticle();
    new_particle.setPDG(particle.getPDG() + 10);
    new_particle.setGeneratorStatus(particle.getGeneratorStatus() + 10);
    new_particle.setSimulatorStatus(particle.getSimulatorStatus() + 10);
    new_particle.setCharge(particle.getCharge() + 10);
    new_particle.setTime(particle.getTime() + 10);
    new_particle.setMass(particle.getMass() + 10);
    coll_out.push_back(new_particle);
  }
  return coll_out;
}
```
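
Stripped of the EDM4hep types, this `operator()` is an ordinary loop that builds a new collection from a read-only input. A minimal standalone sketch, assuming a hypothetical `ToyParticle` struct that holds only some of the fields used above, in place of `edm4hep::MCParticle`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical plain struct standing in for edm4hep::MCParticle
struct ToyParticle {
  int pdg = 0;
  int generatorStatus = 0;
  float charge = 0.f;
  double mass = 0.0;
};

using ToyParticleCollection = std::vector<ToyParticle>;

// Same logic as the operator() above: build a new collection in which the
// fields of every particle are shifted by 10; the input is left untouched
ToyParticleCollection shiftParticles(const ToyParticleCollection& input) {
  ToyParticleCollection output;
  output.reserve(input.size());
  for (const auto& particle : input) {
    ToyParticle newParticle;
    newParticle.pdg = particle.pdg + 10;
    newParticle.generatorStatus = particle.generatorStatus + 10;
    newParticle.charge = particle.charge + 10;
    newParticle.mass = particle.mass + 10;
    output.push_back(newParticle);
  }
  return output;
}
```

As in the Gaudi version, the input is only read, and the new collection is returned by value.
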
When we return several collections, we can bundle them in a `std::tuple` like this:
``` cpp
// The collections are moved into the tuple rather than copied
return std::make_tuple(std::move(collection1), std::move(collection2));
```

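The `std::move` calls matter because collections are, as far as we know, movable but not copyable. The standalone sketch below uses a hypothetical move-only `ToyCollection` (not a podio or EDM4hep class) to show the same pattern; passing `collection1` without `std::move` would try to copy it and fail to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical move-only collection: copying is disabled, moving is allowed
class ToyCollection {
 public:
  ToyCollection() : data_(std::make_unique<std::vector<int>>()) {}
  ToyCollection(const ToyCollection&) = delete;
  ToyCollection& operator=(const ToyCollection&) = delete;
  ToyCollection(ToyCollection&&) = default;
  ToyCollection& operator=(ToyCollection&&) = default;

  void push_back(int value) { data_->push_back(value); }
  std::size_t size() const { return data_->size(); }

 private:
  std::unique_ptr<std::vector<int>> data_;
};

// Two outputs bundled in a std::tuple; the locals must be moved in
std::tuple<ToyCollection, ToyCollection> makeTwoCollections() {
  ToyCollection collection1;
  ToyCollection collection2;
  collection1.push_back(1);
  collection2.push_back(2);
  collection2.push_back(3);
  return std::make_tuple(std::move(collection1), std::move(collection2));
}
```

On the receiving side, a structured binding such as `auto [first, second] = makeTwoCollections();` unpacks the outputs without any copies.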
The complete example can be found in the tests of k4FWCore:
https://github.com/key4hep/k4FWCore/blob/main/test/k4FWCoreTest/src/components/ExampleFunctionalTransformer.cpp

## The steering file

The steering file is where we define which algorithms will run, which
parameters they will use and how they will be run: the logging level, whether
to use multithreading, etc.

We start with some imports:

``` python
from Gaudi.Configuration import INFO
from Configurables import ExampleFunctionalTransformer
from Configurables import ApplicationMgr
from Configurables import k4DataSvc
from Configurables import PodioOutput
from Configurables import PodioInput
```

It is also possible to import everything from `Configurables`, but it is better
not to: with explicit imports, an IDE or an editor with some kind of static
analysis can tell us, for example, when we are using an undefined variable.

Then, the input:

``` python
podioevent = k4DataSvc("EventDataSvc")
podioevent.input = "output_k4test_exampledata_producer.root"

inp = PodioInput()
inp.collections = [
    "MCParticles",
]
```

We select the name of the input file and which collections we make available
to the rest of the algorithms.

For the output:

``` python
out = PodioOutput("out")
out.filename = "output_k4test_exampledata_transformer.root"
# The collections that we don't drop will also be present in the output file
out.outputCommands = ["drop MCParticles"]
```

With `outputCommands` we can select which collections we keep in the output
file. By default, the collections from the input file are also written to the
output file. Check the
[relevant documentation](https://github.com/key4hep/k4FWCore/blob/main/doc/PodioInputOutput.md)
to learn more about `PodioInput` and `PodioOutput`.

Our algorithm will look like this:

``` python
transformer = ExampleFunctionalTransformer("ExampleFunctionalTransformer",
                                           InputCollection="MCParticles",
                                           OutputCollection="NewMCParticles")
```

If we have defined `Gaudi::Property`s for our algorithm, it is also possible to
change them with `transformer.property = value`; the names of the collections,
on the other hand, are set (if provided) when creating the Python object for
our algorithm, as above.

Finally, we define what to run:

``` python
ApplicationMgr(TopAlg=[inp, transformer, out],
               EvtSel="NONE",
               EvtMax=10,
               ExtSvc=[k4DataSvc("EventDataSvc")],
               OutputLevel=INFO,
               )
```

We pass a list of the algorithms in `TopAlg`. When they are used, `PodioInput`
should be the first one and `PodioOutput` the last one. `EvtMax` sets the
maximum number of events to process; use -1 for no limit, which means that when
reading a file all of its events are processed. We pass extra services in
`ExtSvc` and set an `OutputLevel`, which will be `DEBUG`, `WARNING` or `INFO`
most of the time.

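A heavily simplified model of how `TopAlg` and `EvtMax` interact may help. This is a hypothetical sketch, not Gaudi code: for every event the algorithms run in the order given in `TopAlg`, and a negative `EvtMax` means that all available events are processed. Stopping when the input runs out, even if `EvtMax` is larger, is an assumption of the sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy types, NOT Gaudi's: an algorithm is a callable that records
// what it did for a given event
using Trace = std::vector<std::string>;
using ToyAlgorithm = std::function<void(long event, Trace& trace)>;

// EvtMax < 0 means "no limit"; we also assume processing stops when the
// input runs out
long eventsToProcess(long evtMax, long available) {
  if (evtMax < 0 || evtMax > available) {
    return available;
  }
  return evtMax;
}

// For each event, run the algorithms in the order given in TopAlg
Trace runEventLoop(const std::vector<ToyAlgorithm>& topAlg, long evtMax, long available) {
  Trace trace;
  const long nEvents = eventsToProcess(evtMax, available);
  for (long event = 0; event < nEvents; ++event) {
    for (const auto& alg : topAlg) {
      alg(event, trace);
    }
  }
  return trace;
}
```

With `TopAlg=[inp, transformer, out]`, each event is first read, then transformed, then written, before the next event starts.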
## Initialize and finalize
There are some occasions where we may want to run some code between the
constructor and `operator()`; that is the place for `initialize()`. Similarly,
`finalize()` runs after the processing. To use them, we add these functions to
our class (we can also add only one of them):

``` cpp
StatusCode initialize() override;
StatusCode finalize() override;
```

and then we can implement them.

Make sure to return the corresponding status code: these functions are declared
to return a `StatusCode`, and not returning one is undefined behaviour in C++
that can crash Gaudi. For example:

``` cpp
StatusCode MyAlgorithm::initialize() {
  // do something
  return StatusCode::SUCCESS;
}
```

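When overriding `initialize()` or `finalize()`, it is, as far as we know, common Gaudi practice to call the base class version as well and to propagate its status code, since the base class does its own setup there. The pattern is sketched below with hypothetical toy types (`ToyStatusCode`, `ToyAlgorithmBase`, `ToyMyAlgorithm`) instead of the real Gaudi classes; in an actual algorithm these would be `StatusCode` and the `Gaudi::Functional` base class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy types, NOT Gaudi's
enum class ToyStatusCode { SUCCESS, FAILURE };

struct ToyAlgorithmBase {
  virtual ~ToyAlgorithmBase() = default;
  bool handlesReady = false;
  virtual ToyStatusCode initialize() {
    handlesReady = true;  // stands in for the setup done by the base class
    return ToyStatusCode::SUCCESS;
  }
  virtual ToyStatusCode finalize() { return ToyStatusCode::SUCCESS; }
};

struct ToyMyAlgorithm : ToyAlgorithmBase {
  std::vector<std::string> calls;

  ToyStatusCode initialize() override {
    // Call the base class first and propagate a failure immediately
    if (ToyAlgorithmBase::initialize() == ToyStatusCode::FAILURE) {
      return ToyStatusCode::FAILURE;
    }
    calls.push_back("initialize");
    return ToyStatusCode::SUCCESS;  // always return a status code
  }

  ToyStatusCode finalize() override {
    calls.push_back("finalize");
    return ToyAlgorithmBase::finalize();  // base class cleanup last
  }
};
```
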
## Debugging: How to use GDB

… GDB console. To interrupt running of the Gaudi steering use `CTRL+C`.

More details on how to run GDB with Gaudi can be found in
[LHCb Code Analysis Tools](https://twiki.cern.ch/twiki/bin/view/LHCb/CodeAnalysisTools#Debugging_gaudirun_py_on_Linux_w).

## Avoiding const in `operator()`
There is a way of working around `operator()` being const: adding the keyword
`mutable` to a data member. This allows us to change that member inside
`operator()`, so code that did not compile because of the const will now
compile. Of course, this is usually not a good idea: unless the member itself is
thread-safe, our algorithm is no longer thread-safe, and running with multiple
threads can give different results. Even worse, there may be no errors or
crashes at all, and the results are simply wrong because, for example, several
threads changed a member at the same time.
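
If a member really has to change inside `operator()`, the member itself must be thread-safe. A minimal sketch, assuming what we need is a simple counter of processed events: the hypothetical `ToyCountingAlgorithm` (not a Gaudi class) uses a `mutable std::atomic`, so concurrent calls update it without a data race. With a plain `mutable long` the same code would be a data race, which is undefined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical toy algorithm, NOT a Gaudi class. `mutable` lets the const
// operator() change the counter; std::atomic makes concurrent updates safe.
struct ToyCountingAlgorithm {
  mutable std::atomic<long> processed{0};

  int operator()(int value) const {
    processed.fetch_add(1, std::memory_order_relaxed);
    return value * 2;
  }
};

// Calls the algorithm from nThreads threads, nCalls times each, and returns
// the final count
long runConcurrently(const ToyCountingAlgorithm& alg, int nThreads, int nCalls) {
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t) {
    threads.emplace_back([&alg, nCalls] {
      for (int i = 0; i < nCalls; ++i) {
        alg(i);
      }
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }
  return alg.processed.load();
}
```

Relaxed memory ordering is enough here because the count is only read after all threads have been joined.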