From d5d1abc64b8082e210b4d0d9911f215e69b6f145 Mon Sep 17 00:00:00 2001 From: Mateusz Jakub Fila Date: Mon, 16 Dec 2024 17:19:01 +0100 Subject: [PATCH 1/4] update readme --- README.md | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 12b69561..ed2c140e 100644 --- a/README.md +++ b/README.md @@ -1,25 +1,40 @@ # k4FWCore (key4hep FrameWork Core) -k4FWCore is a Gaudi package that provides the PodioDataService, which allows to +k4FWCore is a Gaudi package that provides the IOSvc, which allows one to use podio-based event data models like EDM4hep in Gaudi workflows. -k4FWCore also provides the `k4run` script used to run Gaudi steering files. +k4FWCore also provides the `k4run` script used to run Gaudi steering files. See the [documentation](doc/k4run-args.md) for more information. ## Components ### Basic I/O -#### k4DataSvc +| Current | Legacy | Description | +|---------|--------|-------------| +| IOSvc | k4DataSvc | Service handling the PODIO types and collections. | +| Reader | PodioInput | Algorithm to read data from input files on disk. | +| Writer | PodioOutput | Algorithm to write data to an output file on disk. | +| MetadataSvc | MetaDataHandle | Service/handle for user-defined metadata. | -Component wrapping the PodioDataService to handle PODIO types and collections. +See the [documentation](doc/IO.md) for more information. -#### PodioInput +### Auxiliary -Algorithm to read data from one or multiple input file(s) on disk. +#### CollectionMerger -#### PodioOutput +Algorithm merging multiple collections of the same type into a single collection. -Algorithm to write data to an output file on disk. +#### EventHeaderCreator + +Algorithm creating a new `edm4hep::EventHeaderCollection` data object. + +#### EventCounter + +Algorithm counting processed events and printing a heartbeat. + +#### UniqueIDGenSvc + +Service generating unique, reproducible numbers to be used for seeding the RNGs used by the algorithms.
See the [documentation](doc/uniqueIDGen.md) for more information. ## k4run ``` @@ -57,6 +72,8 @@ print(my_opts[0].foo) * Gaudi +* EDM4hep + ## Installation and downstream usage. k4FWCore is a CMake project. After setting up the dependencies (use for example `source /cvmfs/sw.hsf.org/key4hep/setup.sh`) From 13da285de0b81a42180b5589dc4f6784fe61a6c0 Mon Sep 17 00:00:00 2001 From: Mateusz Jakub Fila Date: Tue, 17 Dec 2024 10:39:36 +0100 Subject: [PATCH 2/4] update and move old IO documentation to legacy page --- ...putOutput.md => LegacyPodioInputOutput.md} | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) rename doc/{PodioInputOutput.md => LegacyPodioInputOutput.md} (84%) diff --git a/doc/PodioInputOutput.md b/doc/LegacyPodioInputOutput.md similarity index 84% rename from doc/PodioInputOutput.md rename to doc/LegacyPodioInputOutput.md index 600327f0..615689fc 100644 --- a/doc/PodioInputOutput.md +++ b/doc/LegacyPodioInputOutput.md @@ -16,14 +16,18 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> -# Reading and writing EDM4hep files in Gaudi - -The facilities to read and write EDM4hep (or in general event data models based -on podio) are provided by [`k4FWCore`](https://github.com/key4hep/k4FWCore). -This page will describe their usage, but not go into too much details of their -internals. This page also assumes a certain familiarity with Gaudi, i.e. most of -the snippets just show a minimal configuration part, and not a complete runnable -example. +# Reading and writing EDM4hep files in Gaudi with the legacy `k4DataSvc` + +:::{caution} +`k4DataSvc` is a legacy service previously used in k4FWCore for reading and writing data in EDM4hep or other data models based on PODIO. + +The currently used service is `IOSvc`, which offers improved, streamlined functionality and better support for modern workflows.
For detailed documentation on `IOSvc`, refer to [this documentation](IO.md). +::: + +This page will describe the usage of legacy [k4FWCore](https://github.com/key4hep/k4FWCore) +facilities to read and write EDM4hep. This page also assumes a certain +familiarity with Gaudi, i.e. most of the snippets just show a minimal +configuration part, and not a complete runnable example. ## The `k4DataSvc` From d499d700e6bb4edb0ec5e52fdbb91afbd5605f75 Mon Sep 17 00:00:00 2001 From: Mateusz Jakub Fila Date: Tue, 17 Dec 2024 10:42:09 +0100 Subject: [PATCH 3/4] add documentation for IOSvc --- doc/PodioInputOutput.md | 235 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 235 insertions(+) create mode 100644 doc/PodioInputOutput.md diff --git a/doc/PodioInputOutput.md b/doc/PodioInputOutput.md new file mode 100644 index 00000000..4e97a1f5 --- /dev/null +++ b/doc/PodioInputOutput.md @@ -0,0 +1,235 @@ + +# Reading and writing EDM4hep files in Gaudi + +The facilities to read and write EDM4hep (or in general event data models based on podio) are provided by [k4FWCore](https://github.com/key4hep/k4FWCore). This page will describe their usage, but not go into too much detail about their internals. This page also assumes a certain familiarity with Gaudi, i.e. most of the snippets just show a minimal configuration part, and not a complete runnable example. + +## Accessing event data + +`IOSvc` is an external Gaudi service for reading and writing EDM4hep and other PODIO-based data models. The service should be imported from `k4FWCore` and named "IOSvc", as other components may look for it under this name. + +```python +from k4FWCore import IOSvc + +io_svc = IOSvc("IOSvc") +``` + +After instantiation the service should be registered as an external service in the `ApplicationMgr`.
Similarly, it's important to import the `ApplicationMgr` from `k4FWCore`: + +```python +from k4FWCore import ApplicationMgr + +ApplicationMgr( + # other args + ExtSvc=[ + io_svc, + # other services + ] +) +``` + +### Reading events + +The `IOSvc` supports reading ROOT files containing PODIO-based data models such as EDM4hep. Files written with either the ROOT TTree or the RNTuple backend are supported; the backend is inferred automatically from the files themselves. + +The `Input` property can be used to specify the input. The `IOSvc` will not read any files unless the `Input` property is specified. + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.Input = "input.root" +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Input input.root +``` +::: +:::: + +:::{note} +The value assigned to `Input` will be processed as is, in particular without regular expression or glob expansion. +::: + +A list of filenames can be given in order to specify multiple input files: + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.Input = ["input.root", "another_input.root"] +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Input input.root another_input.root +``` +::: +:::: + + +During processing, for each event in the Gaudi event loop the `IOSvc` will read a frame from the input and populate the Gaudi Transient Event Store (TES) with the collections stored in that frame. + +The `FirstEventEntry` property of `IOSvc` can be used to start processing from a given frame instead of from the first frame in the input: + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.FirstEventEntry = 7 # default 0 +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.FirstEventEntry 7 +``` +::: +:::: + +A list of collection names can be assigned to the `CollectionNames` property of `IOSvc` to restrict which collections are read. If `CollectionNames` is not specified, all collections present in the input will be read and put into the TES.
+ +::::{tab-set} +:::{tab-item} Python +```python +io_svc.CollectionNames = ["MCParticles", "SimTrackerHits"] +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.CollectionNames "MCParticles" "SimTrackerHits" +``` +::: +:::: + +### Writing events + +The `IOSvc` supports writing PODIO-based data models such as EDM4hep to ROOT files. The `Output` property can be used to specify the output. The `IOSvc` will not write any files unless the `Output` property is specified. + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.Output = "output.root" +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.Output output.root +``` +::: +:::: + +:::{note} +Unlike `Input`, the `Output` property should be a single string even when writing multiple files is expected. When the size limit for an output file is reached, the system will automatically open a new file and start writing to it. +::: + +The writing backend can be specified with the `OutputType` property of `IOSvc`. The allowed values are `"ROOT"` for TTree-based output or `"RNTuple"` for RNTuple-based output. By default the `"ROOT"` backend is used. + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.OutputType = "RNTuple" +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.OutputType "RNTuple" +``` +::: +:::: + +During processing, at the end of each event from the Gaudi event loop the `IOSvc` will write a frame with the collections present in the TES. By default all the collections will be written. The `outputCommands` property of `IOSvc` can be used to specify commands to select which collections should be written.
For example, the following commands will drop all collections except those named `MCParticles1`, `MCParticles2` and `SimTrackerHits`: + +::::{tab-set} +:::{tab-item} Python +```python +io_svc.outputCommands = [ + "drop *", + "keep MCParticles1", + "keep MCParticles2", + "keep SimTrackerHits", +] +``` +::: +:::{tab-item} CLI +```sh +k4run --IOSvc.outputCommands \ + "drop *" \ + "keep MCParticles1" \ + "keep MCParticles2" \ + "keep SimTrackerHits" +``` +::: +:::: + +## Accessing metadata + +k4FWCore provides the `MetadataSvc`, which allows accessing user metadata in PODIO-based data models. There is no need to instantiate the `MetadataSvc` explicitly when using `IOSvc`, as `IOSvc` can instantiate it on its own if needed. + +When both the `Input` and `Output` properties of `IOSvc` are defined, all the metadata originally present in the input will be propagated to the output, together with any user metadata created during processing. + +Unlike event data, metadata is not exposed to users through the Gaudi TES and cannot be accessed directly by algorithms in the same way. Instead, handling metadata is encapsulated within the algorithm implementation itself. For more details on how this is managed, refer to the developer documentation. + + +## Migrating from the legacy `k4DataSvc` + +Migrating from the legacy `k4DataSvc` or `PodioDataSvc` is rather straightforward. At the steering file level, the `k4DataSvc` (or `PodioDataSvc`) should be replaced with the `IOSvc`, while the `PodioInput` and `PodioOutput` algorithms should be removed.
For example: + +```diff +-from Configurables import k4DataSvc +-from Configurables import PodioInput +-from Configurables import PodioOutput ++from k4FWCore import IOSvc +from k4FWCore import ApplicationMgr +from Configurables import SelectorAlg + +-podioevent = k4DataSvc("EventDataSvc") +-podioevent.input = "example_input.root" ++io_svc = IOSvc("IOSvc") ++io_svc.Input = "example_input.root" + +-inp = PodioInput() +-inp.collections = ["MCParticles", "SimTrackerHits", "TrackerHits", "Tracks"] ++io_svc.CollectionNames = ["MCParticles", "SimTrackerHits"] + +alg = SelectorAlg( + "Selector", + InputParticles="MCParticles", + InputHits="SimTrackerHits", + Output="SelectedParticles", +) + +-oup = PodioOutput() +-oup.filename = "example_output.root" +-oup.outputCommands = ["drop MCParticles"] ++io_svc.Output = "example_output.root" ++io_svc.outputCommands = ["drop MCParticles"] + + +ApplicationMgr( +- TopAlg=[inp, alg, oup], ++ TopAlg=[alg], + EvtSel="NONE", +- ExtSvc=[podioevent], ++ ExtSvc=[io_svc], +) +``` + +Both functional algorithms and classic algorithms are compatible with either `IOSvc` or `PodioDataSvc`. + +The biggest challenge for the transition is posed by algorithms and services that explicitly request and operate on `PodioDataSvc`. These components are not compatible with `IOSvc`, and their internals have to be adapted for use with `IOSvc` case by case. + +If you encounter an algorithm or service that seems to be incompatible with `IOSvc`, please open an issue in the issue tracker to report it for further investigation.
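
The `outputCommands` selection documented in the "Writing events" section can be pictured with a plain-Python sketch. This is only an illustration and not the k4FWCore implementation: the helper name `select_collections` is hypothetical, and the semantics are assumptions (everything is kept by default, commands are applied in order, patterns are shell-style globs, and the last matching command wins).

```python
from fnmatch import fnmatchcase


def select_collections(available, output_commands):
    """Return the names that survive a list of "keep"/"drop" commands.

    Assumed semantics (not taken from the k4FWCore sources): every name is
    kept by default, commands are applied in order, patterns are shell-style
    globs, and the last command matching a name decides its fate.
    """
    selected = []
    for name in available:
        keep = True  # default: write every collection
        for command in output_commands:
            action, pattern = command.split(maxsplit=1)
            if action not in ("keep", "drop"):
                raise ValueError(f"unknown output command: {command!r}")
            if fnmatchcase(name, pattern):
                keep = action == "keep"
        if keep:
            selected.append(name)
    return selected


# Hypothetical content of the event store for one event.
collections_in_tes = ["MCParticles1", "MCParticles2", "SimTrackerHits", "Tracks"]
commands = ["drop *", "keep MCParticles1", "keep MCParticles2", "keep SimTrackerHits"]
written = select_collections(collections_in_tes, commands)
print(written)
```

Under these assumptions, `Tracks` is the only collection left out, which matches the behaviour the documentation describes for the `drop *` plus `keep ...` example.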
From fe93da571f150dfebc16fffaea82a6379ec7111e Mon Sep 17 00:00:00 2001 From: Mateusz Jakub Fila Date: Tue, 17 Dec 2024 11:14:26 +0100 Subject: [PATCH 4/4] add comment on IOSvc default name --- doc/PodioInputOutput.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/PodioInputOutput.md b/doc/PodioInputOutput.md index 4e97a1f5..f412fdd2 100644 --- a/doc/PodioInputOutput.md +++ b/doc/PodioInputOutput.md @@ -27,7 +27,7 @@ The facilities to read and write EDM4hep (or in general event data models based ```python from k4FWCore import IOSvc -io_svc = IOSvc("IOSvc") +io_svc = IOSvc("IOSvc") # or just IOSvc(), since "IOSvc" is the default name ``` After instantiation the service should be registered as an external service in the `ApplicationMgr`. Similarly, it's important to import the `ApplicationMgr` from `k4FWCore`: