Make it possible to remap names in the standalone LCIO to EDM4hep conversion #61

Merged
merged 3 commits into from
May 16, 2024
23 changes: 21 additions & 2 deletions doc/LCIO2EDM4hep.md
@@ -18,8 +18,15 @@ that it can patch in potentially missing collections on the fly. This additional
information comes in the form of a third argument to `lcio2edm4hep` and is
effectively a list of collection names and their types that comprise the
superset of all collections appearing in at least one event in the input LCIO
file. The format looks like this, where each collection is a single line
containing the name first and than its type, e.g.
file. The format looks like this

```
name[:output-name] type-name
```

Each collection is a single line containing the name first (including an
optional output name [see below](#renaming-collections-on-the-fly)) and then its
type. The simplest form looks like this:

```
SETSpacePoints TrackerHit
```

@@ -61,6 +68,18 @@ of the `colltypefile` to determine the contents of the output. If that contains
only a subset of all collections, only that subset will be converted. Missing
collections will still be patched in, in this case.

## Renaming collections on the fly
The optional `[:output-name]` part of each collection can be used to remap the
names of the collections in the input LCIO file to a different name in the
output EDM4hep file, e.g.

```
MCParticle:MCParticles MCParticle
```

will read the `MCParticle` collection from the input file but store it as
`MCParticles` in the output file.
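
As an illustration of the format, the following minimal C++ sketch splits one such line into its parts. `CollectionEntry` and `parseCollectionLine` are hypothetical names chosen for this example and are not part of the converter. Rejecting an empty name on either side of the colon is likewise an assumption of the sketch, not documented behaviour.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper type, for illustration only.
struct CollectionEntry {
  std::string inputName;   // name of the collection in the LCIO file
  std::string outputName;  // name of the collection in the EDM4hep file
  std::string typeName;    // LCIO type of the collection
};

// Parse one line of the form "name[:output-name] type-name".
// Only the first two whitespace-separated words are considered.
// Returns std::nullopt if fewer than two words are present or if either
// side of the colon is empty (the latter is an assumption of this sketch).
std::optional<CollectionEntry> parseCollectionLine(const std::string& line) {
  std::istringstream stream(line);
  std::string name;
  CollectionEntry entry;
  if (!(stream >> name >> entry.typeName)) {
    return std::nullopt;
  }

  const auto colon = name.find(':');
  if (colon == std::string::npos) {
    // No remapping requested: keep the same name on both sides.
    entry.inputName = name;
    entry.outputName = name;
    return entry;
  }

  entry.inputName = name.substr(0, colon);
  entry.outputName = name.substr(colon + 1);
  if (entry.inputName.empty() || entry.outputName.empty()) {
    return std::nullopt;
  }
  return entry;
}
```

With this sketch, `MCParticle:MCParticles MCParticle` yields the input name `MCParticle` and the output name `MCParticles`, while `SETSpacePoints TrackerHit` keeps the same name on both sides.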

# Library usage of the conversion functions
The conversion functions are designed to also be usable as a library. The overall design makes the conversion a two-step process: step one converts the data, and step two resolves the relations and fills the subset collections.
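
To illustrate why the conversion is split into two steps, here is a toy sketch that uses plain structs instead of LCIO or EDM4hep types. All names in it (`InputParticle`, `OutputParticle`, `convertTwoStep`) are hypothetical. It only shows the general idea: a relation may point to an object that has not been converted yet, so the data is converted first and the relations are resolved in a second pass.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy stand-ins for illustration only; these are not LCIO or EDM4hep types.
struct InputParticle {
  double energy;
  const InputParticle* parent;  // may point to an object that appears later
};

struct OutputParticle {
  double energy;
  std::ptrdiff_t parentIndex;  // index into the output vector, -1 if none
};

std::vector<OutputParticle> convertTwoStep(const std::vector<InputParticle>& input) {
  std::vector<OutputParticle> output;
  output.reserve(input.size());
  std::unordered_map<const InputParticle*, std::ptrdiff_t> indexOf;

  // Step one: convert the plain data and remember where each object ended up.
  for (const auto& in : input) {
    indexOf.emplace(&in, static_cast<std::ptrdiff_t>(output.size()));
    output.push_back({in.energy, -1});
  }

  // Step two: resolve the relations, now that every target has been converted.
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i].parent == nullptr) {
      continue;
    }
    const auto it = indexOf.find(input[i].parent);
    if (it != indexOf.end()) {
      output[i].parentIndex = it->second;
    }
  }
  return output;
}
```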

16 changes: 10 additions & 6 deletions k4EDM4hep2LcioConv/include/k4EDM4hep2LcioConv/k4Lcio2EDM4hepConv.h
@@ -68,6 +68,7 @@ using MutableTrackerHit3D = edm4hep::TrackerHit;
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

namespace LCIO2EDM4hepConv {
@@ -103,13 +104,16 @@ podio::Frame convertRunHeader(EVENT::LCRunHeader* rheader);
/**
* Convert a complete LCEvent from LCIO to EDM4hep.
*
* A second, optional argument can be passed to limit the collections to
* convert to the subset that is passed. NOTE: There is an implicit assumption
* here that collsToConvert only contains collection names that are present in
* the passed evt. There is no exception handling internally to guard against
* collections that are missing.
* A second, optional argument can be passed to limit the collections to convert
* to the subset that is passed. Additionally, it allows renaming collections
* on the fly, where the first element of each pair is the (original) LCIO name
* and the second one is the name used for the EDM4hep collection.
*
* NOTE: There is an implicit assumption here that collsToConvert only contains
* collection names that are present in the passed evt. There is no exception
* handling internally to guard against collections that are missing.
*/
podio::Frame convertEvent(EVENT::LCEvent* evt, const std::vector<std::string>& collsToConvert = {});
podio::Frame convertEvent(EVENT::LCEvent* evt, const std::vector<std::pair<std::string, std::string>>& = {});
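
The NOTE above means that a caller has to make sure every requested collection exists in the event. A minimal sketch of such a pre-check is shown here. `findMissingCollections` and the `NameMapping` alias are hypothetical helpers for this example and are not part of the library. The list of available names is assumed to come from the event, for instance from `evt->getCollectionNames()` as used in the implementation.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: (LCIO name, EDM4hep name) pairs.
using NameMapping = std::vector<std::pair<std::string, std::string>>;

// Return the LCIO names requested in `mapping` that do not occur in
// `available`. An empty result means the mapping is safe to use.
std::vector<std::string> findMissingCollections(const std::vector<std::string>& available,
                                                const NameMapping& mapping) {
  std::vector<std::string> missing;
  for (const auto& entry : mapping) {
    const auto& lcioName = entry.first;
    if (std::find(available.begin(), available.end(), lcioName) == available.end()) {
      missing.push_back(lcioName);
    }
  }
  return missing;
}
```

Only the first element of each pair is checked, because the second element is the new output name and does not need to exist in the input.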

/**
* Convert an LCIOCollection by dispatching to the specific conversion
22 changes: 14 additions & 8 deletions k4EDM4hep2LcioConv/src/k4Lcio2EDM4hepConv.cpp
@@ -62,20 +62,26 @@ std::vector<edm4hep::utils::ParticleIDMeta> getPIDMetaInfo(const EVENT::LCCollec
return pidInfos;
}

podio::Frame convertEvent(EVENT::LCEvent* evt, const std::vector<std::string>& collsToConvert) {
podio::Frame convertEvent(EVENT::LCEvent* evt, const std::vector<std::pair<std::string, std::string>>& collsToConvert) {
auto typeMapping = LcioEdmTypeMapping{};
std::vector<CollNamePair> edmevent;
std::vector<std::pair<std::string, EVENT::LCCollection*>> LCRelations;

const auto& lcioNames = [&collsToConvert, &evt]() {
const auto collNames = [&collsToConvert, &evt]() {
if (collsToConvert.empty()) {
return *evt->getCollectionNames();
const auto evtColls = evt->getCollectionNames();
std::vector<std::pair<std::string, std::string>> collNames{};
collNames.reserve(evtColls->size());
for (const auto& name : *evtColls) {
collNames.emplace_back(name, name);
}
return collNames;
}
return collsToConvert;
return std::move(collsToConvert);
}();

// In this loop the data gets converted.
for (const auto& lcioname : lcioNames) {
for (const auto& [lcioname, edm4hepName] : collNames) {
const auto& lcioColl = evt->getCollection(lcioname);
const auto& lciotype = lcioColl->getTypeName();
if (lciotype == "LCRelation") {
@@ -86,22 +92,22 @@ podio::Frame convertEvent(EVENT::LCEvent* evt, const std::vector<std::string>& c
}

if (!lcioColl->isSubset()) {
for (auto&& [name, edmColl] : convertCollection(lcioname, lcioColl, typeMapping)) {
for (auto&& [name, edmColl] : convertCollection(edm4hepName, lcioColl, typeMapping)) {
if (edmColl != nullptr) {
edmevent.emplace_back(std::move(name), std::move(edmColl));
}
}
}
}
// Filling of the Subset Collections
for (const auto& lcioname : lcioNames) {
for (const auto& [lcioname, edm4hepName] : collNames) {

auto lcioColl = evt->getCollection(lcioname);
if (lcioColl->isSubset()) {
const auto& lciotype = lcioColl->getTypeName();
auto edmColl = fillSubset(lcioColl, typeMapping, lciotype);
if (edmColl != nullptr) {
edmevent.emplace_back(lcioname, std::move(edmColl));
edmevent.emplace_back(edm4hepName, std::move(edmColl));
}
}
}
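
The fallback at the start of `convertEvent` can be sketched in isolation as follows. This is a simplified stand-alone illustration using plain standard-library types; `resolveNames` and `NameMapping` are hypothetical names and not part of the library. If no mapping is requested, every collection name in the event is mapped to itself, otherwise the requested mapping is used unchanged.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: (LCIO name, EDM4hep name) pairs.
using NameMapping = std::vector<std::pair<std::string, std::string>>;

// Decide which collections to convert and under which output names.
NameMapping resolveNames(const std::vector<std::string>& eventNames, const NameMapping& requested) {
  if (!requested.empty()) {
    // The caller asked for a specific subset and/or renaming.
    return requested;
  }
  // Nothing requested: convert everything and keep the original names.
  NameMapping identity;
  identity.reserve(eventNames.size());
  for (const auto& name : eventNames) {
    identity.emplace_back(name, name);
  }
  return identity;
}
```
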
64 changes: 48 additions & 16 deletions standalone/lcio2edm4hep.cpp
@@ -20,29 +20,56 @@ using ROOTWriter = podio::ROOTFrameWriter;
#include <fstream>
#include <iostream>
#include <iterator>
#include <optional>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, std::string>> getNamesAndTypes(const std::string& collTypeFile) {
/// Simple helper struct to group information about a collection to be converted
struct NamesType {
std::string lcioName{}; ///< The name in the lcio file
std::string edm4hepName{}; ///< The name in the edm4hep file
std::string typeName{}; ///< The LCIO type
};

/// Convert a config file line into a NamesType struct
std::optional<NamesType> fromConfigLine(std::string line) {
NamesType info;
std::stringstream sline(std::move(line));
std::string name;
// This only looks for the first two words in the line and ignores
// everything that comes after that.
if (!(sline >> name >> info.typeName)) {
std::cerr << "need a name (mapping) and a type per line" << std::endl;
return std::nullopt;
}

if (const auto colon = name.find(':'); colon != std::string::npos) {
info.lcioName = name.substr(0, colon);
info.edm4hepName = name.substr(colon + 1);
} else {
info.lcioName = name;
info.edm4hepName = name;
}

return info;
}

std::vector<NamesType> getNamesAndTypes(const std::string& collTypeFile) {
std::ifstream input_file(collTypeFile);
std::vector<std::pair<std::string, std::string>> names_types;
std::vector<NamesType> names_types;

if (!input_file.is_open()) {
std::cerr << "Failed to open file countaining the names and types of the LCIO Collections." << std::endl;
std::cerr << "Failed to open file containing the names and types of the LCIO Collections." << std::endl;
}
std::string line;
while (std::getline(input_file, line)) {
std::stringstream sline(std::move(line));
std::string name, type;
// This only looks for the first two words in the line and ignores everything that comes after that.
if (!(sline >> name >> type)) {
std::cerr << "need a name and a type per line" << std::endl;
auto lineInfo = fromConfigLine(std::move(line));
if (!lineInfo) {
return {};
}
names_types.emplace_back(std::move(name), std::move(type));
names_types.emplace_back(std::move(lineInfo.value()));
}

input_file.close();

return names_types;
@@ -132,26 +159,31 @@ int main(int argc, char* argv[]) {
const auto args = parseArgs({argv, argv + argc});

UTIL::CheckCollections colPatcher{};
std::vector<std::pair<std::string, std::string>> namesTypes{};
std::vector<NamesType> namesTypes{};
const bool patching = !args.patchFile.empty();
if (patching) {
namesTypes = getNamesAndTypes(args.patchFile);
if (namesTypes.empty()) {
std::cerr << "The provided list of collection names and types does not satisfy the required format: Pair of Name "
"and Type per line separated by space"
"(mapping) and Type per line separated by space"
<< std::endl;
return 1;
}
colPatcher.addPatchCollections(namesTypes);
std::vector<std::pair<std::string, std::string>> patchNamesTypes{};
patchNamesTypes.reserve(namesTypes.size());
for (const auto& [name, _, type] : namesTypes) {
patchNamesTypes.emplace_back(name, type);
}
colPatcher.addPatchCollections(patchNamesTypes);
}
// Construct a vector of collections to convert. If namesTypes is empty, this
// will be empty, and convertEvent will fall back to use the collections in
// the event
const auto collsToConvert = [&namesTypes]() {
std::vector<std::string> names;
std::vector<std::pair<std::string, std::string>> names{};
names.reserve(namesTypes.size());
for (const auto& [name, type] : namesTypes) {
names.emplace_back(name);
for (const auto& [lcioName, edm4hepName, _] : namesTypes) {
names.emplace_back(lcioName, edm4hepName);
}
return names;
}();
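
The source does not say what happens if two entries are remapped to the same output name. A caller who wants to catch that case before converting could use a check along these lines. `findDuplicateOutputs` and `NameMapping` are hypothetical helpers for this example and are not part of the tool.

```cpp
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical alias: (LCIO name, EDM4hep name) pairs.
using NameMapping = std::vector<std::pair<std::string, std::string>>;

// Return every output name that is requested more than once.
// Each clashing name is reported a single time, in order of first clash.
std::vector<std::string> findDuplicateOutputs(const NameMapping& mapping) {
  std::unordered_set<std::string> seen;
  std::unordered_set<std::string> reported;
  std::vector<std::string> duplicates;
  for (const auto& entry : mapping) {
    const auto& outputName = entry.second;
    const bool firstTime = seen.insert(outputName).second;
    if (!firstTime && reported.insert(outputName).second) {
      duplicates.push_back(outputName);
    }
  }
  return duplicates;
}
```
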