Home
After the initial design and prototype, we have kept improving the component.
The current version is a proof of concept (PoC), which offers the following features:
- automatic generation of the manifest, starting from a description of the data-metadata relations expressed in JSON-LD format
- validation of the manifest
- upload of the metadata to the graphDB
- versioning of the manifest
- comparison of the multiple versions of a manifest and update of the graph according to the differences between them
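The comparison step in the last feature can be pictured with a minimal sketch. It assumes each manifest version has already been reduced to a mapping from object path to checksum; this is a hypothetical simplification for illustration, not the component's actual manifest format or code, and the names `ManifestDiff` and `diff_manifests` are invented here.

```python
from typing import Dict, List, NamedTuple


class ManifestDiff(NamedTuple):
    """Differences between two manifest versions (hypothetical structure)."""
    added: List[str]
    removed: List[str]
    changed: List[str]


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> ManifestDiff:
    """Compare two manifest versions given as {object path: checksum}."""
    old_paths, new_paths = set(old), set(new)
    added = sorted(new_paths - old_paths)        # objects only in the new version
    removed = sorted(old_paths - new_paths)      # objects only in the old version
    changed = sorted(p for p in old_paths & new_paths if old[p] != new[p])
    return ManifestDiff(added, removed, changed)
```

Under this assumption, a graph update would create nodes for `added`, delete or detach those in `removed`, and refresh the properties of those in `changed`.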
The current component works as an extension of the b2safe core package (https://github.com/EUDAT-B2SAFE/B2SAFE-core) and follows its architecture, so it cannot be deployed independently.
The specific rule set for the metadata needs to be added to the iRODS configuration in /etc/irods/server_config.json.
The specific python scripts need to be linked under the iRODS path: /var/lib/irods/msiExecCmd_bin.
Assuming the component is deployed in the following path: /opt/eudat/b2safe-metadata, then a set of configuration files is placed under /opt/eudat/b2safe-metadata/conf. In particular:
- mets_factory.conf
- b2safe_neo4j.conf
- EudatControlledVocabulary.jsonld
- metadata.json
These files must be modified according to the documentation.
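Before editing the files, it can help to confirm that all four are present. The following is a small sketch, not part of the component: the helper name `missing_conf_files` is hypothetical, and the file names are simply those listed above.

```python
from pathlib import Path
from typing import Iterable, List

# Configuration files named in the list above.
CONF_FILES = (
    "mets_factory.conf",
    "b2safe_neo4j.conf",
    "EudatControlledVocabulary.jsonld",
    "metadata.json",
)


def missing_conf_files(conf_dir: str, names: Iterable[str] = CONF_FILES) -> List[str]:
    """Return the expected configuration files that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # The path assumes the deployment location used in this page.
    for name in missing_conf_files("/opt/eudat/b2safe-metadata/conf"):
        print(f"missing: {name}")
```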
Moreover, the path to b2safe_neo4j.conf must be set in the file rulebase/metadata.re:
getMetadataConfParameters(*mdConfPath) {
*mdConfPath="/opt/eudat/b2safe-metadata/conf/b2safe_neo4j.conf";
}
Finally, two additional software components are required:
- a graphDB, implemented through neo4j (v2).
- a messaging system
Once configured, the component can be used to publish the metadata to the local metadata store.
A couple of test scripts are provided to verify that the configuration is correct.
A typical workflow could be the following:
- the user uploads a collection with the manifest,
- the B2SAFE administrator defines a cron job that periodically executes the rule EUDATPushMetadata(*path, *queue), provided by the b2safe core component, to extract the system metadata and push them to a messaging system,
- the B2SAFE administrator defines a cron job that periodically executes the rule EUDATStoreMetadata(*collPath, *user) to parse the manifest and create or update the graph in the graphDB (local metadata store).
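As a rough sketch of what such a cron job might run, the helper below builds an argument list for the iRODS `irule` client. It assumes the usual `irule <rule> <input parameters> <output>` form, with input parameters joined by `%`. The helper name, the collection path and the user name are hypothetical. Verify the exact invocation against your iRODS version before relying on it.

```python
from typing import Dict, List


def build_irule_command(rule_call: str, params: Dict[str, str]) -> List[str]:
    """Build an argument vector for irule (assumed form: rule, inputs, output)."""
    # Input parameters are joined with '%'; string values are single-quoted.
    inputs = "%".join(f"{name}='{value}'" for name, value in params.items())
    return ["irule", rule_call, inputs or "null", "ruleExecOut"]


# Hypothetical collection path and user, for illustration only.
cmd = build_irule_command(
    "EUDATStoreMetadata(*collPath, *user)",
    {"*collPath": "/tempZone/home/alice/collection", "*user": "alice"},
)
# A cron-driven wrapper could pass `cmd` to subprocess.run(cmd, check=True).
```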
The manifest can be written by the user or generated automatically from metadata.json using the script cmd/mets_factory.py (see executables).
- The prototype shows that neo4j is suitable for storing metadata. The graph can serve as a general tool for representing all metadata: system and community metadata, regardless of the schema in which they were saved, would be represented as nodes and the relations between them.
- To deal with many kinds of metadata representations, and to collect not only the system metadata but also the descriptive, community-specific metadata, we will need automated metadata extraction: a server component that combines different metadata parsers, analyzes a collection, and builds a chain of such parsers to extract, connect and save as much of the collection's metadata as possible.
- To connect the metadata stores that sit alongside B2SAFE and to make search across all of them possible, we need a central component that acts as a kind of “registry server”.
- The script cmd/mets_factory.py produces a manifest document, taking as input the description of the metadata relations in JSON-LD format and the iRODS path of the collection. It can link the root manifest to other manifests when they are available in the sub-collections. However, it does not exclude from the root manifest the objects already tracked in the sub-collection manifests, so manifests within the same hierarchy may overlap.
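
The overlap described in the last point could be detected, and optionally removed, along the following lines. This is only a sketch under the assumption that each manifest has been reduced to the set of object paths it tracks. The function names are hypothetical and this is not how cmd/mets_factory.py currently behaves.

```python
from typing import Dict, Set


def find_overlaps(root_objects: Set[str],
                  sub_manifests: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each sub-collection to the objects also listed in the root manifest."""
    overlaps: Dict[str, Set[str]] = {}
    for collection, objects in sub_manifests.items():
        common = root_objects & objects
        if common:
            overlaps[collection] = common
    return overlaps


def prune_root(root_objects: Set[str],
               sub_manifests: Dict[str, Set[str]]) -> Set[str]:
    """Return the root manifest objects not tracked by any sub-collection manifest."""
    tracked = set().union(*sub_manifests.values())
    return root_objects - tracked
```

With such a check in place, the root manifest would keep only the objects that no sub-collection manifest already covers.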