From 4ad8e9e9dd3b48907dab282c46ecf037f55cb2d6 Mon Sep 17 00:00:00 2001
From: Rolf Krahl
Date: Wed, 3 Jan 2024 19:09:50 +0100
Subject: [PATCH] Add the text content for the subsection on ICAT data XML files

---
 doc/src/file-icatdata.rst | 73 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/doc/src/file-icatdata.rst b/doc/src/file-icatdata.rst
index a012638..b856e62 100644
--- a/doc/src/file-icatdata.rst
+++ b/doc/src/file-icatdata.rst
@@ -70,13 +70,83 @@ ICAT data XML files
 ~~~~~~~~~~~~~~~~~~~
 
 In this section we describe the ICAT data file format using the XML
-backend.
+backend. Consider the following example:
 
 .. literalinclude:: ../examples/icatdump-simple-1.xml
    :language: xml
 
+The root element of ICAT data XML files is ``icatdata``. It may
+optionally have one ``head`` subelement and one or more ``data``
+subelements.
+
+The ``head`` element will be ignored by :ref:`icatingest`. It serves
+to provide some information on the context of the creation of the data
+file, which may be useful for debugging in case of issues.
+
+The content of each ``data`` element is one chunk according to the
+logical structure explained above. The present example contains two
+chunks. Each element within the ``data`` element corresponds to an
+ICAT object according to the ICAT schema. In the present example, the
+first chunk contains five User objects and three Grouping objects.
+The second chunk only contains one Investigation.
+
+These object elements should have an ``id`` attribute that may be used
+to reference the object in relations later on. The ``id`` value has
+no meaning other than this file-internal referencing between objects.
+The subelements of the object elements correspond to the object's
+attributes and relations in the ICAT schema. All many-to-one
+relations must be provided and reference already existing objects,
+i.e. they must either already have existed before starting the
+ingestion or appear earlier in the ICAT data file than the referencing
+object, so that they will be created earlier. The related object may
+either be referenced by id using the special attribute ``ref`` or by
+the related object's attribute values, using XML attributes of the
+same name. In the latter case, the attribute values must uniquely
+define the related object.
+
+The object elements may include one-to-many relations. In this case,
+the related objects will be created along with the parent in one
+single cascading call. Alternatively, these related objects may be
+added separately as subelements of the ``data`` element later in the
+file. In the present example, the Grouping objects include their
+related UserGroup objects. Note that these UserGroups include their
+relation to the User. The User objects are referenced by their
+respective id in the ``ref`` attribute. But the UserGroups do not
+include their relation with Grouping. That relationship is implied by
+the parent relation of the object in the file.
+
+In a similar way, the Investigation in the second chunk includes
+related InvestigationGroups that will be created along with the
+Investigation. The InvestigationGroup objects include a reference to
+the corresponding Grouping. Note that these references go across
+chunk boundaries. The index that caches the object ids to resolve
+object relations from the first chunk that did contain the ids of the
+Groupings will already have been discarded from memory when the
+second chunk is read. But the references use the key that can be
+passed to :meth:`icat.client.Client.searchUniqueKey` to search for
+these Groupings in ICAT.
+
+Finally, note that the file format also depends on the ICAT schema
+version: the present example can only be ingested into ICAT server 5.0
+or newer, because the attributes fileCount and fileSize have been
+added to Investigation in this version. With older ICAT versions, it
+will fail because the attributes are not defined.
+
+Consider a second example that defines a subset of the same content
+as the previous example:
+
 .. literalinclude:: ../examples/icatdump-simple-2.xml
    :language: xml
+   :lines: 1-9,28-52,56-58,70-82,108
+
+The difference is that we now add the UserGroup objects separately as
+direct subelements of ``data`` instead of including them in the
+related Grouping objects.
+
+You will find more extensive examples in the source distribution of
+python-icat. The distribution also provides XML Schema Definition
+files for the ICAT data XML file format corresponding to various ICAT
+schema versions.
 
 ICAT data YAML files
 ~~~~~~~~~~~~~~~~~~~~
@@ -89,6 +159,7 @@ backend.
 
 .. literalinclude:: ../examples/icatdump-simple-2.yaml
    :language: yaml
+   :lines: 1-7,10-11,14,23-45,52-60
 
 .. [#dc] There is one exception: DataCollections don't have a