Merge pull request #176 from FAIRmat-NFDI/165-merging-of-multiple-files
Introduces a new mapping flag and an auto-merge feature for partial NeXus files
sherjeelshabih authored Nov 21, 2023
2 parents d904c8d + 32abc7a commit 4cc45d3
Showing 16 changed files with 251 additions and 28 deletions.
8 changes: 3 additions & 5 deletions dev-requirements.txt
@@ -1,5 +1,5 @@
#
-# This file is autogenerated by pip-compile with Python 3.9
+# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --extra=dev --output-file=dev-requirements.txt pyproject.toml
@@ -19,7 +19,6 @@ astroid==2.15.5
attrs==22.1.0
# via
# cattrs
-# pytest
# requests-cache
backcall==0.2.0
# via ipython
@@ -191,6 +190,8 @@ matplotlib-scalebar==0.8.1
# via orix
mccabe==0.7.0
# via pylint
+mergedeep==1.3.4
+# via pynxtools (pyproject.toml)
mpmath==1.2.1
# via sympy
mypy==1.2.0
@@ -341,8 +342,6 @@ psutil==5.9.2
# pyxem
ptyprocess==0.7.0
# via pexpect
-py==1.11.0
-# via pytest
pycifrw==4.4.5
# via diffpy-structure
pycodestyle==2.9.1
@@ -511,7 +510,6 @@ typing-extensions==4.3.0
# astroid
# mypy
# numcodecs
-# pylint
tzdata==2023.3
# via pytz-deprecation-shim
tzlocal==4.3
36 changes: 36 additions & 0 deletions examples/json_map/README.md
@@ -0,0 +1,36 @@
# JSON Map Reader

## What is this reader?

This reader lets pynxtools users convert their existing data into FAIR NeXus files with the help of a mapping file. The mapping file tells the reader what to pick from your data files and where to put it in the NeXus file. The following input formats are supported:
* HDF5 (any extension works, e.g. h5, hdf5, nxs)
* JSON
* Python dict objects pickled with [pickle](https://docs.python.org/3/library/pickle.html). These can contain [xarray.DataArray](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html) objects as well as regular Python and NumPy types.

It accepts any NXDL file, as long as your mapping file contains all of its fields.
Use the `--generate-template` option of the dataconverter to create a .mapping.json file:

```console
user@box:~$ dataconverter --nxdl NXmynxdl --generate-template > mynxdl.mapping.json
```
##### Details on the [mapping.json](/pynxtools/dataconverter/readers/json_map/README.md#the-mappingjson-file) file.

## How to run these examples?

### Automatically merge partial NeXus files
```console
user@box:~$ dataconverter --nxdl NXiv_temp --input-file voltage_and_temperature.nxs --input-file current.nxs --output auto_merged.nxs
```

### Map and copy over data to new NeXus file
```console
user@box:~$ dataconverter --nxdl NXiv_temp --mapping merge_copied.mapping.json --input-file voltage_and_temperature.nxs --input-file current.nxs --output merged_copied.nxs
```

### Map and link over data to new NeXus file
```console
user@box:~$ dataconverter --nxdl NXiv_temp --mapping merge_linked.mapping.json --input-file voltage_and_temperature.nxs --input-file current.nxs --output merged_linked.nxs
```

## Contact person in FAIRmat for this reader
Sherjeel Shabih
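The auto-merge command above combines partial NeXus files that each hold part of the same tree. The reader code that performs the merge is not part of the diff shown here, so the following is only a sketch of the general idea, assuming each partial file has already been read into a nested dict (groups as dicts, datasets as leaf values). The `deep_merge` helper and the sample contents are hypothetical; the commit does add `mergedeep` as a dependency, whose `merge()` function performs a comparable recursive merge.

```python
import copy
from typing import Any, Dict


def deep_merge(destination: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge ``source`` into ``destination`` and return ``destination``.

    Nested dicts (standing in for HDF5 groups) are merged key by key. Any other
    value (datasets, attributes) is deep-copied from ``source`` and replaces an
    existing value under the same key, so the later file wins on conflicts.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(destination.get(key), dict):
            deep_merge(destination[key], value)
        else:
            destination[key] = copy.deepcopy(value)
    return destination


# Hypothetical contents of the two partial files, already read into nested dicts.
voltage_and_temperature = {
    "entry": {
        "definition": "NXiv_temp",
        "data": {"voltage": [0.0, 0.5, 1.0], "temperature": [295, 300, 305, 310]},
    }
}
current = {"entry": {"data": {"current": [0.1, 0.2, 0.3]}}}

merged: Dict[str, Any] = {}
for partial in (voltage_and_temperature, current):
    deep_merge(merged, partial)

print(sorted(merged["entry"]["data"]))  # ['current', 'temperature', 'voltage']
```

In this sketch the later file wins when both define the same dataset; how the actual feature resolves such conflicts is not visible in this diff.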
Binary file added examples/json_map/auto_merged.nxs
Binary file not shown.
Binary file added examples/json_map/current.nxs
Binary file not shown.
35 changes: 35 additions & 0 deletions examples/json_map/merge_copied.mapping.json
@@ -0,0 +1,35 @@
{
"/@default": "entry",
"/ENTRY[entry]/DATA[data]/current": "/entry/data/current",
"/ENTRY[entry]/DATA[data]/current_295C": "/entry/data/current_295C",
"/ENTRY[entry]/DATA[data]/current_300C": "/entry/data/current_300C",
"/ENTRY[entry]/DATA[data]/current_305C": "/entry/data/current_305C",
"/ENTRY[entry]/DATA[data]/current_310C": "/entry/data/current_310C",
"/ENTRY[entry]/DATA[data]/temperature": "/entry/data/temperature",
"/ENTRY[entry]/DATA[data]/voltage": "/entry/data/voltage",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/voltage_controller/calibration_time": "/entry/instrument/environment/voltage_controller/calibration_time",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/voltage_controller/run_control": "/entry/instrument/environment/voltage_controller/run_control",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/voltage_controller/value": "/entry/instrument/environment/voltage_controller/value",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/temperature_controller/calibration_time": "/entry/instrument/environment/temperature_controller/calibration_time",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/temperature_controller/run_control": "/entry/instrument/environment/temperature_controller/run_control",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/temperature_controller/value": "/entry/instrument/environment/temperature_controller/value",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/current_sensor/calibration_time": "/entry/instrument/environment/current_sensor/calibration_time",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/current_sensor/run_control": "/entry/instrument/environment/current_sensor/run_control",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/current_sensor/value": "/entry/instrument/environment/current_sensor/value",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/independent_controllers": ["voltage_controller", "temperature_control"],
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/measurement_sensors": ["current_sensor"],
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]/description": "/entry/instrument/environment/heating_pid/description",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]/setpoint": "/entry/instrument/environment/heating_pid/setpoint",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]/K_p_value": "/entry/instrument/environment/heating_pid/K_p_value",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]/K_i_value": "/entry/instrument/environment/heating_pid/K_i_value",
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]/K_d_value": "/entry/instrument/environment/heating_pid/K_d_value",
"/ENTRY[entry]/PROCESS[process]/program": "Bluesky",
"/ENTRY[entry]/PROCESS[process]/program/@version": "1.6.7",
"/ENTRY[entry]/SAMPLE[sample]/name": "super",
"/ENTRY[entry]/SAMPLE[sample]/atom_types": "Si, C",
"/ENTRY[entry]/definition": "NXiv_temp",
"/ENTRY[entry]/definition/@version": "1",
"/ENTRY[entry]/experiment_identifier": "dbdfed37-35ed-4aee-a465-aaa0577205b1",
"/ENTRY[entry]/experiment_description": "A simple IV temperature experiment.",
"/ENTRY[entry]/start_time": "2022-05-30T16:37:03.909201+02:00"
}
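In `merge_copied.mapping.json`, right-hand sides that start with `/` are paths inside the input files, and the data found there is copied into the new file, while values such as `"Bluesky"` or the `measurement_sensors` list are written as given. A minimal sketch of that resolution step, assuming the input files are available as nested dicts; `lookup_path` and `fill_template` are hypothetical helpers, not pynxtools functions.

```python
from typing import Any, Dict, List


def lookup_path(data: Dict[str, Any], path: str) -> Any:
    """Follow a '/'-separated path such as '/entry/data/current' through nested dicts."""
    node: Any = data
    for part in path.strip("/").split("/"):
        node = node[part]  # KeyError (or TypeError on a non-dict) if the path is absent
    return node


def fill_template(mapping: Dict[str, Any], sources: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Copy values for '/'-paths from the first source that has them; keep literals."""
    template: Dict[str, Any] = {}
    for key, value in mapping.items():
        if isinstance(value, str) and value.startswith("/"):
            for source in sources:
                try:
                    template[key] = lookup_path(source, value)
                    break
                except (KeyError, TypeError):
                    continue
            else:  # no source contained the path
                raise KeyError(f"{value!r} not found in any input file")
        else:
            template[key] = value  # literal written as-is, e.g. "Bluesky" or a list
    return template


# Hypothetical input data and a few entries taken from merge_copied.mapping.json.
voltage_and_temperature = {"entry": {"data": {"voltage": [0.0, 0.5, 1.0]}}}
current = {"entry": {"data": {"current_295C": [0.1, 0.2, 0.3]}}}
mapping = {
    "/ENTRY[entry]/DATA[data]/voltage": "/entry/data/voltage",
    "/ENTRY[entry]/DATA[data]/current_295C": "/entry/data/current_295C",
    "/ENTRY[entry]/PROCESS[process]/program": "Bluesky",
}
template = fill_template(mapping, [voltage_and_temperature, current])
print(template["/ENTRY[entry]/DATA[data]/current_295C"])  # [0.1, 0.2, 0.3]
```

Treating every string with a leading `/` as a path is a simplifying assumption of this sketch: a literal string value that happens to start with `/` would be misread as a path.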
25 changes: 25 additions & 0 deletions examples/json_map/merge_linked.mapping.json
@@ -0,0 +1,25 @@
{
"/@default": "entry",
"/ENTRY[entry]/DATA[data]/current": {"link": "current.nxs:/entry/data/current"},
"/ENTRY[entry]/DATA[data]/current_295C": {"link": "current.nxs:/entry/data/current_295C"},
"/ENTRY[entry]/DATA[data]/current_300C": {"link": "current.nxs:/entry/data/current_300C"},
"/ENTRY[entry]/DATA[data]/current_305C": {"link": "current.nxs:/entry/data/current_305C"},
"/ENTRY[entry]/DATA[data]/current_310C": {"link": "current.nxs:/entry/data/current_310C"},
"/ENTRY[entry]/DATA[data]/temperature": {"link": "voltage_and_temperature.nxs:/entry/data/temperature"},
"/ENTRY[entry]/DATA[data]/voltage": {"link": "voltage_and_temperature.nxs:/entry/data/voltage"},
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/voltage_controller": {"link": "voltage_and_temperature.nxs:/entry/instrument/environment/voltage_controller"},
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/temperature_controller": {"link": "voltage_and_temperature.nxs:/entry/instrument/environment/temperature_controller"},
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/current_sensor": {"link": "current.nxs:/entry/instrument/environment/current_sensor"},
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/independent_controllers": ["voltage_controller", "temperature_control"],
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/measurement_sensors": ["current_sensor"],
"/ENTRY[entry]/INSTRUMENT[instrument]/ENVIRONMENT[environment]/NXpid[heating_pid]": {"link": "voltage_and_temperature.nxs:/entry/instrument/environment/heating_pid"},
"/ENTRY[entry]/PROCESS[process]/program": "Bluesky",
"/ENTRY[entry]/PROCESS[process]/program/@version": "1.6.7",
"/ENTRY[entry]/SAMPLE[sample]/name": "super",
"/ENTRY[entry]/SAMPLE[sample]/atom_types": "Si, C",
"/ENTRY[entry]/definition": "NXiv_temp",
"/ENTRY[entry]/definition/@version": "1",
"/ENTRY[entry]/experiment_identifier": "dbdfed37-35ed-4aee-a465-aaa0577205b1",
"/ENTRY[entry]/experiment_description": "A simple IV temperature experiment.",
"/ENTRY[entry]/start_time": "2022-05-30T16:37:03.909201+02:00"
}
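`merge_linked.mapping.json` replaces the paths with objects of the form `{"link": "<filename>:<path_in_file>"}`, which can point at a single dataset (`current_295C`) or a whole group (`voltage_controller`). The sketch below only splits such a spec into its two parts; `parse_link` is a hypothetical helper. Splitting at the last `:` is an assumption that keeps a Windows drive letter such as `C:` in the filename but would break if the in-file path itself contained a colon. In h5py, an external link is created by assigning `h5py.ExternalLink(filename, path)` to a name in an open file or group.

```python
from typing import Any, Optional, Tuple


def parse_link(value: Any) -> Optional[Tuple[str, str]]:
    """Return (filename, path_in_file) for a {"link": "file:/path"} value, else None."""
    if not (isinstance(value, dict) and "link" in value):
        return None  # not a link object: a path or a literal value
    filename, sep, path = value["link"].rpartition(":")
    if not sep or not filename or not path.startswith("/"):
        raise ValueError(f"expected '<filename>:<path_in_file>', got {value['link']!r}")
    return filename, path


print(parse_link({"link": "current.nxs:/entry/data/current_295C"}))
# ('current.nxs', '/entry/data/current_295C')
print(parse_link("Bluesky"))  # None
```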
Binary file added examples/json_map/merged_copied.nxs
Binary file not shown.
Binary file added examples/json_map/merged_linked.nxs
Binary file not shown.
Binary file added examples/json_map/voltage_and_temperature.nxs
Binary file not shown.
17 changes: 17 additions & 0 deletions pynxtools/dataconverter/README.md
@@ -37,9 +37,26 @@ Options:
parameters the converter supports.
--undocumented Shows a log output for all undocumented
fields
+--mapping TEXT Takes a <name>.mapping.json file and
+converts data from given input files.
--help Show this message and exit.
```

+#### Merge partial NeXus files into one
+
+```console
+user@box:~$ dataconverter --nxdl nxdl --input-file partial1.nxs --input-file partial2.nxs
+```
+
+#### Map an HDF5, JSON, or pickled Python dict file
+
+```console
+user@box:~$ dataconverter --nxdl nxdl --input-file any_data.hdf5 --mapping my_custom_map.mapping.json
+```
+
+#### Examples with data files are available in [`examples/json_map`](../../examples/json_map/).
+
+
#### Use with multiple input files

```console
17 changes: 13 additions & 4 deletions pynxtools/dataconverter/convert.py
@@ -65,7 +65,7 @@ def get_names_of_all_readers() -> List[str]:


# pylint: disable=too-many-arguments,too-many-locals
-def convert(input_file: Tuple[str],
+def convert(input_file: Tuple[str, ...],
reader: str,
nxdl: str,
output: str,
@@ -153,7 +153,7 @@ def parse_params_file(params_file):
)
@click.option(
'--reader',
-default='example',
+default='json_map',
type=click.Choice(get_names_of_all_readers(), case_sensitive=False),
help='The reader to use. default="example"'
)
@@ -192,15 +192,20 @@ def parse_params_file(params_file):
default=False,
help='Shows a log output for all undocumented fields'
)
+@click.option(
+'--mapping',
+help='Takes a <name>.mapping.json file and converts data from given input files.'
+)
# pylint: disable=too-many-arguments
-def convert_cli(input_file: Tuple[str],
+def convert_cli(input_file: Tuple[str, ...],
reader: str,
nxdl: str,
output: str,
generate_template: bool,
fair: bool,
params_file: str,
-undocumented: bool):
+undocumented: bool,
+mapping: str):
"""The CLI entrypoint for the convert function"""
if params_file:
try:
@@ -216,6 +221,10 @@ def convert_cli(input_file: Tuple[str],
sys.tracebacklimit = 0
raise IOError("\nError: Please supply an NXDL file with the option:"
" --nxdl <path to NXDL>")
+if mapping:
+reader = "json_map"
+if mapping:
+input_file = input_file + tuple([mapping])
convert(input_file, reader, nxdl, output, generate_template, fair, undocumented)
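The added lines in `convert_cli` switch to the `json_map` reader whenever `--mapping` is given and append the mapping file to the input files. The annotation change from `Tuple[str]` (a tuple of exactly one string) to `Tuple[str, ...]` (a tuple of any length) fits `--input-file` being repeatable, since click collects repeated values of a `multiple=True` option into a tuple. The same logic, pulled out of the click command into a hypothetical standalone function for illustration:

```python
from typing import Optional, Tuple


def apply_mapping_option(
    input_file: Tuple[str, ...], reader: str, mapping: Optional[str]
) -> Tuple[Tuple[str, ...], str]:
    """Mirror the added lines: a mapping file selects json_map and joins the inputs."""
    if mapping:
        reader = "json_map"
        input_file = input_file + (mapping,)
    return input_file, reader


files, reader = apply_mapping_option(
    ("voltage_and_temperature.nxs", "current.nxs"), "example", "merge_copied.mapping.json"
)
print(reader)  # json_map
print(files)   # ('voltage_and_temperature.nxs', 'current.nxs', 'merge_copied.mapping.json')
```

Folding the two consecutive `if mapping:` blocks into one, as here, behaves the same; `input_file + (mapping,)` is equivalent to `input_file + tuple([mapping])`.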


11 changes: 10 additions & 1 deletion pynxtools/dataconverter/hdfdict.py
@@ -123,7 +123,16 @@ def _recurse(hdfobject, datadict):
elif isinstance(value, h5py.Dataset):
if not lazy:
value = unpacker(value)
-datadict[key] = value
+datadict[key] = (
+value.asstr()[...]
+if h5py.check_string_dtype(value.dtype)
+else value
+)
+
+if "attrs" in dir(value):
+datadict[key + "@"] = {}
+for attr, attrval in value.attrs.items():
+datadict[key + "@"][attr] = attrval

return datadict
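With this change, `_recurse` reads HDF5 string datasets as decoded `str` values via `asstr()` and stores each dataset's attributes in a sibling entry whose key is the dataset name followed by `@`. How pynxtools consumes those entries is not shown in this diff; as a hypothetical illustration, the sketch below flattens such a nested dict into `/`-separated paths and writes attributes as `<path>/@<attr>`, the notation the mapping files use for keys such as `/ENTRY[entry]/definition/@version`.

```python
from typing import Any, Dict


def flatten(datadict: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested hdfdict-style dict into {'/a/b': value, '/a/b/@attr': value}."""
    flat: Dict[str, Any] = {}
    for key, value in datadict.items():
        if key.endswith("@") and isinstance(value, dict):
            # Attribute entry created for the dataset named key[:-1].
            for attr, attrval in value.items():
                flat[f"{prefix}/{key[:-1]}/@{attr}"] = attrval
        elif isinstance(value, dict):
            flat.update(flatten(value, f"{prefix}/{key}"))  # nested group
        else:
            flat[f"{prefix}/{key}"] = value  # dataset value
    return flat


example = {
    "entry": {
        "definition": "NXiv_temp",
        "definition@": {"version": "1"},
        "data": {"voltage": [0.0, 0.5, 1.0]},
    }
}
for path, value in flatten(example).items():
    print(path, value)
# /entry/definition NXiv_temp
# /entry/definition/@version 1
# /entry/data/voltage [0.0, 0.5, 1.0]
```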

53 changes: 46 additions & 7 deletions pynxtools/dataconverter/readers/json_map/README.md
@@ -1,24 +1,63 @@
# JSON Map Reader

-This reader allows you to convert either data from a .json file or an xarray exported as a .pickle using a flat .mapping.json file.
+## What is this reader?

+This reader lets pynxtools users convert their existing data into FAIR NeXus files with the help of a mapping file. The mapping file tells the reader what to pick from your data files and where to put it in the NeXus file. The following input formats are supported:
+* HDF5 (any extension works, e.g. h5, hdf5, nxs)
+* JSON
+* Python dict objects pickled with [pickle](https://docs.python.org/3/library/pickle.html). These can contain [xarray.DataArray](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html) objects as well as regular Python and NumPy types.
+
It accepts any NXDL file, as long as your mapping file contains all of its fields.
Use the `--generate-template` option of the dataconverter to create a .mapping.json file:

```console
-user@box:~$ python convert.py --nxdl NXmynxdl --generate-template > mynxdl.mapping.json
+user@box:~$ dataconverter --nxdl NXmynxdl --generate-template > mynxdl.mapping.json
```

There are some example files you can use:

+[data.mapping.json](/tests/data/dataconverter/readers/json_map/data.mapping.json)
+
-[data.mapping.json](/tests/data/tools/dataconverter/readers/json_map/data.mapping.json)

-[data.json](/tests/data/tools/dataconverter/readers/json_map/data.json)
+[data.json](/tests/data/dataconverter/readers/json_map/data.json)

```console
-user@box:~$ python convert.py --nxdl NXtest --input-file data.json --input-file data.mapping.json --reader json_map
+user@box:~$ dataconverter --nxdl NXtest --input-file data.json --mapping data.mapping.json
```

+##### [Example](/examples/json_map/) with HDF5 files.
+
+## The mapping.json file
+
+This file lets you fill in the requirements of a NeXus application definition without writing any code. If you already have data in one of the formats listed above, you only need to write this mapping file so that the dataconverter picks your data correctly.
+
+A mapping file is always based on the template that the dataconverter generates (see above for how to generate one).
+You only modify the right-hand-side values of the template keys.
+
+There are three ways to fill in the right-hand side of a template key:
+* Write the nested path to the value in your data file. A path is indicated by a leading `/`, as in `/entry/data/current_295C` below.
+For example:
+
+```json
+"/ENTRY[entry]/DATA[data]/current_295C": "/entry/data/current_295C",
+"/ENTRY[entry]/NXODD_name/posint_value": "/a_level_down/another_level_down/posint_value",
+```
+
+* Write values directly into the mapping file for data that is missing from your data files.
+
+```json
+
+"/ENTRY[entry]/PROCESS[process]/program": "Bluesky",
+"/ENTRY[entry]/PROCESS[process]/program/@version": "1.6.7"
+```
+
+* Write a JSON object with a `link` key. This uses the same link mechanism that the dataconverter implements. With this reader, you can only use external links to your data files. In the example below, `current.nxs` is an existing HDF5 file that the new NeXus file links to without copying the data. The format is:
+`"link": "<filename>:<path_in_file>"`
+Note: This currently works only for HDF5 files.
+
+```json
+"/ENTRY[entry]/DATA[data]/current_295C": {"link": "current.nxs:/entry/data/current_295C"},
+"/ENTRY[entry]/DATA[data]/current_300C": {"link": "current.nxs:/entry/data/current_300C"},
+```
+
## Contact person in FAIRmat for this reader
-Sherjeel Shabih
+Sherjeel Shabih
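The JSON snippets in the README above are fragments (note the trailing commas); a complete mapping file is a single JSON object. One way to produce such a file is to build it as a Python dict and serialize it with the standard `json` module. The keys and values below are taken from the README examples; the file name `my.mapping.json` is hypothetical.

```python
import json

# Hypothetical mapping combining the three kinds of right-hand-side values.
mapping = {
    # Path inside an input file (leading "/"): the value found there is used.
    "/ENTRY[entry]/DATA[data]/current_295C": "/entry/data/current_295C",
    # Literal values for data that is missing from the input files.
    "/ENTRY[entry]/PROCESS[process]/program": "Bluesky",
    "/ENTRY[entry]/PROCESS[process]/program/@version": "1.6.7",
    # External link into an existing HDF5 file: "<filename>:<path_in_file>".
    "/ENTRY[entry]/DATA[data]/current_300C": {"link": "current.nxs:/entry/data/current_300C"},
}

text = json.dumps(mapping, indent=2)
print(text)  # save this text as e.g. my.mapping.json

# Parsing the text back yields the same dict, i.e. it is one valid JSON object.
assert json.loads(text) == mapping
```

The saved file would then be passed to the dataconverter with `--mapping`, as in the `dataconverter --nxdl NXtest --input-file data.json --mapping data.mapping.json` command in the README.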