Start documentation #43

Merged
merged 9 commits into from
May 2, 2024
27 changes: 27 additions & 0 deletions .github/workflows/build_docs.yml
@@ -0,0 +1,27 @@
name: build_docs
on:
  push:
    branches: [main]
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install ".[docs]"
      - run: mkdocs gh-deploy --force
71 changes: 4 additions & 67 deletions README.md
Expand Up @@ -9,7 +9,7 @@

# A reader for XPS data

# Installation
## Installation

It is recommended to use python 3.11 with a dedicated virtual environment for this package.
Learn how to manage [python versions](https://github.com/pyenv/pyenv) and
Expand All @@ -24,76 +24,13 @@ pip install pynxtools[xps]

for the latest development version.


# Purpose
## Purpose
This reader plugin for [`pynxtools`](https://github.com/FAIRmat-NFDI/pynxtools) is used to translate diverse file formats from the scientific community and technology partners
within the field of X-ray photoelectron spectroscopy into a standardized representation using the
[NeXus](https://www.nexusformat.org/) application definition [NXmpes](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html#nxmpes).

## Supported file formats
The reader decides which parser to use based on the file extension of the files provided. For the main XPS files, the following file extensions are supported:
- .spe, .pro: [Phi MultiPak](https://www.phi.com/surface-analysis-equipment/genesis.html#software:multi-pak-data-reduction-software/) files, proprietary format of PHI Electronics
- .sle: [SpecsLabProdigy](https://www.specs-group.com/nc/specs/products/detail/prodigy/) files, proprietary format of SPECS GmbH (v1 and v4)
- .xml: SpecsLab 2 files, XML format from SPECS GmbH (v1.6)
- .vms: VAMAS files, ISO standard data transfer format ([ISO 14976](https://www.iso.org/standard/24269.html)), both in regular and irregular format
- .xy: SpecsLabProdigy export format in XY format (including all export settings)
- .txt:
- exported by [Scienta Omicron](https://scientaomicron.com/en) instruments
- exported by [CasaXPS](https://www.casaxps.com/) analysis software

We are continuously working on adding parsers for other data formats and technology partners. If you would like to implement a parser for your data, feel free to get in contact.

# Getting started
An example script to run the XPS reader in `pynxtools`:
```sh
dataconverter \
  --reader xps \
  --nxdl NXmpes \
  --input-file <xps-file path> \
  --input-file <eln-file path> \
  --output <output-file path>.nxs
```
Note that none of the supported file formats contains values for all required and recommended fields and attributes in NXmpes. For the validation step of the XPS reader to pass, you need to provide an ELN file that contains the missing values. Example raw and converted data can be found in [*pynxtools_xps/examples*](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/examples).
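
For scripted conversions, the same call can be assembled in Python. The following is a minimal sketch that only uses the command-line flags shown above; the helper names `build_dataconverter_command` and `run_conversion` are illustrative assumptions and not part of `pynxtools`.

```python
import subprocess


def build_dataconverter_command(xps_file, eln_file, output_file):
    """Assemble the dataconverter call from above as an argument list.

    Hypothetical helper for illustration; it uses only the flags that
    appear in the shell example.
    """
    return [
        "dataconverter",
        "--reader", "xps",
        "--nxdl", "NXmpes",
        "--input-file", str(xps_file),
        "--input-file", str(eln_file),
        "--output", str(output_file),
    ]


def run_conversion(xps_file, eln_file, output_file):
    """Run the conversion; requires pynxtools[xps] to be installed."""
    command = build_dataconverter_command(xps_file, eln_file, output_file)
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.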


# Contributing

## Development install

Install the package with its dependencies:

```shell
git clone https://github.com/FAIRmat-NFDI/pynxtools-xps.git \
    --branch main \
    --recursive pynxtools_xps
cd pynxtools_xps
python -m pip install --upgrade pip
python -m pip install -e .
python -m pip install -e ".[dev,consistency_with_pynxtools]"
```

There is also a [pre-commit hook](https://pre-commit.com/#intro) available
which formats the code and runs the linter before committing.
It can be installed with
```shell
pre-commit install
```
from the root of this repository.

## Development Notes
The development process is modular so that new parsers can be added. The design logic is the following:
1. First, [`XpsDataFileParser`](https://github.com/FAIRmat-NFDI/pynxtools-xps/blob/main/pynxtools_xps/file_parser.py#L36) selects the proper parser based on the file extensions of the provided files. It then calls a sub-parser that can read files with such extensions and calls the `parse_file` function of that reader. In addition, it selects a proper config file from
the `config` subfolder.
2. Afterwards, the NXmpes nxdl template is filled with the data in `XpsDataFileParser` using the [`config`](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/pynxtools_xps/config) file. Data that is not in the given main files can be added through the ELN file (and must be added for required fields in NXmpes).
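
The extension-based selection described in step 1 can be pictured roughly as follows. This is a simplified sketch, not the code in `file_parser.py`: the sub-parser classes, the lookup tables, and the config file names are hypothetical stand-ins; only the `parse_file` method name is taken from the description above.

```python
from pathlib import Path


class VamasParser:
    """Stand-in sub-parser; a real one would read the file contents."""

    def parse_file(self, file_path):
        return [{"source_file": str(file_path), "format": "vamas"}]


class XyParser:
    """Stand-in sub-parser for the XY export format."""

    def parse_file(self, file_path):
        return [{"source_file": str(file_path), "format": "xy"}]


# Hypothetical lookup tables: file extension -> sub-parser class / config file.
PARSERS = {".vms": VamasParser, ".xy": XyParser}
CONFIG_FILES = {".vms": "config_vms.json", ".xy": "config_xy.json"}


def select_parser_and_config(file_path):
    """Pick a sub-parser instance and a config file name from the extension."""
    extension = Path(file_path).suffix.lower()
    if extension not in PARSERS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return PARSERS[extension](), CONFIG_FILES[extension]


parser, config_file = select_parser_and_config("survey.vms")
spectra = parser.parse_file("survey.vms")
```

With a table-driven lookup like this, supporting a new format would mean writing one sub-parser class and adding one entry per table.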

## Test this software

A basic test suite written in [pytest](https://docs.pytest.org/en/stable/) is available, which is especially relevant for developers. It can be run as follows:

```shell
python -m pytest -sv tests
```
## Docs
Extensive documentation of this pynxtools plugin is available [here](https://fairmat-nfdi.github.io/pynxtools-xps/). There you can find information about getting started, how-to guides, the supported file formats, how to get involved, and much more.

## Contact person in FAIRmat for this reader
Lukas Pielsticker
72 changes: 67 additions & 5 deletions dev-requirements.txt
Expand Up @@ -2,7 +2,7 @@
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --extra=dev --output-file=dev-requirements.txt pyproject.toml
# pip-compile --extra=dev --extra=docs --output-file=dev-requirements.txt pyproject.toml
#
asciitree==0.3.3
# via zarr
Expand All @@ -18,6 +18,8 @@ attrs==23.1.0
# via
# cattrs
# requests-cache
babel==2.14.0
# via mkdocs-material
backcall==0.2.0
# via ipython
blosc2==2.0.0
Expand All @@ -35,6 +37,7 @@ charset-normalizer==3.3.2
click==8.1.7
# via
# dask
# mkdocs
# pip-tools
# pynxtools
cloudpickle==3.0.0
Expand All @@ -44,6 +47,8 @@ colorama==0.4.6
# build
# click
# ipython
# mkdocs
# mkdocs-material
# pytest
# tqdm
comm==0.2.0
Expand Down Expand Up @@ -107,6 +112,8 @@ fsspec==2023.10.0
# hyperspy
future==0.18.3
# via uncertainties
ghp-import==2.1.0
# via mkdocs
gitdb==4.0.11
# via gitpython
gitpython==3.1.40
Expand Down Expand Up @@ -165,7 +172,11 @@ ipython==8.12.3
jedi==0.19.1
# via ipython
jinja2==3.1.2
# via hyperspy
# via
# hyperspy
# mkdocs
# mkdocs-macros-plugin
# mkdocs-material
joblib==1.3.2
# via scikit-learn
jupyter-client==8.6.0
Expand All @@ -192,8 +203,15 @@ locket==1.0.0
# via partd
lxml==4.9.3
# via fabio
markdown==3.6
# via
# mkdocs
# mkdocs-material
# pymdown-extensions
markupsafe==2.1.3
# via jinja2
# via
# jinja2
# mkdocs
matplotlib==3.7.4
# via
# ase
Expand All @@ -212,7 +230,25 @@ matplotlib-inline==0.1.6
matplotlib-scalebar==0.8.1
# via orix
mergedeep==1.3.4
# via pynxtools
# via
# mkdocs
# mkdocs-get-deps
# pynxtools
mkdocs==1.6.0
# via
# mkdocs-macros-plugin
# mkdocs-material
# pynxtools-xps (pyproject.toml)
mkdocs-get-deps==0.2.0
# via mkdocs
mkdocs-macros-plugin==1.0.5
# via pynxtools-xps (pyproject.toml)
mkdocs-material==9.5.20
# via pynxtools-xps (pyproject.toml)
mkdocs-material-extensions==1.3.1
# via
# mkdocs-material
# pynxtools-xps (pyproject.toml)
mpmath==1.3.0
# via sympy
msgpack==1.0.7
Expand Down Expand Up @@ -316,11 +352,14 @@ packaging==23.2
# hyperspy
# ipykernel
# matplotlib
# mkdocs
# pooch
# pytest
# scikit-image
# tables
# xarray
paginate==0.5.6
# via mkdocs-material
pandas==2.0.3
# via
# ifes-apt-tc-data-modeling
Expand All @@ -330,6 +369,8 @@ parso==0.8.3
# via jedi
partd==1.4.1
# via dask
pathspec==0.12.1
# via mkdocs
pickleshare==0.7.5
# via ipython
pillow==10.0.1
Expand All @@ -347,6 +388,7 @@ pip-tools==7.3.0
platformdirs==4.0.0
# via
# jupyter-core
# mkdocs-get-deps
# pooch
# requests-cache
# virtualenv
Expand Down Expand Up @@ -379,7 +421,11 @@ pycifrw==4.4.6
pyfai==2023.9.0
# via pyxem
pygments==2.17.2
# via ipython
# via
# ipython
# mkdocs-material
pymdown-extensions==10.8.1
# via mkdocs-material
pynxtools==0.0.10
# via pynxtools-xps (pyproject.toml)
pyparsing==3.1.1
Expand All @@ -397,10 +443,12 @@ pytest-timeout==2.2.0
# via pynxtools-xps (pyproject.toml)
python-dateutil==2.8.2
# via
# ghp-import
# hyperspy
# ipyparallel
# jupyter-client
# matplotlib
# mkdocs-macros-plugin
# pandas
pytz==2023.3.post1
# via
Expand All @@ -418,18 +466,28 @@ pyyaml==6.0.1
# dask
# hyperspy
# kikuchipy
# mkdocs
# mkdocs-get-deps
# mkdocs-macros-plugin
# pre-commit
# pymdown-extensions
# pynxtools
# pyyaml-env-tag
pyyaml-env-tag==0.1
# via mkdocs
pyzmq==25.1.1
# via
# ipykernel
# ipyparallel
# jupyter-client
radioactivedecay==0.4.21
# via ifes-apt-tc-data-modeling
regex==2024.4.28
# via mkdocs-material
requests==2.31.0
# via
# hyperspy
# mkdocs-material
# pooch
# pynxtools
# requests-cache
Expand Down Expand Up @@ -485,6 +543,8 @@ sympy==1.12
# radioactivedecay
tables==3.8.0
# via ifes-apt-tc-data-modeling
termcolor==2.4.0
# via mkdocs-macros-plugin
threadpoolctl==3.2.0
# via scikit-learn
tifffile==2023.7.10
Expand Down Expand Up @@ -564,6 +624,8 @@ urllib3==2.1.0
# types-requests
virtualenv==20.25.0
# via pre-commit
watchdog==4.0.0
# via mkdocs
wcwidth==0.2.12
# via
# prettytable
Expand Down
1 change: 1 addition & 0 deletions docs/explanation/appdefs.md
@@ -0,0 +1 @@
# The NeXus application definitions: NXmpes and NXxps
9 changes: 9 additions & 0 deletions docs/explanation/contextualization.md
@@ -0,0 +1,9 @@
# How to map pieces of information to NeXus

Conceptually, mapping between representations of concepts and instance data is a key task in information science. The plugin pynxtools-xps implements this specifically for the file and serialization formats used within the research field of photoelectron spectroscopy (PES).

In pynxtools-xps, the mapping from the vendor format is a two-step process:
1) First, each piece of information is parsed from the experiment- and vendor-specific file and assigned a name that describes what the reader developer thinks it means semantically. This naming can come from documentation of the original data, existing key-value infrastructure in the data file, or from domain knowledge of the reader developer. All data and metadata items are internally stored as a flat list of dictionaries, with each dictionary containing all information about a single XP spectrum.
2) This list of dicts is then mapped onto either the [NXmpes](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html) NeXus application definition or its specialization [NXxps](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html). For this, a JSON config file is used that provides a concept map from the originally assigned keys to the groups, fields, and attributes in the NeXus standard. Such transformations are configured via the respective files in the [*config*](https://github.com/FAIRmat-NFDI/pynxtools-xps/tree/main/pynxtools_xps/config) directory of pynxtools-xps.

Upon parsing, the XPS reader uses the config file to map the (meta)data to a *template* that follows the NeXus application definitions. It also takes metadata provided through additional means (i.e., an electronic lab notebook (ELN) file) to fill in required and recommended fields and attributes of the application definition that were not provided in the raw data files. It is from this *template* variable that core functions of pynxtools, such as *convert.py*, write the actual NeXus/HDF5 file. The latter tool is also referred to as the dataconverter of [pynxtools](https://github.com/FAIRmat-NFDI/pynxtools).
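
As a rough illustration of this two-step mapping, the sketch below fills a template from one parsed spectrum plus ELN metadata. All key names, template paths, and the layout of the JSON concept map are simplified assumptions made for this example; the actual config files in pynxtools-xps and the NXmpes paths differ.

```python
import json

# Step 1 (assumed result): a flat list of dicts, one dict per spectrum.
spectra = [{"spectrum_type": "survey", "excitation_energy": 1486.6}]

# Metadata that is missing from the raw file and comes from the ELN instead.
eln_data = {"sample_name": "Au foil"}

# Step 2: a hypothetical JSON concept map from template paths to source keys.
CONFIG_JSON = """
{
  "/entry/title": {"source": "data", "key": "spectrum_type"},
  "/entry/instrument/source/energy": {"source": "data", "key": "excitation_energy"},
  "/entry/sample/name": {"source": "eln", "key": "sample_name"}
}
"""


def fill_template(spectrum, eln, concept_map):
    """Map one spectrum dict and the ELN metadata onto template paths."""
    sources = {"data": spectrum, "eln": eln}
    template = {}
    for path, rule in concept_map.items():
        values = sources[rule["source"]]
        # Paths without a matching value stay unset; a validation step
        # would then report them if they are required.
        if rule["key"] in values:
            template[path] = values[rule["key"]]
    return template


concept_map = json.loads(CONFIG_JSON)
templates = [fill_template(s, eln_data, concept_map) for s in spectra]
```

Keeping the concept map in a separate JSON document means the mapping can be adjusted without touching the parsing code.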
1 change: 1 addition & 0 deletions docs/explanation/data_processing.md
@@ -0,0 +1 @@
# Data processing with CasaXPS
25 changes: 25 additions & 0 deletions docs/explanation/implementation.md
@@ -0,0 +1,25 @@
# Purpose and aim of pynxtools-xps
pynxtools-xps aims to implement the [FAIR principles of data stewardship](https://doi.org/10.1162/dint_r_00024) in photoelectron spectroscopy (PES). In many experimental fields, there has been a push towards such standardization and interoperability in recent years; however, there has been a distinct lack of such efforts in PES.

While there exists a widely adopted [ISO standard](https://www.iso.org/standard/24269.html) for data transfer in surface chemical analysis, it does not fully cover all of the information that is obtained in modern photoemission experiments. Within the FAIRmat project of the German National Research Data Infrastructure (NFDI), we have spent considerable effort on developing an extensive and elaborate standard ([NXmpes](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXmpes.html) with its specialization [NXxps](https://fairmat-nfdi.github.io/nexus_definitions/classes/contributed_definitions/NXxps.html)) for harmonizing PES data using [NeXus](https://www.nexusformat.org/), a community-driven data-modeling framework for experiments.

The goal of pynxtools-xps is to provide a mapping from the diverse proprietary and open-source software solutions used in the XPS community to this NeXus standard. The software implements a suggestion for how diverse (meta)data from the research field of photoelectron spectroscopy can be parsed and normalized so that users can compare data. It can parse data from many different data providers and in frequently used serialization formats. This data is then mapped onto the NeXus standard.

As part of [pynxtools and its plugin infrastructure](https://github.com/FAIRmat-NFDI/pynxtools), pynxtools-xps is fully integrated into the NOMAD research data management system (RDMS), with the aim of facilitating harmonization of XPS data and enabling development of data-centric software tools and services.

# Software landscape in photoelectron spectroscopy - a mixture of proprietary and open-source solutions
As in many other experimental fields, the software landscape in photoelectron spectroscopy is extremely diverse, ranging from fully integrated software solutions from technology partners, which combine measurement, post-processing, and data analysis, to custom-written software for specific use cases.
While proprietary software is often easy to use for end users, it often writes to proprietary serialization formats (files or database entries). Not only are these formats not openly readable, but they are often poorly documented, and the content and meaning of their semantic concepts are very often not documented publicly. While open-source software typically writes to more open (and sometimes better documented) formats, it tends to lose much of the metadata that commercial vendors can provide with their data. pynxtools-xps aims at both ends of this spectrum and everything in between: it provides an easy-to-use framework for writing a standardization parser for small, encapsulated solutions, while also making it possible to map the full richness of data and metadata acquired in a high-end XPS laboratory onto NeXus.

# Implementation design

pynxtools-xps is a community-based tool that provides a bottom-up approach for mapping XPS data onto the NXmpes and NXxps standards. Specifically, the software contains example parsers for data that was measured by PES researchers in a wide array of experimental setups. The goal is not necessarily to implement a fully comprehensive mapping of all existing file formats, but rather to help the individual researcher or technology partner to start reading their data into the NeXus standard.

Therefore, the following design patterns guide our implementation:

- We do not consider our work complete (in the sense that a user could expect to drag and drop arbitrary content).
- We consider ontology matching a team effort that can only be achieved with technology partners and scientists working together.
- Our work is open to suggestions by the PES community, always realizing that just being able to read from a specific file does not by itself solve the challenge that pynxtools-xps addresses.
- We provide specific tangible examples of (meta)data semantic mapping for specific file formats that are frequently used in XPS. These include the main formats of the leading vendors of PES spectrometers.
- The tool itself is built such that it is easily extendable.
- The goal is to continuously grow the number of parsers available for different communities. We therefore encourage researchers and technology partners to get in contact in order to get started with standardization in NeXus and NOMAD.
1 change: 1 addition & 0 deletions docs/explanation/nomad_integration.md
@@ -0,0 +1 @@
# NOMAD integration