Documentation Part I - General docu (#24)

* Added mkdocs structure
* Added contributors to README
* Improved index.md and contact.md pages
* Added content for general.md with diagrams
  Added docu on the normalize function
* Added Bernadette to contributors
nomad-coe · Apr 29, 2024 · 53ba600 · 53ba600
1 parent b159b0c

Showing 16 changed files with 235 additions and 25 deletions.
diff --git a/README.md b/README.md
@@ -131,3 +131,12 @@ If using VSCode, you can add the following snippet to your `.vscode/launch.json`
 where `${workspaceFolder}` refers to the NOMAD root.
 
 The settings configuration file `.vscode/settings.json` automatically applies the linting when you save a file.
+
+
+## Main contributors
+| Name | E-mail     | Topics | Github profiles |
+|------|------------|--------|-----------------|
+| Dr. Nathan Daelman | [[email protected]](mailto:[email protected]) | DFT, Precision | [@ndaelman-hu](https://github.com/ndaelman-hu) |
+| Dr. Bernadette Mohr | [[email protected]](mailto:[email protected]) | MD, FF | [@Bernadette-Mohr](https://github.com/Bernadette-Mohr) |
+| Dr. José M. Pizarro | [[email protected]](mailto:[email protected]) | GW, DMFT, BSE | [@JosePizarro3](https://github.com/JosePizarro3) |
+| Dr. Joseph F. Rudzinski (**Coordinator**) | [[email protected]](mailto:[email protected]) | General | [@JFRudzinski](https://github.com/JFRudzinski) |
diff --git a/docs/assets/program.png b/docs/assets/program.png
diff --git a/docs/assets/simulation.png b/docs/assets/simulation.png
diff --git a/docs/assets/simulation_base.png b/docs/assets/simulation_base.png
diff --git a/docs/assets/simulation_composition.png b/docs/assets/simulation_composition.png
diff --git a/docs/contact.md b/docs/contact.md
@@ -4,17 +4,16 @@ NOMAD is an open source project that warmly welcomes community projects, contrib
 
 You can reach us through different channels. You can send an email directly to any of the main contributors:
 
-!!! info "Main contributors"
-    | Name | E-mail     | Topics |
-    |------|------------|--------|
-    | Dr. Nathan Daelman                    | [[email protected]](mailto:[email protected])       | DFT, parsers, normalizers  |
-    | Dr. Alvin Noe Ladines                 | [[email protected]](mailto:[email protected]) | Parsers, workflows           |
-    | Dr. José M. Pizarro                   | [[email protected]](mailto:[email protected])                 | GW, DMFT, BSE, parsers, workflows, normalizers        |
-    | Dr. Joseph F. Rudzinski (**Coordinator**) | [[email protected]](mailto:[email protected])   | MD, parsers, workflows, normalizers                          |
+| Name | E-mail     | Topics | Github profiles |
+|------|------------|--------|-----------------|
+| Dr. Nathan Daelman | [[email protected]](mailto:[email protected]) | DFT, Precision | [@ndaelman-hu](https://github.com/ndaelman-hu) |
+| Dr. Bernadette Mohr | [[email protected]](mailto:[email protected]) | MD, FF | [@Bernadette-Mohr](https://github.com/Bernadette-Mohr) |
+| Dr. José M. Pizarro | [[email protected]](mailto:[email protected]) | GW, DMFT, BSE | [@JosePizarro3](https://github.com/JosePizarro3) |
+| Dr. Joseph F. Rudzinski (**Coordinator**) | [[email protected]](mailto:[email protected]) | General | [@JFRudzinski](https://github.com/JFRudzinski) |
 
 
 Alternatively, you can also:
 
-- Open an issue in the [general NOMAD Github project](https://github.com/nomad-coe/nomad), or in one of the [sub-projects](https://github.com/nomad-coe/nomad/tree/develop/dependencies/parsers) related with specific parsers. Our Github profile tags are [@ndaelman-hu](https://github.com/ndaelman-hu), [@ladinesa](https://github.com/ladinesa), [@JosePizarro3](https://github.com/JosePizarro3), and [@JFRudzinski](https://github.com/JFRudzinski).
-- Write us in the [NOMAD MatSci forum](https://matsci.org/c/nomad/32). Our tags there are @NateD, @ladinesa, @JosePizarro, and @JFRudzinski.
-- Send an email to [[email protected]](mailto:support@nomad-lab.eu). Please, add in the subject "ATTN - Area C".
+- Open an [**issue**](https://github.com/nomad-coe/nomad-schema-plugin-simulation-data/issues) in the [Github project](https://github.com/nomad-coe/nomad-schema-plugin-simulation-data/), and tag any of us.
+- Join the [Discord channel](https://discord.gg/Gyzx3ukUw8) and ask us there directly.
+- If you are included as a contributor in the Github project, you can open new [**discussions**](https://github.com/nomad-coe/nomad-schema-plugin-simulation-data/discussions) regarding a new data schema or modelling you want to see covered.
diff --git a/docs/general.md b/docs/general.md
@@ -0,0 +1,112 @@
+# `Simulation` base section
+
+<!--
+Improve these paragraphs once `Program` and `BaseSimulation` are integrated in `basesections.py`
+--->
+In NOMAD, all the simulation metadata is defined in the `Simulation` section. You can find its Python schema definition in [src/nomad_simulations/general.py](https://github.com/nomad-coe/nomad-schema-plugin-simulation-data/blob/develop/src/nomad_simulations/general.py). This section will appear under the `data` section for the [*archive*](https://nomad-lab.eu/prod/v1/staging/docs/reference/glossary.html#archive) metadata structure of each [*entry*](https://nomad-lab.eu/prod/v1/staging/docs/reference/glossary.html#entry).
+
+The `Simulation` section inherits from a _base section_ `BaseSimulation`. In NOMAD, a set of [base sections](https://nomad-lab.eu/prod/v1/staging/docs/howto/customization/base_sections.html) derived from the [Basic Formal Ontology (BFO)](https://basic-formal-ontology.org/) are defined. We used them to define `BaseSimulation` as an [`Activity`](http://purl.obolibrary.org/obo/BFO_0000015). The UML diagram is:
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="../assets/simulation_base.png" alt="Simulation base section diagram." width="80%" title="Click to zoom in">
+    </label>
+</div>
+
+`BaseSimulation` contains the general information about the `Program` used, as well as general timing information for the simulation, e.g., the datetime at which it started (`datetime`) and ended (`datetime_end`). `Simulation` contains further information about the specific input and output sections ([see below](#sub-sections-in-simulation)). The detailed UML diagram of quantities and functions defined for `Simulation` is thus:
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="../assets/simulation.png" alt="Simulation quantities and functions UML diagram." width="50%" title="Click to zoom in">
+    </label>
+</div>
+
+??? question "Notation for the section attributes in the UML diagram"
+    We include the type and units of each attribute / quantity after its name. The notation is:
+
+        <name-of-quantity>: <type-of-quantity>, <units-of-quantity>
+
+    Thus, `cpu1_start: np.float64, s` means that there is a quantity named `'cpu1_start'` of type `numpy.float64` and whose units are `'s'` (seconds).
+    We also indicate the existence of sub-sections by writing their names in bold, i.e.:
+
+        <name-of-sub-section>: <sub-section-definition>
+
+    E.g., there is a sub-section under `Simulation` named `'model_method'` whose section definition can be found in the `ModelMethod` section. We will represent this sub-section containment in more complex UML diagrams in the future using the containment arrow (see below for [an example using `Program`](#program)).
+
+`Simulation` additionally inherits from `EntryData` (double inheritance) in order to populate the `data` section in the NOMAD archive. All of the base sections discussed here are subject to the [public normalize function](normalize.md) in NOMAD. The private function `set_system_branch_depth()` is related to the [ModelSystem base section](model_system/model_system.md).
+
+## Main sub-sections in `Simulation` {#sub-sections-in-simulation}
+
+The `Simulation` base section is composed of 4 main sub-sections:
+
+1. `Program`: contains all the program information, e.g., `name` of the program, `version`, etc.
+2. `ModelSystem`: contains all the system information about geometrical positions of atoms, their states, simulation cells, symmetry information, etc.
+3. `ModelMethod`: contains all the methodological information, divided into two main aspects: the mathematical model or approximation used in the simulation (e.g., `DFT`, `GW`, `ForceFields`, etc.) and the numerical settings used to compute the properties (e.g., meshes, self-consistent parameters, basis set settings, etc.).
+4. `Outputs`: contains all the output properties, as well as references to the `ModelSystem` used to obtain such properties. It might also contain information which will populate `ModelSystem` (e.g., atomic occupations, atomic moments, crystal field energies, etc.).
+
+!!! note "Self-consistent steps, SinglePoint entries, and more complex workflows."
+    The minimal unit for storing data in the NOMAD archive is an [*entry*](https://nomad-lab.eu/prod/v1/staging/docs/reference/glossary.html#entry). In the context of simulation data, an entry may contain data from a calculation on an individual system configuration (e.g., a single-point DFT calculation) using **only** the above-mentioned sections of the `Simulation` section. Information from the self-consistent iterations used to converge properties for this configuration is also contained within these sections.
+
+    More complex calculations that involve multiple configurations require the definition of a *workflow* section within the archive. Depending on the situation, the information from individual workflow steps may be stored within a single entry or across multiple entries. For example, for efficiency, the data from workflows involving a large number of configurations, e.g., molecular dynamics trajectories, are stored within a single entry. Other standard workflows store the single-point data in separate entries, e.g., a `GW` calculation is composed of a `DFT SinglePoint` entry and a `GW SinglePoint` entry. Higher-level workflows, which simply connect a series of standard or custom workflows, are typically stored as a separate entry. You can check the [NOMAD simulations workflow schema](https://github.com/nomad-coe/nomad-schema-plugin-simulation-workflow) for more information.
+
+The following schematic is a simplified representation of the `Simulation` section (note that the arrows here are a simple way of visually defining _inputs_ and _outputs_):
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="../assets/simulation_composition.png" alt="Simulation composition diagram." width="90%" title="Click to zoom in">
+    </label>
+</div>
+
+### `Program` {#program}
+
+The `Program` base section contains all the information about the program / software / code used to perform the simulation. We consider it to be a [`(Continuant) Entity`](http://purl.obolibrary.org/obo/BFO_0000002) and contained within `BaseSimulation` as a sub-section. The detailed UML diagram is:
+
+<div class="click-zoom">
+    <label>
+        <input type="checkbox">
+        <img src="../assets/program.png" alt="Program quantities and functions UML diagram." width="75%" title="Click to zoom in">
+    </label>
+</div>
+
+
+When [writing a parser](https://nomad-lab.eu/prod/v1/staging/docs/howto/customization/parsers.html), we recommend starting by instantiating the `Program` section and populating its quantities, in order to get acquainted with the NOMAD parsing infrastructure.
+
+For example, imagine we have a file which we want to parse with the following information:
+```txt
+! * * * * * * *
+! Welcome to SUPERCODE, version 7.0
+...
+```
+
+We can parse the program `name` and `version` by matching the texts (see, e.g., [Wikipedia page for Regular expressions, also called _regex_](https://en.wikipedia.org/wiki/Regular_expression)):
+
+```python
+from nomad.parsing.file_parser import TextParser, Quantity
+from nomad_simulations import Simulation, Program
+
+
+class SUPERCODEParser:
+    """
+    Class responsible for populating the NOMAD `archive` from the files given by a
+    SUPERCODE simulation.
+    """
+
+    def parse(self, filepath, archive, logger):
+        output_parser = TextParser(
+            quantities=[
+                Quantity('program_version', r'version *([\d\.]+) *', repeats=False)
+            ]
+        )
+        output_parser.mainfile = filepath
+
+        simulation = Simulation()
+        simulation.program = Program(
+            name='SUPERCODE',
+            version=output_parser.get('program_version'),
+        )
+        # assign `Simulation` as the `archive.data` section (`data` is a single,
+        # non-repeating sub-section, so it is set rather than appended to)
+        archive.data = simulation
+```
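
The regular expression used in the `Quantity` above can be checked on its own before wiring it into a parser. The following is a minimal sketch using only the standard-library `re` module; it does not use `TextParser`, and the names `SAMPLE_OUTPUT`, `VERSION_PATTERN`, and `extract_program_version` are hypothetical helpers introduced here for illustration only.

```python
import re

# The example mainfile header from the documentation above.
SAMPLE_OUTPUT = """! * * * * * * *
! Welcome to SUPERCODE, version 7.0
...
"""

# Same pattern as in the `Quantity('program_version', ...)` definition.
VERSION_PATTERN = re.compile(r'version *([\d\.]+) *')


def extract_program_version(text):
    """Return the first captured version string, or None if nothing matches."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


program_version = extract_program_version(SAMPLE_OUTPUT)
print(program_version)  # prints 7.0
```

With plain `re`, the captured value is a string. How `TextParser` converts the matched text is not covered by this sketch, so it is worth inspecting the value returned by `output_parser.get('program_version')` before assigning it to `Program.version`.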
diff --git a/docs/howto_use.md b/docs/howto_use.md
@@ -0,0 +1,4 @@
+# How to use the `Simulation` schema
+
+!!! warning
+    This page is still under construction.
diff --git a/docs/index.md b/docs/index.md
@@ -3,17 +3,9 @@
     <div id="cy"></div>
 -->
 
-**Welcome to the NOMAD documentation for the Schema developed for Computational Materials Scientists**, where you can find information about how to use the NOMAD standard schema for your own simulations.
+**Welcome to the NOMAD documentation for the Schema developed for Computational Materials Scientists**, where you can find information about how to use the NOMAD schema definition to store the data output by your simulations.
+This project contains all the information about the main base sections and their `SubSections` and `Quantities` relevant for simulations. We propose here a general schema which could then be used as a basis to build more specific schemas.
 
-NOMAD is a free open-source data management platform for Materials Science which follows the F.A.I.R. (Findable, Accessible, Interoperable, and Reusable) principles. This documentation page is a part of the more [general NOMAD documentation](https://nomad-lab.eu/prod/v1/staging/docs/), and more specifically, a part on the usage of [NOMAD base sections](https://nomad-lab.eu/prod/v1/staging/docs/howto/customization/base_sections.html).
+NOMAD is a free open-source data management platform for Materials Science which follows the F.A.I.R. (Findable, Accessible, Interoperable, and Reusable) principles. This documentation is part of the more [general NOMAD documentation](https://nomad-lab.eu/prod/v1/staging/docs/), and it also relates to the usage of [NOMAD base sections](https://nomad-lab.eu/prod/v1/staging/docs/howto/customization/base_sections.html).
 
-
-
-!!! info "Main contributors"
-    Dr. Nathan Daelman, [[email protected]](mailto:[email protected])
-
-    Dr. Alvin Noe Ladines, [[email protected]](mailto:[email protected])
-
-    Dr. José M. Pizarro, [[email protected]](mailto:[email protected])
-
-    Dr. Joseph F. Rudzinski, [[email protected]](mailto:[email protected])
+When designing the sections, we follow [SOLID principles](https://www.geeksforgeeks.org/solid-principle-in-programming-understand-with-real-life-examples/) for object-oriented programming. Throughout this documentation, we will use [UML diagrams](https://en.wikipedia.org/wiki/Class_diagram), both in a simplified and in a detailed manner, to draw the relationships between schemas.
diff --git a/docs/model_method/model_method.md b/docs/model_method/model_method.md
@@ -0,0 +1,4 @@
+# `ModelMethod`
+
+!!! warning
+    This page is still under construction.
diff --git a/docs/model_system/model_system.md b/docs/model_system/model_system.md
@@ -0,0 +1,4 @@
+# `ModelSystem`
+
+!!! warning
+    This page is still under construction.
diff --git a/docs/normalize.md b/docs/normalize.md
@@ -0,0 +1,56 @@
+# The `normalize()` function
+
+Each base section defined using the NOMAD schema has a set of public functions which can be used at any moment when reading and parsing files in NOMAD. The `normalize(archive, logger)` function is a special case of such functions, which warrants an in-depth description.
+
+This function is run within the NOMAD infrastructure by the [`MetainfoNormalizer`](https://github.com/nomad-coe/nomad/blob/develop/nomad/normalizing/metainfo.py) in the following order:
+
+1. A child section's `normalize()` function is run before its parent's `normalize()` function.
+2. For sibling sections, the `normalize()` functions are executed in order of increasing `normalizer_level` attribute. If `normalizer_level` is not set, or if it is the same for two different sections, the order is established by the order in which the sub-section attributes are defined in the parent section.
+3. Using `super().normalize(archive, logger)` runs the inherited section normalize function.
+
+Let's see some examples. Imagine having the following `Section` and `SubSection` structure:
+
+```python
+from nomad.datamodel.data import ArchiveSection
+from nomad.metainfo import SubSection
+
+
+class Section1(ArchiveSection):
+    normalizer_level = 1
+
+    def normalize(self, archive, logger):
+        # some operations here
+        pass
+
+
+class Section2(ArchiveSection):
+    normalizer_level = 0
+
+    def normalize(self, archive, logger):
+        super().normalize(archive, logger)
+        # Some operations here or before `super().normalize(archive, logger)`
+
+
+class ParentSection(ArchiveSection):
+
+    sub_section_1 = SubSection(sub_section=Section1.m_def, repeats=False)
+
+    sub_section_2 = SubSection(sub_section=Section2.m_def, repeats=True)
+
+    def normalize(self, archive, logger):
+        super().normalize(archive, logger)
+        # Some operations here or before `super().normalize(archive, logger)`
+```
+
+Now, `MetainfoNormalizer` will be run on the `ParentSection`. Applying **rule 1**, the `normalize()` functions of the `ParentSection`'s children are executed first. The order of these functions is established by **rule 2** with the `normalizer_level` attribute, i.e., all the `Section2` (note that `sub_section_2` is a list of sections) `normalize()` functions are run first, then `Section1.normalize()`. Then, the order of execution will be:
+
+1. `Section2.normalize()`
+2. `Section1.normalize()`
+3. `ParentSection.normalize()`
+
+If we do not assign a value to `Section1.normalizer_level` and `Section2.normalizer_level`, `Section1.normalize()` will run before `Section2.normalize()`, due to the order of the `SubSection` attributes in `ParentSection`. Thus, the order in this case will be:
+
+1. `Section1.normalize()`
+2. `Section2.normalize()`
+3. `ParentSection.normalize()`
+
+By checking on the `normalize()` functions and **rule 3**, we can establish whether `ArchiveSection.normalize()` will be run or not. In `Section1.normalize()`, it will not, while in the other sections, `Section2` and `ParentSection`, it will.
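
Rules 1 and 2 can be mimicked in plain Python to see the resulting call order. The following is a minimal sketch that does not use NOMAD at all: `FakeSection`, `run_normalize`, and `call_log` are hypothetical stand-ins, and treating an unset `normalizer_level` as `0` is an assumption of this sketch rather than a statement about `MetainfoNormalizer`.

```python
class FakeSection:
    """Hypothetical stand-in for a section; not a NOMAD class."""

    normalizer_level = 0  # assumption: an unset level behaves like 0

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # kept in definition order

    def normalize(self, call_log):
        # Record the call instead of doing real normalization work.
        call_log.append(self.name)


class Section1(FakeSection):
    normalizer_level = 1


class Section2(FakeSection):
    normalizer_level = 0


def run_normalize(section, call_log):
    """Normalize children first (rule 1), siblings by level (rule 2)."""
    # sorted() is stable, so siblings with equal levels keep definition order.
    for child in sorted(section.children, key=lambda s: s.normalizer_level):
        run_normalize(child, call_log)
    section.normalize(call_log)


parent = FakeSection(
    'ParentSection',
    children=[Section1('Section1'), Section2('Section2[0]'), Section2('Section2[1]')],
)
call_log = []
run_normalize(parent, call_log)
print(call_log)
# prints ['Section2[0]', 'Section2[1]', 'Section1', 'ParentSection']
```

If all siblings share the same level, the stable sort leaves them in definition order, which reproduces the second ordering described above.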
diff --git a/docs/outputs/outputs.md b/docs/outputs/outputs.md
@@ -0,0 +1,4 @@
+# `Outputs`
+
+!!! warning
+    This page is still under construction.
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
@@ -122,4 +122,25 @@
 
 .nomad-button--card-action {
     margin: 1em 0 0 0 !important;
+}
+
+.click-zoom {
+    text-align: center;
+}
+
+.click-zoom input[type=checkbox] {
+    z-index: 0;
+    display: none
+}
+
+.click-zoom img {
+    transition: transform 0.25s ease;
+    z-index: 0;
+    cursor: zoom-in
+}
+
+.click-zoom input[type=checkbox]:checked~img {
+    transform: scale(2.3);
+    z-index: 0;
+    cursor: zoom-out
 }
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -5,6 +5,13 @@ site_author: FAIRmat consortium
 repo_url: https://github.com/nomad-coe/nomad-schema-plugin-simulation-data
 nav:
   - Home: index.md
+  - Simulation schema:
+    - Overview: general.md
+    - ModelSystem: model_system/model_system.md
+    - ModelMethod: model_method/model_method.md
+    - Outputs: outputs/outputs.md
+  - How to use the Simulation schema: howto_use.md
+  - The normalize function: normalize.md
   - Contact: contact.md
 theme:
   name: material
@@ -54,9 +61,6 @@ markdown_extensions:
       toc_depth: 3
   - pymdownx.arithmatex:
       generic: true
-  - pymdownx.emoji:
-      emoji_index: !!python/name:materialx.emoji.twemoji
-      emoji_generator: !!python/name:materialx.emoji.to_svg
 extra_css:
   - stylesheets/extra.css
 extra_javascript:

diff --git a/pyproject.toml b/pyproject.toml
@@ -13,6 +13,7 @@ readme = "README.md"
 authors = [
     { name = "Jose M. Pizarro", email = "[email protected]" },
     { name = "Nathan Daelman", email = "[email protected]" },
+    { name = "Bernadette Mohr", email = "[email protected]" },
     { name = "Joseph F. Rudzinski", email = "[email protected]" }
 ]
 license = { text = "Apache-2.0" }