-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend model system #58
base: develop
Are you sure you want to change the base?
Conversation
- TODO: move to own subsection
- Clarify some descriptions in the schema - Update tests - TODO: add angles
- Improve combo generation: unique combos only now - Robustify `AtomicCell.get_distances()` - Add quantities storing elemental combinations - TODO: fix error for matrix storage of `str` (coming from elementatl combinations)
- Apply approximations in testing - Add angles to the tests - Add first prototyp of `decorator` checks
- TODO: extend testing - TODO: check `Cell` structure
- Fix minor bugs in src
- Fix bugs in src
I doubt it does not break anything, especially the class Example:
def method_a(self):
...
return self
def method_b(self):
...
return self
def method_c(self):
...
return self
example = Example()
example_2 = example.method_c().method_a().method_c().method_c() Did you check that the methods (besides normalize, in which I won't do it) you implemented this return, this kind of logic is what we want? |
positions = Quantity( | ||
type=np.float64, | ||
shape=['n_cell_points', 3], | ||
unit='meter', | ||
description=""" | ||
Positions of all the atoms in Cartesian coordinates. | ||
Positions of all the atoms in absolute, Cartesian coordinates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for exactly? I think perhaps the name should reflect the difference from the atomic_cell positions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for exactly?
With a (periodic) cell defined, you could either express the position vectors in terms of the lattice vector basis set, or the basis set in which the lattice vectors were defined (typically spanning a Cartesian space).
We end up going with the latter, since it simplifies large-scale manipulation. I think it's also the most common convention in ab initio.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question was what is the difference between the positions stored here and the positions stored in atomic_cell? And can/should the attribute name here denote that difference (i.e., not just the description)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ow, yes, I see. Tbh, I think some confusion entered the setup. Cell
is both the parent to AtomicCell
, but seems to also contain a grid, i.e. n_cell_points
. The positions
and the rest refer to them, though my correction may be wrong.
Let me take a closer look. Chema did push some changes here. I'll get back to you, after merging those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JosePizarro3 could you pls clarify the meaning of Cell.n_cell_points
, especially wrt to AtomicCell
?
I actually added a note for you there when I asked for the review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the description. This should be fixed.
I tried to break the geometric quantities of a cell and the specific atoms sitting in those positions. Cell
contains thus the lattice_vectors
and the 0-dimensional points in real space, i.e., positions
(and also velocities and other info about the Cell), while AtomsState
has the info of the atom sitting in that position.
So essentially, these are the same both in Cell
and AtomicCell
. The only addition of the atomic cell is the atoms state in each of these positions. Whether these positions
are occupied by entities which are atoms, this depends on the problem. It could be a more complex thing which is centered around them, and for modeling purposes is described like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So essentially, these are the same both in Cell and AtomicCell.
Great! That's what I had to know. Then I'll remove positions
and velocities
from AtomicCell
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the description.
Where?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo in the description.
Where?
Change atoms
for points
. And you can rename the quantity above from n_cell_points
to n_points
.
…labels` as per reviewer request
… feedback) - Add dihedrals to computation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good for now, just take a look at my comments.
positions = Quantity( | ||
type=np.float64, | ||
shape=['n_cell_points', 3], | ||
unit='meter', | ||
description=""" | ||
Positions of all the atoms in Cartesian coordinates. | ||
Positions of all the atoms in absolute, Cartesian coordinates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question was what is the difference between the positions stored here and the positions stored in atomic_cell? And can/should the attribute name here denote that difference (i.e., not just the description)?
Let me know if you need a review. I opened a question about returning self, and whether if this is what we want. |
- Resolve merge conflicts in `AtomicCell.to_ase_atoms()`
I held off on responding, until testing the normalization. No issues there. It's also unclear why any should occur.
Exactly, it's a way of supporting functional pipelines. I mentioned adding this 2x before. It's not really a big deal, but we can go over the points you raised.
There's no special logic going on here, just functional composition. I.e. the behavior of these functions does not change ('), and therefore it does not invalidate any old code. Regarding "good practices" (''), it's just supporting a more compact, often clearer style of writing code ('''). Unless we want to constrain devs to a very procedural style, there's no issue. Actually, these kinds of pipelines are very common in modules like (') 1 disclaimer: it does not change the behavior of functions that return ('') Good practices more so have to do with producing safe code (including CI/CD pipelines) and clear abstractions (supplemented with good docs). While style consistency can be conducive here, the real objective is readability. If a different style produces better readability in some subcases, then it's appropriate there. Python itself supports a mix of procedural and declaritive alternatives. (''') The nice aspect of writing code this way, is that you explain all the manipulations on top of an object in 1 line. You can now produce complex, but well-defined states without having to trigger the section's entire normalization (which might fail if some data is missing).
We can try it out here first. It would be good to apply it consistently, yes. |
I think you can simply leave it out. I really doubt this does not break the I also have my doubts on readability, as returning self and chaining methods are rather unreadable. The only benefit is just writing one-line code. But I guess computer scientists would have a different point of view, perhaps, on the fact that |
…those in `Cell`. - Rename `Cell.n_cell_points` to `Cell.n_objects`
- Update tests to be in-line with new elemental order (central atoms in the middle)
- TODO: add testing for dihedral angles - TODO: decide on binning
Btw, it seems to me that this feature would live better in the nomad-analysis plugin, but we can leave it here for now. |
- Migrate cutoff quantities to `GeometryDistribution` - Update neighbor_list in pipeline and normalization - Update docs - TODO: add bin units - TODO: verify neighbor_list elements - TODO: remove `check_attributes`
- clean up `to_ase_atoms`
- ensure that `neighbor_list`'s order matches `ase_atoms`
- Remove `type` in favor of `n_elem` - Reorganize `AtomicCell.normalize()`
This PR adds inter-atomic distances and angles, with optional extension to dihedral angles, to the NOMAD schema.
Schema
These distributions are stored in
AtomicCell.geometry_distributions: list[GeometryDistribution]
as a repeating subsection. Information on the (cutoffs, used to set up the) bins, is stored underAtomicCell.geometry_analysis_cutoffs
.GeometryDistribution
objects are effectively histograms by element pair / triple. This is a choice to facilitate searches and visualizations over the distribution themselves, which are normalized to reproduce the same frequency for primitive cells as supercells. Additional advantages include a limited storage consumption. It is, however, not suitable for extracting the exact atomic pairs / triples matching a given distance / angle.Note: triples / quadruples are defined around a specific geometric primitive (point / line piece). Their notation accounts for this, by placing them up front in the elements list.
Additional functionalities
To keep state within
AtomicCell
simple and exert control over the stages in calculation, pure Python helper classes with the following functions are added, in line with the Single Responsibility Principle:DistributionFactory
scans the combinatorial space of elements and generates theirDistribution
s.Distribution
leveragesase
for computing the distances / angles for the elemental combo provided. It can also produce aDistributionHistogram
of itself.DistributionHistogram
contain the histogram version of their combo distribution.While only
DistributionHistogram
will eventually be written out toGeometryDistribution
, the rest of the data artifacts are retained underAtomicCell
during run-time. As such, once computed, they can be used in other computation, or data analysis (conversion topandas.DataFrame
in a separate PR).Inclusion of the actual computation pipeline, meanwhile, is toggled via
AtomicCell.analyze_geometry: bool
.The execution is handled by the
normalizer
.Miscellaneous
To facilitate a cleaner check for missing NOMAD attributes, i.e. quantities, subsections, I added the
check_attributes
decorator. It is used inAtomicCell.set_ase_atoms
, as an example. If accepted, I recommend that it be reallocated toutils.py
.To support a more functional pipeline notation, next to the current imperative style, I make all computational functions and
normalize
returnself
. I checked with Area D that the latter does not violate thenormalize
interface specifications.P.s.: I'm considering adding this PR description as documentation. Let me know your thoughts.