Make DefectThermodynamics initialisation (defect entry sorting) mor…

…e efficient, and update `TODO`s
SMTG-Bham · Dec 17, 2023 · cdfe532 · cdfe532
1 parent cefd810
commit cdfe532
Show file tree

Hide file tree

Showing 4 changed files with 55 additions and 66 deletions.
diff --git a/docs/Dev_ToDo.md b/docs/Dev_ToDo.md
@@ -71,8 +71,6 @@
 - Automate `pydefect` shallow defect analysis? At least have notebook showing how to manually do this (Adair's done before?).
   - Should tag parsed defects with `is_shallow` (or similar), and then omit these from plotting/analysis
     (and note this behaviour in examples/docs)
-- Change formation energy plotting and tabulation to `DefectPhaseDiagram` methods rather than standalone
-  functions – with `pymatgen` update what's the new architecture?
 - Better automatic defect formation energy plot colour handling (auto-change colormap based on number of defects, set similar colours for similar defects (types and inequivalent sites)) – and more customisable?
   - `aide` labelling of defect species in formation energy plots? See `labellines` package for this (as used in `pymatgen-analysis-defects` chempots plotting)
   - Ordering of defects plotted (and thus in the legend) should be physically relevant (whether by energy, or defect type etc.)
@@ -88,19 +86,13 @@
   workflow which ppl often mess up. Can use modified code from `config-coord-plots` (but actually to
   scale and automatically/sensibly parsed etc.)(also see `CarrierCapture` functionalities)
 - Option for degeneracy-weighted ('reduced') formation energy diagrams, similar to reduced energies in SOD. See Slack discussion and CdTe pyscfermi notebooks.
-- Brouwer diagrams. Also see Fig. 6a of the `AiiDA-defects` preprint, want plotting tools like this (some could be PR'd to `py-sc-fermi`)
-- Functions to output data and python objects to plug and play with `py-sc-fermi`, `AiiDA`, `CarrierCapture`.
-  - Alex Squires has functions/notebooks from Seán (-> Xinwei -> Jiayi) -> Alex, for transferring to
-    `py-sc-fermi` and generating nice plots with the outputs, so add this and make it our suggested
-    workflow in the docs etc.
-  - `py-sc-fermi` may have functionality for dealing with complex defect concentrations in the future
-    (see Slack with Alex; 07/06/23)
-- `pydefect` integration, so we can use:
+- `pydefect` integration? So we can use:
   - Handling of shallow defects
   - Readily automated with `vise` if one wants (easy high-throughput and can setup primitive calcs (BS, DOS, dielectric).
   - Some nice defect structure and eigenvalue analysis
   - GKFO correction
 - Showcase `py-sc-fermi` plotting (e.g. from thesis notebook) using `interface` functionality. When doing, add CdTe data as test case for this part of the code. Could also add an optional right-hand-side y-axis for defect concentration (for a chosen anneal temp) to our TLD plotting (e.g. `concentration_T = None`) as done for thesis, noting in docstring that this obvs doesn't account for degeneracy! Also carrier concentration vs Fermi level plots as done in the Kumagai PRX paper? (once properly integrated, add and ask Alex to check/test?)
+  Brouwer diagrams; show examples of these in docs using `py-sc-fermi` interface tools. Also see Fig. 6a of the `AiiDA-defects` preprint, want plotting tools like this (some could be PR'd to `py-sc-fermi`)
 
 ## Housekeeping
 - Clean `README` with bullet-point summary of key features, and sidebar like `SnB`. Add correction plots and other example outputs (see MRS poster for this).
@@ -130,7 +122,7 @@
     possibly `NKRED` etc. (make a function to generate `vasp_ncl` calculation files with `ISMEAR = -5`, with option to set different kpoints) - if `ISMEAR = 0` - converged kpoints still prohibitively large, use vasp_converge_files again to check for quicker convergence with ISMEAR = -5.
   - Use `NKRED = 2` for `vasp_ncl` chempot calcs, if even kpoints and over 4. Often can't use `NKRED` with `vasp_std`, because we don't know beforehand the kpts in the IBZ (because symmetry on for `vasp_std` chempot calcs)(same goes for `EVENONLY = True`).
   - Readily-usable in conjunction with `atomate`, `AiiDA`(-defects), `CarrierCapture`, and give some
-    examples. Add as optional dependencies.
+    quick examples? Add as optional dependencies.
   - Workflow diagram with: https://twitter.com/Andrew_S_Rosen/status/1678115044348039168?s=20
   - Note about `ISPIN = 1` for even no. of electrons defect species, **if you're sure there's no
     magnetic ordering!** – which you can check in the `OUTCAR` by looking at `magnetization (x)` `y`

diff --git a/doped/analysis.py b/doped/analysis.py
@@ -613,10 +613,10 @@ def __init__(
                 Filename to save the parsed defect entries dict (`DefectsParser.defect_dict`)
                 to in `output_path`, to avoid having to re-parse defects when later analysing
                 further and aiding calculation provenance. Can be reloaded using the `loadfn`
-                function from `monty.serialization` as shown in the docs, or
-                `DefectThermodynamics.from_json()`. If None (default), set as
-                "{Chemical Formula}_defect_dict.json" where {Chemical Formula} is the
-                chemical formula of the host material. If False, no json file is saved.
+                function from `monty.serialization` (and then input to `DefectThermodynamics`
+                etc). If None (default), set as "{Chemical Formula}_defect_dict.json" where
+                {Chemical Formula} is the chemical formula of the host material.
+                If False, no json file is saved.
 
         Attributes:
             defect_dict (dict):
@@ -625,7 +625,6 @@ def __init__(
                 defect calculation folder name (_if it is a recognised defect name_),
                 else it is set to the default `doped` name for that defect.
         """
-        # TODO: Need to add `DefectThermodynamics.from_json()` etc methods as mention in docstring here
         self.output_path = output_path
         self.dielectric = dielectric
         self.skip_corrections = skip_corrections

diff --git a/doped/generation.py b/doped/generation.py
@@ -2212,35 +2212,26 @@ def _sort_defect_entries(defect_entries_dict, element_list=None, symm_ops=None):
     except ValueError as value_err:
         # possibly defect entries with names not in doped format, try sorting without using name:
         try:
+
+            def _defect_entry_sorting_func(defect_entry):
+                name_from_defect = get_defect_name_from_defect(
+                    defect_entry.defect,
+                    element_list=element_list,
+                    symm_ops=symm_ops,
+                )
+                return (
+                    defect_entry.defect.defect_type.value,
+                    element_list.index(_first_and_second_element(defect_entry.defect.name)[0]),
+                    element_list.index(_first_and_second_element(defect_entry.defect.name)[1]),
+                    # name without charge:
+                    name_from_defect,
+                    defect_entry.charge_state,  # charge state
+                )
+
             return dict(
                 sorted(
                     defect_entries_dict.items(),
-                    key=lambda s: (
-                        s[1].defect.defect_type.value,
-                        element_list.index(
-                            _first_and_second_element(
-                                get_defect_name_from_defect(
-                                    s[1].defect,
-                                    element_list=element_list,
-                                    symm_ops=symm_ops,
-                                )
-                            )[0]
-                        ),
-                        element_list.index(
-                            _first_and_second_element(
-                                get_defect_name_from_defect(
-                                    s[1].defect,
-                                    element_list=element_list,
-                                    symm_ops=symm_ops,
-                                )
-                            )[1]
-                        ),
-                        # name without charge:
-                        get_defect_name_from_defect(
-                            s[1].defect, element_list=element_list, symm_ops=symm_ops
-                        ),
-                        s[1].charge_state,  # charge state
-                    ),
+                    key=lambda s: _defect_entry_sorting_func(s[1]),  # sort by defect entry object
                 )
             )
         except ValueError:

diff --git a/doped/thermodynamics.py b/doped/thermodynamics.py
@@ -26,20 +26,10 @@
 from doped.generation import _sort_defect_entries
 from doped.utils.plotting import _TLD_plot
 
-# TODO: Previous `pymatgen` issues, fixed?
-#   - Currently the `PointDefectComparator` object from `pymatgen.analysis.defects.thermodynamics` is
-#     used to group defect charge states for the transition level plot / transition level map outputs.
-#     For interstitials, if the closest Voronoi site from the relaxed structure thus differs significantly
-#     between charge states, this will give separate lines for each charge state. This is kind of ok,
-#     because they _are_ actually different defect sites, but should have intelligent defaults for dealing
-#     with this (see `TODO` in `thermo_from_defect_dict` in `analysis.py`; at least similar colours for
-#     similar defect types, an option to just show amalgamated lowest energy charge states for each
-#     _defect type_). NaP is an example for this - should have a test built for however we want to handle
-#     cases like this. See Ke's example case too with different interstitial sites.
-#   - GitHub issue related to `DefectThermodynamics`: https://github.com/SMTG-Bham/doped/issues/3 -> Think
-#     about how we want to refactor the `DefectThermodynamics` object!
-#   - Note that if you edit the entries in a DefectThermodynamics after creating it, you need to
-#     `thermo.find_stable_charges()` to update the transition level map etc.
+# TODO: Add dpd_all_transition_levels function to dope_stuff (or wherever these functions go when
+#  modularised), which iteratively prints all potential charge transition levels for the input
+#  DefectPhaseDiagram object, including metastable states. Template using the dpd_transition_levels
+#  function that's already in dope_stuff; see https://github.com/SMTG-Bham/doped/issues/3 for code
 
 
 def bold_print(string: str) -> None:
@@ -117,7 +107,7 @@ class DefectThermodynamics(MSONable):
 
     def __init__(
         self,
-        defect_entries: List[DefectEntry],
+        defect_entries: Union[List[DefectEntry], Dict[str, DefectEntry]],
         chempots: Optional[Dict] = None,
         el_refs: Optional[Dict] = None,
         vbm: Optional[float] = None,
@@ -133,13 +123,13 @@ def __init__(
 
 
         Args:
-            defect_entries ([DefectEntry]):
-                A list of DefectEntry objects. Note that `DefectEntry.name` attributes
-                 are used for grouping and plotting purposes! These should end be in
-                 the format "{defect_name}_{optional_site_info}_{charge_state}". If
-                 the DefectEntry.name attribute is not defined or does not end with
-                 the charge state, then the entry will be renamed with the doped
-                 default name.
+            defect_entries ([DefectEntry] or {str: DefectEntry}):
+                A list or dict of DefectEntry objects. Note that `DefectEntry.name`
+                attributes are used for grouping and plotting purposes! These should
+                be in the format "{defect_name}_{optional_site_info}_{charge_state}".
+                If the DefectEntry.name attribute is not defined or does not end with
+                the charge state, then the entry will be renamed with the doped
+                default name.
             chempots (dict):
                 Dictionary of chemical potentials to use for calculating the defect
                 formation energies. This can have the form of
@@ -176,6 +166,8 @@ def __init__(
                 If None (default), will use "gap" from the calculation_metadata
                 dict attributes of the DefectEntry objects in `defect_entries`.
         """
+        if isinstance(defect_entries, dict):
+            defect_entries = list(defect_entries.values())
         self.defect_entries = defect_entries
         self.chempots, self.el_refs = _parse_chempots(chempots, el_refs)
 
@@ -364,8 +356,6 @@ def from_defect_dict(
         Returns:
             doped DefectThermodynamics object (DefectThermodynamics)
         """
-        # TODO: Can we make the thermo generation more efficient? What's the bottleneck in it's
-        #  initialisation? `pymatgen` site-matching that can be avoided?
         # TODO: (1) refactor the site-matching to just use the already-parsed site positions, and then
         #  merge interstitials according to this algorithm:
         # 1. For each interstitial defect type, count the number of parsed calculations per charge
@@ -379,6 +369,18 @@ def from_defect_dict(
         #  (2) optionally retain/remove unstable (in the gap) charge states (rather than current
         #  default range of (VBM - 1eV, CBM + 1eV))...
         # When doing this, add DOS object attribute, to then use with Alex's doped - py-sc-fermi code.
+
+        # Related TODO: Previous `pymatgen` issues, fixed?
+        # - Currently the `PointDefectComparator` object from `pymatgen.analysis.defects.thermodynamics`
+        #   is used to group defect charge states for the transition level plot / transition level map
+        #   outputs. For interstitials, if the closest Voronoi site from the relaxed structure thus
+        #   differs significantly between charge states, this will give separate lines for each charge
+        #   state. This is kind of ok, because they _are_ actually different defect sites, but should
+        #   have intelligent defaults for dealing with this (at least similar colours for similar defect
+        #   types, an option to just show amalgamated lowest energy charge states for each _defect type_).
+        #   NaP is an example for this - should have a test built for however we want to handle cases like
+        #   this. See Ke's example case too with different interstitial sites.
+
         # TODO: Should loop over input defect entries and check that the same bulk (energy and
         #  calculation_metadata) was used in each case (by proxy checks that same bulk/defect
         #  incar/potcar/kpoints settings were used in all cases, from each bulk/defect combination being
@@ -423,6 +425,11 @@ def _parse_transition_levels(self):
         charge state, then the entry will be renamed with the doped default
         name.
 
+        Note that if the entries in DefectThermodynamics are edited after
+        initialisation, then DefectThermodynamics._parse_transition_levels()
+        should be called if the attributes (transition_level_map etc.) and
+        outputs (plotting and tabulation) are to be updated properly.
+
         Code for parsing the transition levels was originally templated from
         the pyCDT (pymatgen<=2022.7.25) thermodynamics code (deleted in later
         versions).