Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/relax and bands wc #254

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Conversation

bosonie
Copy link
Collaborator

@bosonie bosonie commented Dec 16, 2021

This PR introduce a code agnostic workchain that performs relaxation and subsequently the calculation of bands.
It also has the possibility to use seekpath to generates the high symmetries kpoints path for bands. This process might change the structure, therefore, when seekpath is called, a further scf step is needed on the nex structure. This is automatically handled by the workchain. The inputs for the further scf are exposed. They are the inputs of a CommonRelaxInpuGenerator, but where the default relaxation_type is NONE and all the other defaults have been removed. If not defined by user, the same inputs of the first relaxation are used (except the new default for relaxation_type).

Limitation of the current implementation (will be improved later):

  1. For the moment the inputs of relaxation and bands calculations are the same as the inputs of
    CommonRelaxInputGenerator and CommonBandsInputGenerator respectively and they are non_db.
    So their value will not be stored as inputs of the workchain.
  2. For the moment it is not possible to run relaxation with one plugin and bands with another.
  3. The possibility of overrides to the code-dependent inputs is not possible for the moment. Discussions are ongoing on how to improve it.

the relaxation and subsequently the calculation of bands.
It expose inputs of the `CommonRelaxInputGenerator` and makes them "non_db"
so that the ports of `CommonRelaxInputGenerator` can be used direcly with
no modifications.
Once an automatic generation of the kpoints path for bands is
performed by SeeKpath, this might refere to a conventional
structure that is different from the input one. For this reason it is necessary
to run a further scf before calculating bands.
The inputs for the further scf are now exposed. They are the inputs of a
`CommonRelaxInpuGenerator`, but where the default relaxation_type is NONE and
all the other defaults have been removed.
If not defined by user, the same inputs of the first relaxation are used
(except the new default for relaxation_type)
@bosonie bosonie requested a review from sphuber December 16, 2021 10:09
@bosonie
Copy link
Collaborator Author

bosonie commented Dec 16, 2021

@sphuber I would like especially a feedback on how to handle defaults of the second_relax_inputs. I thought that using populate_defaults = False was enough for the exposed inputs to forget the previous defaults. This is not the case. I had to manually force them to be () which is equal to UNSPECIFIED in the current plumpy implementation (https://github.com/aiidateam/plumpy/blob/develop/plumpy/ports.py). Is this correct? Seems fragile.

@bosonie
Copy link
Collaborator Author

bosonie commented Dec 16, 2021

I will also now write few tests

@sphuber
Copy link
Collaborator

sphuber commented Dec 16, 2021

@sphuber I would like especially a feedback on how to handle defaults of the second_relax_inputs.

Why do you want to get rid of the defaults?

I thought that using populate_defaults = False was enough for the exposed inputs to forget the previous defaults.

Nope, that is not what it is supposed to do. Setting populate_defaults = False when exposing the inputs of a process is only used when the process is actually instantiated. This calls PortnameSpace.pre_process which will normally fill in the defaults in the inputs it received for the ports that were not explicitly specified. But for some cases this is not what you want. If you make a subprocess completely optional and rely on determining whether it should be run based on whether any inputs were defined, you don't want the defaults to be automatically added, because this means it will always be run, even if the user didn't specify anything.

@bosonie
Copy link
Collaborator Author

bosonie commented Dec 16, 2021

If you make a subprocess completely optional and rely on determining whether it should be run based on whether any inputs were defined, you don't want the defaults to be automatically added, because this means it will always be run, even if the user didn't specify anything.

This is exactly what I want. But unfortunately what I see is different. I set all the inputs to have populate_defaults = False but the inputs are filled in anyway when the process is instantiated! Is it again a misbehavior of the particular implementation we have here? I investigate more..

@sphuber
Copy link
Collaborator

sphuber commented Dec 17, 2021

This is exactly what I want. But unfortunately what I see is different. I set all the inputs to have populate_defaults = False but the inputs are filled in anyway when the process is instantiated! Is it again a misbehavior of the particular implementation we have here? I investigate more..

But populate_defaults is a property of the namespace not the port. You are now setting it to the ports. You are also still using the manual create_namespace and absorb. We already discussed during the coding week that you should just use expose_inputs. There you can pass the populate_defaults in the namespace options, i.e.:

spec.expose_inputs(CommonRelaxInputGenerator, namespace='relax', exclude=('structure',),
            namespace_options={'required': False, 'populate_defaults': False,
            'help': 'Inputs for the `CommonRelaxInputGenerator`, if not specified at all, the relaxation step is skipped.'})

@bosonie
Copy link
Collaborator Author

bosonie commented Dec 17, 2021

Ok, thanks @sphuber, I understand, but then this does not cover my use case, since I want one input in the namespace to be populated, and the others no. How do you do that?
I will change to expose_inputs in the next commit

Copy link
Collaborator

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bosonie , have some comments on design and implementation

aiida_common_workflows/workflows/relax_and_bands.py Outdated Show resolved Hide resolved
Comment on lines 155 to 156
namspac = spec.inputs.create_port_namespace('relax_inputs')
namspac.absorb(CommonRelaxInputGenerator.spec().inputs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use expose_inputs. If the namespace is fully optional, you set populate_defaults = False:

        spec.expose_inputs(
            CommonRelaxInputGenerator,
            namespace='relax',
            exclude=('structure',),
            namespace_options={'populate_defaults': False, 'required': False}
        )
        spec.expose_inputs(
            CommonRelaxInputGenerator,
            namespace='scf',
            exclude=('structure',),
            namespace_options={'populate_defaults': False, 'required': False}
        )
        spec.expose_inputs(
            CommonBandsInputGenerator,
            namespace='bands'
        )


namspac = spec.inputs.create_port_namespace('relax_inputs')
namspac.absorb(CommonRelaxInputGenerator.spec().inputs)
namspac['protocol'].non_db = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this manually, I would write a recursive function and call it:

def set_port_non_db(port):
    """Set ``non_db =  True`` for the given port.

    .. note:: If ``port``, is a port namespace, the function will be called recursively on all sub ports.

    :param port: a port or port namespace.
    """
    if isinstance(port, PortNamespace):
        for subport in port.values():
            set_port_non_db(subport)
    elif isinstance(port, InputPort):
        port.non_db = True

Then in the define method, you simply call the method:

set_port_non_db(spec.inputs)

aiida_common_workflows/workflows/relax_and_bands.py Outdated Show resolved Hide resolved
aiida_common_workflows/workflows/relax_and_bands.py Outdated Show resolved Hide resolved
namspac3[key].populate_defaults = False
namspac3[key].default = ()
namspac3['relax_type'].required = True
namspac3['relax_type'].default = RelaxType.NONE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't rely on this through a default because then a user can run anything else. We should always set this no? So just set it in the workchain step when preparing the inputs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was against the idea to fully expose the inputs of the sub-processes, but ok, this makes sense. And also avoids the problem of my last comments above.

Comment on lines 201 to 211
spec.outline(
cls.initialize,
cls.run_relax,
cls.prepare_bands,
if_(cls.should_run_other_scf)(
cls.fix_inputs,
cls.run_relax
),
cls.run_bands,
cls.inspect_bands
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was wondering if we should have a slightly different outline:

Suggested change
spec.outline(
cls.initialize,
cls.run_relax,
cls.prepare_bands,
if_(cls.should_run_other_scf)(
cls.fix_inputs,
cls.run_relax
),
cls.run_bands,
cls.inspect_bands
)
spec.outline(
cls.setup,
if_(cls.should_run_relax)(
cls.run_relax,
cls.inspect_relax,
),
if_(cls.should_run_scf)(
cls.run_seekpath,
cls.run_scf,
cls.inspect_scf,
),
cls.run_bands,
cls.inspect_bands,
cls.results,
)

I think this is a bit more explicit and clear as to what will happen.

This would make the relax optional, which is still useful, because the user can then compute the bands for an already optimized structure but be sure that the bands are computed in exactly the same way as if it were to include relaxation.

I would also call the second relax run an SCF, because that is what it is, even though we are using the RelaxWorkflow but that is merely because we decided that this is the way an SCF is run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I am sensing a conceptual difference here. In my mind the user would NOT have the possibility to decide whether to run an extra scf or not. To determine the presence of an extra step is only the fact that seekpath was used (and in the future maybe the decision to us a different code for bands compared to relax). The user could only select some inputs for this step.

Your idea only gives the additional flexibility to run an extra scf even when it is not necessary (when seekpath is not used) and I believe it is a valid use case.

However I have strong doubts about your other suggestion to make the relaxation optional

This would make the relax optional, which is still useful, because the user can then compute the bands for an already optimized structure but be sure that the bands are computed in exactly the same way as if it were to include relaxation.

This can be done already setting the relax_type to None in the relax input. So it is not needed. I do not see any other case when this is useful and the price to pay is very high. In fact I believe that making both the relaxstep and the scf step optional is difficult to support. How do we understand that the user wants only the scf and no the relaxation? If we make populate_defaults = False and required = False for both relaxation and scf, then we have to implement additional logic that checks that one of the two is set, and that the one selected sets at least the structure, protocol, engines. Basically it makes useless to expose the inputs since None of the properties of the inputs port can be used, they must be implemented all over again. It seems to me an additional complication that has no motivation to exist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that making both optional adds more complexity. But I don't understand your current design. If I understand correctly, in prepare_bands you check whether explicit kpoints were specified in the bands option, in which case you don't rerun the relax workflow as scf. But the first relax workflow and seekpath are always run. So it is practically guaranteed that the structure changes, even if the user sets RelaxType.NONE. Even if the relax doesn't change the structure, it is possible that seekpath still normalizes the structure, which will cause the k-points to longer be commensurate.

I guess what I am saying is that we should make the relax+seekpath optional, or otherwise allowing to directly specify k-points doesn't make any sense. Or we just require relax+seekpath+scf always and just don't support the user specifying their own k-points. At least that would be consistent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. The relaxation always run. The use of SeeKpath is optional. If you set a kpoints for bands in input, SeekPath is not used. This is a valid use case, for instance, I know the kpoints path along which I want to calculate the bands, and I also want a relaxation of just the atoms coordinates. Of course it does not make much sense to specify a kpoints path and also allow relaxation with variable cell, since you will not know the final cell shape. But the rest is perfectly fine.

When you DO NOT use Seekpath (i.e. you specify kpoints for bands in input), you do not need an extra scf. In fact, you pass to the bands calculation the remote_folder of the relaxation and this guarantees to copy the correct final structure and the density matrix / wave-functions. However, based to your suggestion, I understood that the possibility for the user to trigger an extra scf anyway, is a valid use case. For instance a user wants to do a final scf with "precise" protocol before calculating bands.

To summarize, three options are available

  1. relax+seekpath+scf+bands
  2. relax+kp_in_input+bands
  3. relax+kp_in_input+scf+bands

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course it does not make much sense to specify a kpoints path and also allow relaxation with variable cell, since you will not know the final cell shape. But the rest is perfectly fine.

This is what was confusing me. Should we maybe have validation in there to prevent this? If you specify kpoints, then the relax cannot change the cell.

To summarize, three options are available

relax+seekpath+scf+bands
relax+kp_in_input+bands
relax+kp_in_input+scf+bands

This is great, this makes things a lot clearer. I would propose to at least describe this in the docstring of the workchain. And then maybe adapt the outline to the following that makes it a lot clearer what is happening:

        spec.outline(
            cls.setup,
            cls.run_relax,
            cls.inspect_relax,
            if_(cls.should_run_seekpath)(
                cls.run_seekpath,
                cls.run_scf,
                cls.inspect_scf,
            ).elif_(cls.should_run_scf)(
                cls.run_scf,
                cls.inspect_scf,
            ),
            cls.run_bands,
            cls.inspect_bands,
            cls.results,
        )

aiida_common_workflows/workflows/relax_and_bands.py Outdated Show resolved Hide resolved
@bosonie
Copy link
Collaborator Author

bosonie commented Dec 17, 2021

I addressed almost all your comments @sphuber. Two are open for discussion. Thanks for your review. However I found another problematic behavior of the ports. It seems to me that the populate_default = False is ignored in some cases. I will report about that somewhere as soon as I understood correctly. So you can comment on the new implementation, but even if it is ok, do not merge please.

@bosonie
Copy link
Collaborator Author

bosonie commented Dec 17, 2021

The issue I found is here aiidateam/aiida-core#5286

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants