[Proposal]: Better handling of MD-type jobs where a termination may not be a failure #2408

tomdemeyere · 2024-08-10T09:03:03Z

What new feature would you like to see?

This proposal aims to extend the terminate() function by having it call a new failed_schema() function. Ideally, this function would attempt to fetch whatever results are currently available. For example, in the case of a timed-out MD run, the traj and log files should be read and put in a dictionary in the same way that this is already done.

The problem, as mentioned by @Andrew-S-Rosen, is that:

To do what you're suggesting, we would need to try to prepare a schema, write it to disk, and then terminate. Output files will also not always be able to be parsed (e.g. if the calculator crashes instantly), and this would cause the schema generation to crash.

Indeed, such a function would need to be full of try/except blocks, since no assumption can be made about the current state of the calculation. My interpretation is that such a function should not attempt to summarise results (no call to pymatgen, etc.) but merely read what is available, since the calculation is not done. In the case of the logfile and trajfile that's easy: the files are known. In the case of software-specific files, it would be nice to come up with a way to attempt to read them, for example by using #2407.
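To make the idea concrete, here is a minimal sketch of what such a best-effort reader could look like. None of this is existing quacc code: the function signature, the dictionary keys, and the file names used in the usage note are all assumptions for illustration. Each file gets its own try/except so that one unreadable file does not prevent the others from being collected.

```python
from pathlib import Path
from typing import Any, Callable


def failed_schema(
    directory: Path,
    readers: dict[str, Callable[[Path], Any]],
) -> dict[str, Any]:
    """Hypothetical helper: read whatever is available, never raise.

    `readers` maps a file name to a callable that parses that file.
    No summarising is attempted; the raw reader output is stored as-is.
    """
    schema: dict[str, Any] = {"partial_results": {}, "read_errors": {}}
    for filename, reader in readers.items():
        path = Path(directory) / filename
        try:
            schema["partial_results"][filename] = reader(path)
        except Exception as exc:  # deliberately broad: file state is unknown
            schema["read_errors"][filename] = f"{type(exc).__name__}: {exc}"
    return schema
```

For a timed-out MD run this might be called as `failed_schema(tmpdir, {"md.log": Path.read_text, "md.traj": read_traj})`, where `read_traj` is a placeholder for something like a wrapper around `ase.io.read(path, index=":")`. Readers for software-specific files could be registered in the same dictionary.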

From the discussions in #2399


Andrew-S-Rosen · 2024-08-10T15:12:08Z

This is certainly doable. That said, I would propose that this behavior is toggleable via the global settings, such as SETTINGS.STORE_FAILED_JOBS: bool = False. There are two reasons for this: 1) cloud databases like MongoDB are often limited on space, and storing failed calculations may not be desirable; 2) storing the failed jobs to disk and/or database would be a fairly notable breaking change since anyone querying their database for calculation results would now have to add an additional query that only selects "successful" calculations. Silently storing failed outputs in the database will cause downstream problems for people, so this would be an opt-in feature.
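As a rough illustration of the opt-in behavior, the sketch below gates storage on the setting. Only the name `STORE_FAILED_JOBS` and its `False` default come from the comment above; the `Settings` dataclass, the `store_result` function, the `"job_state"` key, and the use of a plain list as the "database" are stand-ins, not quacc's actual settings or store machinery.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Settings:
    # Stand-in for the global settings object; off by default so that
    # existing databases and queries are unaffected.
    STORE_FAILED_JOBS: bool = False


def store_result(
    record: dict[str, Any],
    store: list[dict[str, Any]],
    settings: Settings,
) -> bool:
    """Append `record` to `store` unless it failed and storing is disabled.

    Returns True if the record was stored.
    """
    failed = record.get("job_state") != "success"
    if failed and not settings.STORE_FAILED_JOBS:
        return False
    store.append(record)
    return True
```

With the default settings a failed record is dropped, which matches the current behavior; users who want partial MD results would set the flag to `True` explicitly.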

It does not seem terribly difficult to implement. The idea would basically be to use a more flexible version of quacc.schemas.ase.Summarize.run to parse the code's main log file along with the input Atoms and calculator parameters. We would also need to store the job state for all jobs (success or failure) so this can be queried. If the parse is unsuccessful (say it fails immediately due to some weird input parameters), then there is not much to store other than the input Atoms and the calculator parameters along with the job state.
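The fallback described here could be sketched as follows. This is not the `Summarize.run` API: `build_record`, `select_successful`, and every dictionary key are hypothetical names chosen for the example. The record always carries the inputs and the job state, and parsed results are added only if the log parser succeeds.

```python
from typing import Any, Callable


def build_record(
    input_atoms: Any,
    parameters: dict[str, Any],
    job_state: str,
    parse_log: Callable[[], dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical record builder that tolerates an unparsable log."""
    record: dict[str, Any] = {
        "input_atoms": input_atoms,
        "parameters": parameters,
        "job_state": job_state,
    }
    try:
        record["results"] = parse_log()
    except Exception as exc:  # e.g. the code crashed before writing output
        record["parse_error"] = f"{type(exc).__name__}: {exc}"
    return record


def select_successful(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """The extra filter that queries would need once failures are stored."""
    return [r for r in records if r["job_state"] == "success"]
```

Storing the state on every record, rather than only on failures, is what allows a single filter like `select_successful` to work across old and new entries.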

tomdemeyere added the enhancement New feature or request label Aug 10, 2024

tomdemeyere changed the title ~~[Proposal]: create a "failed schema" that is written in case of job failure.~~ [Proposal]: create a "failed_schema" summarising error and available results in case of job failure Aug 10, 2024

Andrew-S-Rosen mentioned this issue Aug 10, 2024

[feature request] be able to perform continuation with Quacc #2399

Closed

Andrew-S-Rosen changed the title ~~[Proposal]: create a "failed_schema" summarising error and available results in case of job failure~~ [Proposal]: Better handling of MD-type jobs where a termination may not be a failure Aug 12, 2024
