This proposal aims to extend the terminate() function by having it call a new failed_schema() function. Ideally, this function would attempt to fetch whatever results are currently available. For example, in the case of a timed-out MD run, the traj and log should be read and put in a dictionary in the same way as is already done.
The problem, as mentioned by @Andrew-S-Rosen, is that:
To do what you're suggesting, we would need to try to prepare a schema, write it to disk, and then terminate. Output files also cannot always be parsed (e.g. if the calculator crashes instantly), and this would cause the schema generation to crash.
Indeed, such a function would need to be full of try/except blocks, as no assumption can be made about the current state of the calculation. My interpretation is that such a function should not attempt to summarise results (no calls to pymatgen, etc.) but merely read what is available, since the calculation is not done. In the case of the logfile and trajfile that's easy: the files are known. For software-specific files, it would be nice to come up with a way to attempt to read them, for example by using #2407.
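To make the "read what is available, guard everything" idea concrete, here is a minimal sketch. It is not quacc code: `read_available` and `read_text` are hypothetical names made up for illustration, and the readers are passed in as plain callables so that nothing is assumed about which files a given calculator produces.

```python
from pathlib import Path
from typing import Any, Callable


def read_available(
    readers: dict[str, tuple[Path, Callable[[Path], Any]]],
) -> dict[str, Any]:
    """Best-effort read: keep whatever parses, record whatever does not.

    Hypothetical helper, not part of quacc.
    """
    results: dict[str, Any] = {}
    read_errors: dict[str, str] = {}
    for key, (path, reader) in readers.items():
        try:
            results[key] = reader(path)
        except Exception as exc:  # deliberately broad: the job state is unknown
            read_errors[key] = f"{type(exc).__name__}: {exc}"
    return {"results": results, "read_errors": read_errors}


def read_text(path: Path) -> str:
    """Plain-text reader, enough for an MD log file."""
    return Path(path).read_text()
```

With ASE installed, the trajectory entry could use a reader such as `lambda p: ase.io.read(p, index=":")`. A missing or truncated file then ends up in `read_errors` instead of crashing the schema generation.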
tomdemeyere changed the title on Aug 10, 2024
from: [Proposal]: create a "failed schema" that is written in case of job failure.
to: [Proposal]: create a "failed_schema" summarising error and available results in case of job failure
This is certainly doable. That said, I would propose that this behavior is toggleable via the global settings, such as SETTINGS.STORE_FAILED_JOBS: bool = False. There are two reasons for this: 1) cloud databases like MongoDB are often limited on space, and storing failed calculations may not be desirable; 2) storing the failed jobs to disk and/or database would be a fairly notable breaking change since anyone querying their database for calculation results would now have to add an additional query that only selects "successful" calculations. Silently storing failed outputs in the database will cause downstream problems for people, so this would be an opt-in feature.
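As a rough illustration of the opt-in behavior, the sketch below gates storage of failed jobs behind a flag. The `Settings` class here is a stand-in dataclass, not quacc's actual settings object; `STORE_FAILED_JOBS` is only the name proposed above, and the `"state"` key, `store_result`, and `successful_only` are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Proposed flag from this thread; opt-in, so it is off by default.
    STORE_FAILED_JOBS: bool = False


def store_result(database: list, doc: dict, settings: Settings) -> bool:
    """Store doc unless it is a failed job and failed-job storage is disabled."""
    if doc.get("state") == "failed" and not settings.STORE_FAILED_JOBS:
        return False
    database.append(doc)
    return True


def successful_only(database: list) -> list:
    """The extra filter users would need once failed jobs can be stored."""
    return [doc for doc in database if doc.get("state") == "success"]
```

With the default settings nothing changes for existing users. Only those who enable the flag need the additional `successful_only`-style filter in their queries.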
It does not seem terribly difficult to implement. The idea would basically be to use a more flexible version of quacc.schemas.ase.Summarize.run to parse the code's main log file along with the input Atoms and calculator parameters. We would also need to store the job state for all jobs (success or failure) so this can be queried. If the parse is unsuccessful (say it fails immediately due to some weird input parameters), then there is not much to store other than the input Atoms and the calculator parameters along with the job state.
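A minimal sketch of what such a summary could look like is given below. It does not use or mirror the real quacc.schemas.ase.Summarize.run signature: `summarize_with_state`, the `parse_log` callable, and the dictionary keys are all hypothetical. The point is only that the job state is always recorded and that a failed parse falls back to the inputs.

```python
from typing import Any, Callable, Optional


def summarize_with_state(
    input_atoms: Any,
    parameters: dict,
    parse_log: Callable[[], dict],
    error: Optional[BaseException] = None,
) -> dict:
    """Build a summary that always carries the inputs and the job state."""
    doc = {
        "input_atoms": input_atoms,
        "parameters": parameters,
        "state": "failed" if error is not None else "success",
    }
    if error is not None:
        doc["error"] = repr(error)
    try:
        doc["results"] = parse_log()
    except Exception:
        # Nothing parseable: only the inputs and the state are stored.
        doc["results"] = None
    return doc
```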
Andrew-S-Rosen changed the title on Aug 12, 2024
from: [Proposal]: create a "failed_schema" summarising error and available results in case of job failure
to: [Proposal]: Better handling of MD-type jobs where a termination may not be a failure
From the discussions in #2399