Trying to load a partially recorded benchmark from disk raises ValueError #134

janosh · 2022-05-03T08:19:33Z

This code

from matbench.bench import MatbenchBenchmark

benchmark_path = "tmp-benchmark.json"
mbbm = MatbenchBenchmark.from_file(benchmark_path)

raises

ValueError: Cannot validate task matbench_jdft2d unless all folds recorded!; folds [0, 2, 3, 4] not recorded!

if

mbbm.to_file(benchmark_path)

was previously written to disk with only some folds recorded.

For the purpose of splitting folds into Slurm array jobs, it would be very useful to be able to read and write partial benchmarks. I tried commenting out the validation line

matbench/matbench/task.py

Line 214 in c3b910e

obj.validate()

and everything appears to be working fine. The line probably has a purpose, but perhaps that could be achieved differently while also allowing partial benchmarks to be written?
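For context, a rough sketch of how one fold per Slurm array task could be selected. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for array jobs; the helper names and the per-fold file naming are assumptions made for illustration, not part of matbench.

```python
import os


def fold_from_slurm_env(n_folds=5, environ=os.environ):
    """Map the Slurm array task ID onto a fold index (hypothetical helper)."""
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < n_folds:
        raise ValueError(f"array task ID {task_id} is outside folds 0..{n_folds - 1}")
    return task_id


def partial_benchmark_path(fold, prefix="tmp-benchmark"):
    """Build a per-fold output path (assumed naming scheme, not a matbench convention)."""
    return f"{prefix}-fold-{fold}.json"
```

Each array task would then record only its own fold and write a partial benchmark to its own file, which is exactly the case that currently fails on reload.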


ardunn · 2022-08-19T00:16:10Z

This is a good point. I never really considered that people would be using Matbench in a parallel fashion, but now that they are, it makes sense to think of a more comprehensive and robust solution. I'll do some thinking on my side, but if you have ideas for how to do this while still allowing validation on loading, I'm open to suggestions.

Something off the top of my head is just introducing a conditional that will validate only if all folds are recorded. The purpose of the validation is really to check for any possible errors before it is saved as a complete benchmark (and used by the doc builder to actually create the docs), to avoid downstream debugging chaos. But I can't immediately foresee any scenario where the benchmark checker would allow an incomplete task without error, so maybe just a simple conditional would work.
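A minimal sketch of what such a conditional could look like. `PartialTask`, its attributes, and its dictionary layout are hypothetical stand-ins written for this example; they are not the actual matbench task class, whose loading code may be structured differently.

```python
class PartialTask:
    """Hypothetical stand-in for a benchmark task; not the real matbench class."""

    def __init__(self, name, folds):
        self.name = name
        self.folds = list(folds)
        self.results = {}  # fold index -> recorded predictions

    @property
    def all_folds_recorded(self):
        return all(fold in self.results for fold in self.folds)

    def validate(self):
        # Mirrors the error from this issue: refuse to validate incomplete tasks.
        missing = [fold for fold in self.folds if fold not in self.results]
        if missing:
            raise ValueError(
                f"Cannot validate task {self.name} unless all folds recorded!; "
                f"folds {missing} not recorded!"
            )

    @classmethod
    def from_dict(cls, d):
        obj = cls(d["name"], d["folds"])
        # JSON object keys are strings, so convert them back to fold indices.
        obj.results = {int(k): v for k, v in d["results"].items()}
        # Validate only complete tasks; partial ones load without error.
        if obj.all_folds_recorded:
            obj.validate()
        return obj
```

With this shape, a file containing only some folds loads cleanly, while a complete task is still validated on load.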

janosh · 2022-09-22T22:47:55Z

Yes, only doing the validation once all folds are recorded would be a good solution. 👍
