diff --git a/docs/baseline.md b/docs/baseline.md
index 2c69a27eb..d9ff12bc3 100644
--- a/docs/baseline.md
+++ b/docs/baseline.md
@@ -1,4 +1,4 @@
-# ToyMix Baseline
+# ToyMix Baseline - Test set metrics
 
 From the paper to be released soon. Below, you can see the baselines for the `ToyMix` dataset, a multitasking dataset comprising of `QM9`, `Zinc12k` and `Tox21`. The datasets and their splits are available on [this link](https://zenodo.org/record/7998401). The following baselines are all for models with ~150k parameters.
 
@@ -25,6 +25,7 @@ One can observe that the smaller datasets (`Zinc12k` and `Tox21`) beneficiate fr
 | | GINE | 0.201 ± 0.007 | 0.783 ± 0.007 | 0.345 ± 0.02 | 0.177 ± 0.0008 | 0.836 ± 0.004 | **0.455 ± 0.008** |
 
 # LargeMix Baseline
+## LargeMix test set metrics
 
 From the paper to be released soon. Below, you can see the baselines for the `LargeMix` dataset, a multitasking dataset comprising of `PCQM4M_N4`, `PCQM4M_G25`, `PCBA_1328`, `L1000_VCAP`, and `L1000_MCF7`. The datasets and their splits are available on [this link](https://zenodo.org/record/7998401). The following baselines are all for models with 4-6M parameters.
 
@@ -58,6 +59,7 @@ While `PCQM4M_G25` has no noticeable changes, the node predictions of `PCQM4M_N4
 | | GIN | 0.1862 ± 0.0003 | 0.6202 ± 0.0091 | 0.3876 ± 0.0017 | 0.1874 ± 0.0013 | 0.6367 ± 0.0066 | **0.4198 ± 0.0036** |
 | | GINE | **0.1856 ± 0.0005** | 0.6166 ± 0.0017 | 0.3892 ± 0.0035 | 0.1873 ± 0.0009 | 0.6347 ± 0.0048 | 0.4177 ± 0.0024 |
 
+## LargeMix training set loss
 Below is the loss on the training set. One can observe that the multi-task model always underfits the single-task, except on the two `L1000` datasets.