This repository contains code and instructions for reproducing the figures and numbers of the accompanying paper “Implicit kernel meta-learning using kernel integral forms”, presented as an oral at the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2022). It also contains the functions needed to generate the datasets (synthetic and real) used in the paper for benchmarking meta-learning algorithms, which may be of interest to the meta-learning community independently of the algorithm.
There is a supplied conda environment file. Running
conda create --name ikml --file spec-file.txt
will create a conda environment named ikml with the necessary dependencies.
All sections below assume that you are in the ikml conda environment:
conda activate ikml
You also need to install the current package locally:
pip install .
The paper introduces two new meta-learning regression datasets:
Air Quality
- Meta-learning tasks derived from Beijing air-quality time-series data spanning several years
Gas Sensor
- Meta-learning tasks derived from experimental gas-sensor time-series data
There is a supplied Makefile that automates most of the pulling and creation of the datasets; please make sure you have a version of Make installed. All commands below assume that your current working directory is the top level of this git repository.
For running and evaluating experiments I use Guild AI. While the experiments
can be run using Python directly, they have been engineered with Guild in mind.
The configurable parameters for each experiment can be found in the ./guild.yml
file; leaving them as-is reproduces the experiments of the paper. You can see
below how to change the parameters of interest.
In a terminal where your current working directory is this git repo, run
make create_datasets
This will pull and build all of the datasets used for benchmarking.
Generate the data by choosing a list of input dimensions
L='[d_1, d_2, ..., d_L]'
Note that this has to be a list even if it contains only one element. The plotting has been hardcoded to work with d being in {1, 2, 5, 10, 20, 30}, so if you want to generate the plots, only pick d from this set. From the command line run
guild run signal_recovery:bochner d=${L}
to generate the data for plotting.
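For intuition about the bochner operation: by Bochner’s theorem, a shift-invariant kernel is the Fourier transform of a probability measure over frequencies, so sampling frequencies yields a random-feature approximation of the kernel. The following NumPy sketch illustrates that idea only; it is not the repository’s implementation, and all names in it are made up for illustration.

```python
# Illustrative sketch of a Bochner/random-Fourier-feature kernel approximation
# (not the repository's implementation).
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, omega, b):
    """Random Fourier features: phi(x) = sqrt(2/D) * cos(x @ omega.T + b)."""
    D = omega.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ omega.T + b)

d, D = 2, 5000                    # input dimension, number of random features
omega = rng.normal(size=(D, d))   # frequencies ~ N(0, I) <=> Gaussian kernel
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x = rng.normal(size=(1, d))
y = rng.normal(size=(1, d))
approx = (rff_features(x, omega, b) @ rff_features(y, omega, b).T).item()
exact = np.exp(-0.5 * np.sum((x - y) ** 2)).item()  # exact Gaussian kernel value
# approx converges to exact at rate O(1/sqrt(D))
```

Learning the distribution of the frequencies omega (rather than fixing it to a Gaussian) is the kernel-integral-form idea the paper builds on.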
After this has finished running you can look up the guild ID of the batch run
guild runs
which will output something like
[1:d9cd5613]  signal_recovery:bochner   2021-03-07 13:26:52  completed  X_bases_sigma=0.2 X_marginal_sigma=0.2 alpha_sigma=1.0 boch_hidden_d
[2:365f3857]  signal_recovery:bochner+  2021-03-07 13:26:52  completed
and you want the ID of the batch run (you can spot it by the + appended to the operation name), here 365f3857.
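If you prefer to grab the batch ID programmatically, a small sketch assuming the guild runs output format shown above (the helper name is hypothetical, not part of this repository):

```python
import re

def batch_run_id(guild_runs_output: str) -> str:
    """Return the ID of the first batch run, identified by the trailing '+'
    on the operation name (e.g. 'signal_recovery:bochner+')."""
    for line in guild_runs_output.splitlines():
        m = re.match(r"\[\d+:([0-9a-f]+)\]\s+(\S+)\s", line)
        if m and m.group(2).endswith("+"):
            return m.group(1)
    raise ValueError("no batch run found")

output = (
    "[1:d9cd5613] signal_recovery:bochner 2021-03-07 13:26:52 completed\n"
    "[2:365f3857] signal_recovery:bochner+ 2021-03-07 13:26:52 completed\n"
)
print(batch_run_id(output))  # -> 365f3857
```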
You can generate the plots by running
python scripts/toy_regression/signal_recovery/get_learning_curves.py --guild_id 365f3857
and they can be found in plots/toy_regression/signal_recovery/bochner.
Make sure that you have generated the datasets by following Creating the datasets.
Let DATASET be either air_quality or gas_sensor, and
SEED='[seed_1, ..., seed_L]'
To reproduce the results of the paper, set
SEED='[1, 2, 3, 4, 5]'
Get results of IKML by running
guild run ${DATASET}:bochner_ikml seed=${SEED}
which will run IKML with the Bochner kernel over 5 independent runs. You can
also get the results of the other benchmarked algorithms; run
guild operations
to see all of the available options.
After retrieving the Guild ID for the batch run, denoted by ID
(see Synthetic Data if you don’t know
what this means), you can get the mean and 1 standard deviation of the
meta-{val, test} RMSEs by running
python scripts/get_risk.py --guild_id ${ID}
which will print the results.
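Conceptually, the aggregation amounts to taking the mean and 1 sample standard deviation of the per-seed RMSEs. A minimal sketch with made-up numbers, not the actual code of scripts/get_risk.py:

```python
import statistics

# Hypothetical meta-test RMSEs from 5 seeds (illustrative values only).
rmses = [21.3, 22.1, 20.8, 21.9, 21.5]

mean = statistics.mean(rmses)
std = statistics.stdev(rmses)  # 1 sample standard deviation
print(f"meta-test RMSE: {mean:.2f} +/- {std:.2f}")  # prints: meta-test RMSE: 21.52 +/- 0.51
```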
To generate the plots run the algorithms on your dataset of choice. Consult
guild operations
to see how to run each algorithm on the dataset you want. The
plots can then be generated by running
python scripts/plot_learning_curves.py --mkl_id ${MKL_ID} \
--lsq_bias_id ${LSQ_BIAS_ID} \
--maml_id ${MAML_ID} \
--r2d2_id ${R2D2_ID} \
--gauss_id ${GAUSS_ID} \
--gauss_oracle_id ${GAUSS_ORACLE_ID} \
--bochner_id ${BOCHNER_ID} \
--y_upper_lim ${Y_UPPER_LIM} \
--y_lower_lim ${Y_LOWER_LIM} \
--output_dir ${OUTPUT_DIR}
where the IDs are the batch IDs generated from running Guild on the dataset over
a list of seeds. Note that leaving out an ID argument just leaves that
algorithm out of the plot, so it’s possible to plot a subset of the learning
curves. The --output_dir
argument is the name of the directory in plots
that the plots will be saved to; it will be created if it doesn’t exist. The
y-limit arguments allow you to recreate the plots of the paper: for Air Quality
the lower and upper limits are 10 and 60, while for Gas Sensor
they are 0 and 40.
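The subsetting and y-limit behaviour described above can be sketched as follows; the function and dictionary names here are illustrative, not the actual API of scripts/plot_learning_curves.py:

```python
# Paper y-axis limits per dataset (from the text above).
Y_LIMS = {"air_quality": (10, 60), "gas_sensor": (0, 40)}

def plot_spec(dataset: str, run_ids: dict) -> dict:
    """Return which curves to draw (omitted IDs are dropped) and the
    y-axis limits for the given dataset."""
    curves = [name for name, rid in run_ids.items() if rid is not None]
    lower, upper = Y_LIMS[dataset]
    return {"curves": curves, "y_lower_lim": lower, "y_upper_lim": upper}

spec = plot_spec("air_quality", {"mkl": "a1b2", "maml": None, "bochner": "c3d4"})
print(spec)  # -> {'curves': ['mkl', 'bochner'], 'y_lower_lim': 10, 'y_upper_lim': 60}
```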
If you want to ask a question or reach out to me, feel free to use my academic
email address [email protected]!
If you want to reference this work (please do!) use the following bibentry:
@inproceedings{falk2022implicit,
  title     = {Implicit kernel meta-learning using kernel integral forms},
  author    = {John Isak Texas Falk and Carlo Ciliberto and Massimiliano Pontil},
  booktitle = {The 38th Conference on Uncertainty in Artificial Intelligence},
  year      = {2022},
  url       = {https://openreview.net/forum?id=rNgqwPUsqgq}
}