This repository contains the implementation, results, and experimental scripts for reliable Pseudo-Label Selection (PLS), as introduced in the paper "In all Likelihoods: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning". More specifically,
- R contains the implementation of multi-model PLS, multi-label (weighted and unweighted) PLS, and alternative PLS methods to benchmark against
- benchmarking provides the files for the experiments (section 5); to reproduce the results, see the setup below
- data contains real-world data used in experiments
- experimental results and their visualizations will be saved in results and plots, respectively
- all results can be found in plots
- To reproduce the experiments, please read the setup further below
Banknote data (q = 3, subsample of size n = 160, share of unlabeled = 0.8)
Banknote data (q = 3, subsample of size n = 120, share of unlabeled = 0.8)
Banknote data (q = 3, subsample of size n = 80, share of unlabeled = 0.8)
Banknote data (q = 3, subsample of size n = 40, share of unlabeled = 0.8)
Mushrooms data (q = 3, n = 120, share of unlabeled = 0.8)
Mushrooms data (q = 3, n = 160, share of unlabeled = 0.8)
Mushrooms data (q = 3, n = 200, share of unlabeled = 0.8)
Simulated data (q = 6, n = 60, share of unlabeled = 0.8)
Simulated data (q = 6, n = 100, share of unlabeled = 0.8)
Simulated data (q = 6, n = 140, share of unlabeled = 0.8)
Simulated data (q = 6, n = 160, share of unlabeled = 0.8)
Simulated data (q = 6, n = 180, share of unlabeled = 0.8)
Simulated data (q = 6, n = 200, share of unlabeled = 0.8)
Cars data (q = 3, n = 32, share of unlabeled = 0.7)
Cars data (q = 3, n = 32, share of unlabeled = 0.9)
Cars data (q = 3, n = 32, share of unlabeled = 0.95)
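Each setting above pairs a sample size n with a share of unlabeled data. As a rough illustration of what such a split means, the following minimal sketch divides a data set into a labeled and an unlabeled part. It is not taken from the repository's scripts: the use of R's built-in `mtcars` data as a stand-in, the target column `am`, the seed, and all variable names are assumptions made for this example only.

```r
# Illustrative sketch only: split a data set into labeled and unlabeled parts.
# mtcars, the target column "am", and the seed are assumptions for this example.
set.seed(1)

data <- mtcars
n <- nrow(data)                      # 32 observations in mtcars
share_unlabeled <- 0.7               # share of observations treated as unlabeled

# Number of unlabeled observations, rounded to the nearest integer
n_unlabeled <- round(share_unlabeled * n)

# Randomly choose which rows lose their label
unlabeled_idx <- sample(n, n_unlabeled)

labeled   <- data[-unlabeled_idx, ]
unlabeled <- data[unlabeled_idx, ]

# Hide the target in the unlabeled part so that a self-training
# procedure would have to predict it (pseudo-labels)
unlabeled$am <- NA

cat("labeled:", nrow(labeled), "unlabeled:", nrow(unlabeled), "\n")
```

Changing `share_unlabeled` or subsampling `data` beforehand gives the other combinations of n and unlabeled share listed above.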
First and foremost, please install all dependencies by sourcing this file.
Then download the implementations of BPLS with PPP and the competing PLS methods and save them in a folder named "R":
- Supervised Baseline
- Probability Score
- Predictive Variance
- PPP (Bayes-optimal)
- PPP (Bayes-optimal) Bayesian Neural Net
- Likelihood (max-max)
- Utilities for PPP
To reproduce the paper's key results (and visualizations thereof), also download these scripts and save them in the respective folders:
- in folder analysis/
- in folder benchmarks/
- in folder benchmarks/experiments/
Finally, download benchmarks/experiments_simulated_data.R and run it from benchmarks/ (estimated runtime: 30 CPU hours).
Important: Create empty folders results and plots, where experimental results will be stored automatically. In addition, you can access them as objects after the experiments have completed.
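The two output folders can also be created from within R before starting the experiments. The snippet below is a minimal sketch that assumes the folders are expected relative to the current working directory; adjust the paths if the scripts look for them elsewhere.

```r
# Create the output folders if they do not exist yet.
# Assumption: the folders are expected relative to the current working directory.
out_dirs <- c("results", "plots")

for (d in out_dirs) {
  if (!dir.exists(d)) {
    dir.create(d, recursive = TRUE)
  }
}

# Check that both folders are now present
print(dir.exists(out_dirs))
```

Running the snippet repeatedly is harmless, since existing folders are left untouched.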
- R 4.2.0
- R 4.1.6
- R 4.0.3
on
- Linux Ubuntu 20.04
- Linux Debian 10
- Windows 11 Pro Build 22H2
Additional experimental setups can now easily be created by modifying benchmarks/experiments_simulated_data.R
The data and the files for reading them in are located in the folder data.