In all Likelihoods

Robust Selection of Pseudo-Labeled Data

Introduction, Table of Contents

This repository contains the implementation, results, and experimental scripts for reliable pseudo-label selection (PLS), as introduced in the paper "In all Likelihoods: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning". More specifically,

R contains the implementation of multi-model PLS, multi-label (weighted and unweighted) PLS, and alternative PLS methods to benchmark against
benchmarking provides the files for the experiments (section 5 of the paper); to reproduce the results, see Setup below
data contains the real-world data used in the experiments
experimental results and their visualizations will be saved in plots and results
all results can be found in plots
To reproduce the experiments, please read Setup further below
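For orientation, the generic self-training loop that PLS methods plug into can be sketched as follows. This is a minimal illustration in base R and not the repository's implementation: it assumes simulated binary data, a logistic regression fitted with glm, and the plain probability score as the selection criterion. All variable names are hypothetical.

```r
# Illustrative self-training loop with probability-score selection.
# Assumptions: simulated data, logistic regression, hypothetical names throughout.
set.seed(1)

# Simulate a small binary classification problem
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(1.5 * x1 - x2))
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Hide the labels of a share of the observations (here: 0.8)
share_unlabeled <- 0.8
unlabeled_idx <- sample(n, size = round(share_unlabeled * n))
labeled <- dat[-unlabeled_idx, ]
unlabeled <- dat[unlabeled_idx, c("x1", "x2")]

while (nrow(unlabeled) > 0) {
  # 1. Fit the model on the currently labeled data
  fit <- suppressWarnings(
    glm(y ~ x1 + x2, family = binomial(), data = labeled)
  )

  # 2. Predict pseudo-labels for all unlabeled observations
  p <- suppressWarnings(predict(fit, newdata = unlabeled, type = "response"))
  pseudo <- as.integer(p > 0.5)

  # 3. Selection criterion: predicted probability of the assigned class
  score <- pmax(p, 1 - p)
  best <- which.max(score)

  # 4. Move the selected observation, with its pseudo-label, to the labeled data
  new_row <- data.frame(y = pseudo[best], unlabeled[best, , drop = FALSE])
  labeled <- rbind(labeled, new_row)
  unlabeled <- unlabeled[-best, , drop = FALSE]
}
```

Swapping step 3 for a different score is the only change needed to try another selection criterion in this sketch; the criteria actually used in the experiments are the ones implemented in the folder R.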

Results

Banknote data

Banknote data (q = 3, subsample of size n = 160, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 120, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 80, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 40, share of unlabeled = 0.8)

Mushrooms data

Mushrooms data (q = 3, n = 120, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 160, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 200, share of unlabeled = 0.8)

Simulated data

Simulated data (q = 6, n = 60, share of unlabeled = 0.8)

Simulated data (q = 6, n = 100, share of unlabeled = 0.8)

Simulated data (q = 6, n = 140, share of unlabeled = 0.8)

Simulated data (q = 6, n = 160, share of unlabeled = 0.8)

Simulated data (q = 6, n = 180, share of unlabeled = 0.8)

Simulated data (q = 6, n = 200, share of unlabeled = 0.8)

Cars data

Cars data (q = 3, n = 32, share of unlabeled = 0.7)

Cars data (q = 3, n = 32, share of unlabeled = 0.9)

Cars data (q = 3, n = 32, share of unlabeled = 0.95)

Setup

First and foremost, please install all dependencies by sourcing this file.

Then download the implementations of BPLS with PPP and of the competing PLS methods, and save them in a folder named "R":

Supervised Baseline
Probability Score
Predictive Variance
PPP (Bayes-optimal)
PPP (Bayes-optimal) Bayesian Neural Net
Likelihood (max-max)
Utilities for PPP

To reproduce the paper's key results (and their visualizations), also download these scripts and save each one in its respective folder:

in folder analysis/
- analysis and visualization
in folder benchmarks/
- global setup of experiments
in folder benchmarks/experiments/

Finally, download benchmarks/experiments_simulated_data.R and run it from benchmarks/ (estimated runtime: 30 CPU hours)

Important: Create empty folders named results and plots before running the experiments; the experimental results are stored there automatically. You can also access the results as objects once the experiments have completed.
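The two folders can be created from within R. The snippet below is a minimal sketch that assumes the folders belong in the current working directory; adjust the paths if your layout differs.

```r
# Create the output folders if they do not exist yet.
# Assumption: the folders belong in the current working directory.
for (folder in c("results", "plots")) {
  dir.create(folder, showWarnings = FALSE)
}
```

With showWarnings = FALSE, rerunning the snippet is harmless when the folders already exist.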

Tested with

R 4.2.0
R 4.1.6
R 4.0.3

on

Linux Ubuntu 20.04
Linux Debian 10
Windows 11 Pro Build 22H2

Further experiments

Additional experimental setups can now easily be created by modifying benchmarks/experiments_simulated_data.R

Data

The data and the files for reading it in are in the folder data.
