In all Likelihoods

Robust Selection of Pseudo-Labeled Data

Introduction, Table of Contents

This repository contains the implementation, results and experimental scripts for reliable pseudo-label selection (PLS), as introduced in the paper "In all Likelihoods: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning". More specifically:

- R: implementation of multi-model PLS, multi-label (weighted and unweighted) PLS, and alternative PLS methods to benchmark against
- benchmarks: files for the experiments (Section 5)
- data: real-world data used in the experiments
- plots and results: experimental results and their visualizations will be saved here; all results can be found in plots

To reproduce the experiments, please read Setup further below.

Results

Banknote data

Banknote data (q = 3, subsample of size n = 160, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 120, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 80, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 40, share of unlabeled = 0.8)

Mushrooms data

Mushrooms data (q = 3, n = 120, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 160, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 200, share of unlabeled = 0.8)

Simulated data

Simulated data (q = 6, n = 60, share of unlabeled = 0.8)

Simulated data (q = 6, n = 100, share of unlabeled = 0.8)

Simulated data (q = 6, n = 140, share of unlabeled = 0.8)

Simulated data (q = 6, n = 160, share of unlabeled = 0.8)

Simulated data (q = 6, n = 180, share of unlabeled = 0.8)

Simulated data (q = 6, n = 200, share of unlabeled = 0.8)

Cars data

Cars data (q = 3, n = 32, share of unlabeled = 0.7)

Cars data (q = 3, n = 32, share of unlabeled = 0.9)

Cars data (q = 3, n = 32, share of unlabeled = 0.95)

Setup

First and foremost, please install all dependencies by sourcing this file.

Then download the implementations of BPLS with PPP and of the competing PLS methods and save them in a folder named "R":

To reproduce the paper's key results (and their visualizations), additionally download these scripts and save them in the respective folders:

in folder analysis/
- analysis and visualization
in folder benchmarks/
- global setup of experiments
in folder benchmarks/experiments/

Finally, download benchmarks/experiments_simulated_data.R and run it from benchmarks/ (estimated runtime: 30 CPU hours).

Important: create the empty folders results and plots, in which experimental results will be stored automatically. In addition, you can access the results as R objects once the experiments have completed.
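A minimal R sketch of the last two setup steps, assuming the session starts at the repository root and that results and plots are meant to sit there. The helper names prepare_output_dirs() and run_from_benchmarks() are hypothetical and not part of the repository.

```r
# Hypothetical helper: create the output folders if they do not exist yet.
# Assumption: results/ and plots/ belong at the repository root.
prepare_output_dirs <- function(dirs = c("results", "plots")) {
  for (d in dirs) {
    if (!dir.exists(d)) dir.create(d)  # create only missing folders
  }
  invisible(dir.exists(dirs))          # TRUE for each folder that now exists
}

# Hypothetical helper: source an experiment script with benchmarks/ as the
# working directory, restoring the previous directory afterwards.
run_from_benchmarks <- function(script = "experiments_simulated_data.R",
                                dir = "benchmarks") {
  old <- setwd(dir)                    # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)      # restore it even if the script fails
  source(script)                       # evaluates in the global environment
}

prepare_output_dirs()
```

Once the folders exist, calling run_from_benchmarks() would source benchmarks/experiments_simulated_data.R from within benchmarks/. Because source() evaluates in the global environment by default, objects created by the script remain available in the session afterwards.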

Tested with

R 4.2.0
R 4.1.6
R 4.0.3

on

Linux Ubuntu 20.04
Linux Debian 10
Windows 11 Pro Build 22H2

Further experiments

Additional experimental setups can now easily be created by modifying benchmarks/experiments_simulated_data.R

Data

The data, along with the files to read them in, can be found in the folder data.

Repository: rodemann/reliable-pls