In all Likelihoods

Robust Selection of Pseudo-Labeled Data

Introduction, Table of Contents

This repository contains the implementation, results, and experimental scripts for reliable pseudo-label selection (PLS), as introduced in the paper "In all Likelihoods: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning". More specifically,

  • R contains the implementation of multi-model PLS, multi-label (weighted and unweighted) PLS, and alternative PLS methods to benchmark against
  • benchmarking provides the files for the experiments (section 5 of the paper)
  • data contains the real-world data used in the experiments
  • experimental results and their visualizations will be saved in results and plots
  • all results can be found in plots
  • to reproduce the experiments, please read Setup further below

Results

Banknote data

Banknote data (q = 3, subsample of size n = 160, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 120, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 80, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 40, share of unlabeled = 0.8)

Mushrooms data

Mushrooms data (q = 3, n = 120, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 160, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 200, share of unlabeled = 0.8)

Simulated data

Simulated data (q = 6, n = 60, share of unlabeled = 0.8)

Simulated data (q = 6, n = 100, share of unlabeled = 0.8)

Simulated data (q = 6, n = 140, share of unlabeled = 0.8)

Simulated data (q = 6, n = 160, share of unlabeled = 0.8)

Simulated data (q = 6, n = 180, share of unlabeled = 0.8)

Simulated data (q = 6, n = 200, share of unlabeled = 0.8)

Cars data

Cars data (q = 3, n = 32, share of unlabeled = 0.7)

Cars data (q = 3, n = 32, share of unlabeled = 0.9)

Cars data (q = 3, n = 32, share of unlabeled = 0.95)

Setup

First and foremost, please install all dependencies by sourcing this file.

Then download the implementations of BPLS with PPP and the competing PLS methods and save them in a folder named "R":

To reproduce the paper's key results (and their visualizations), also download these scripts and save them in the respective folder:

Finally, download benchmarks/experiments_simulated_data.R and run it from benchmarks/ (estimated runtime: 30 CPU hours).

Important: Create empty folders results and plots, where the experimental results will be stored automatically. In addition, you can access the results as objects after the experiments have completed.
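
The folder creation and the script call described above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not part of the repository: the helper names prepare_output_dirs and run_simulated_experiments are hypothetical, and the sketch assumes that results and plots belong at the top level of the repository. If the scripts expect them elsewhere, adjust the paths accordingly.

```r
# Hypothetical helper (not part of the repository): creates the empty
# output folders. Assumption: they live at the repository root.
prepare_output_dirs <- function(repo_root = ".") {
  dirs <- file.path(repo_root, c("results", "plots"))
  for (d in dirs) {
    # showWarnings = FALSE keeps the call quiet if the folder already exists
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  dirs
}

# Hypothetical helper (not part of the repository): runs the experiment
# script with benchmarks/ as working directory, then restores the old one.
run_simulated_experiments <- function(repo_root = ".") {
  script <- file.path(repo_root, "benchmarks", "experiments_simulated_data.R")
  if (!file.exists(script)) {
    stop("Script not found: ", script)
  }
  old_wd <- setwd(dirname(script))
  on.exit(setwd(old_wd), add = TRUE)
  # source() evaluates in the global environment by default, so objects
  # created by the script remain accessible afterwards
  source(basename(script))
}

# Usage (the second call starts the long-running experiments):
# prepare_output_dirs(".")
# run_simulated_experiments(".")
```

Given the estimated runtime of 30 CPU hours, it may be worth calling prepare_output_dirs first and confirming that both folders exist before starting the run.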

Tested with

  • R 4.2.0
  • R 4.1.6
  • R 4.0.3

on

  • Linux Ubuntu 20.04
  • Linux Debian 10
  • Windows 11 Pro Build 22H2
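
Before starting a long run, it can help to compare the local R version with the tested ones. The following is a minimal sketch; the function name is_tested_or_newer is hypothetical, and treating the oldest tested version (4.0.3) as a lower bound is an assumption, not a requirement stated by the authors.

```r
# Assumption: the oldest version in the "Tested with" list serves as a lower bound.
min_tested <- "4.0.3"

# Hypothetical helper: TRUE if `version` is at least `minimum`.
is_tested_or_newer <- function(version = getRversion(), minimum = min_tested) {
  # compareVersion() returns -1, 0 or 1 for two version strings
  utils::compareVersion(as.character(version), minimum) >= 0
}

if (!is_tested_or_newer()) {
  warning("This R version is older than the oldest tested version (", min_tested, ").")
}
```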

Further experiments

Additional experimental setups can be created by modifying benchmarks/experiments_simulated_data.R.
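
When planning additional setups, it can be useful to first tabulate the combinations of sample size and share of unlabeled data. The sketch below is illustrative only: the column names n and share_unlabeled are hypothetical and not taken from the experiment script, and it assumes that "share of unlabeled" denotes the fraction of the n observations that are unlabeled.

```r
# Hypothetical grid of setups; names are illustrative, not the script's own.
setups <- expand.grid(
  n = c(60, 100, 140),            # sample sizes
  share_unlabeled = c(0.8, 0.9)   # assumed: fraction of n that is unlabeled
)

# Derive the number of unlabeled and labeled observations per setup
setups$n_unlabeled <- round(setups$n * setups$share_unlabeled)
setups$n_labeled <- setups$n - setups$n_unlabeled

print(setups)
```

Each row then describes one candidate setup whose values could be entered into the script by hand.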

Data

The data sets and the files to read them in are located in the folder data.