rodemann/reliable-pls

In all Likelihoods

Robust Selection of Pseudo-Labeled Data

Introduction, Table of Contents

This repository contains the implementation, results, and experimental scripts for reliable pseudo-label selection (PLS), as introduced in the paper "In all Likelihoods: How to Reliably Select Pseudo-Labeled Data for Self-Training in Semi-Supervised Learning". More specifically,

  • R contains the implementation of multi-model PLS, multi-label (weighted and unweighted) PLS, and the alternative PLS methods to benchmark against
  • benchmarking provides the files for the experiments (section 5 of the paper); to reproduce the results, see Setup below
  • data contains the real-world data used in the experiments
  • experimental results and their visualizations will be saved in plots and results
  • all results can be found in plots

Results

Banknote data

Banknote data (q = 3, subsample of size n = 160, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 120, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 80, share of unlabeled = 0.8)

Banknote data (q = 3, subsample of size n = 40, share of unlabeled = 0.8)

Mushrooms data

Mushrooms data (q = 3, n = 120, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 160, share of unlabeled = 0.8)

Mushrooms data (q = 3, n = 200, share of unlabeled = 0.8)

Simulated data

Simulated data (q = 6, n = 60, share of unlabeled = 0.8)

Simulated data (q = 6, n = 100, share of unlabeled = 0.8)

Simulated data (q = 6, n = 140, share of unlabeled = 0.8)

Simulated data (q = 6, n = 160, share of unlabeled = 0.8)

Simulated data (q = 6, n = 180, share of unlabeled = 0.8)

Simulated data (q = 6, n = 200, share of unlabeled = 0.8)

Cars data

Cars data (q = 3, n = 32, share of unlabeled = 0.7)

Cars data (q = 3, n = 32, share of unlabeled = 0.9)

Cars data (q = 3, n = 32, share of unlabeled = 0.95)
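
To read the captions above, it can help to translate n and the share of unlabeled data into absolute counts. The following is a minimal sketch, not taken from the repository: the helper split_sizes is hypothetical, and rounding the unlabeled count to the nearest integer is an assumption that may differ from what the experiment scripts do.

```r
# Hypothetical helper (not part of this repository): converts a sample size
# and a share of unlabeled data into labeled/unlabeled counts.
# Assumption: the unlabeled count is rounded to the nearest integer.
split_sizes <- function(n, share_unlabeled) {
  stopifnot(n > 0, share_unlabeled >= 0, share_unlabeled <= 1)
  n_unlabeled <- round(n * share_unlabeled)
  c(labeled = n - n_unlabeled, unlabeled = n_unlabeled)
}

# Banknote setting with n = 160 and share of unlabeled = 0.8:
# 32 labeled and 128 unlabeled observations.
sizes <- split_sizes(160, 0.8)
print(sizes)
```

Under this assumption, the n = 160 settings start self-training from only 32 labeled observations.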

Setup

First and foremost, please install all dependencies by sourcing this file.

Then download the implementations of BPLS with PPP and the competing PLS methods and save them in a folder named "R":

To reproduce the paper's key results (and their visualizations), also download these scripts and save them in their respective folders:

Finally, download benchmarks/experiments_simulated_data.R and run it from benchmarks/ (estimated runtime: 30 CPU hours)

Important: Create empty folders named results and plots, where the experimental results will be stored automatically. In addition, you can access the results as objects after the experiments have completed.
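
The two output folders can also be created from within R before starting the experiments. This is a minimal sketch using base R only; it assumes the folders are expected relative to the current working directory, which you may need to adjust to match where the scripts write their output.

```r
# Create the output folders named in this README if they do not exist yet.
# Assumption: the folders belong in the current working directory.
output_dirs <- c("results", "plots")

for (dir_name in output_dirs) {
  if (!dir.exists(dir_name)) {
    dir.create(dir_name)
  }
}

# Confirm that both folders are now present.
print(dir.exists(output_dirs))
```

Checking dir.exists first keeps the snippet safe to re-run, since existing folders and their contents are left untouched.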

Tested with

  • R 4.2.0
  • R 4.1.6
  • R 4.0.3

on

  • Linux Ubuntu 20.04
  • Linux Debian 10
  • Windows 11 Pro Build 22H2

Further experiments

Additional experimental setups can now easily be created by modifying benchmarks/experiments_simulated_data.R

Data

The data, together with the files for reading it in, can be found in the folder data.
