A python fast feature selection for supervised learning

ivangarcia88/ffselection

Fast supervised feature selection framework

High-dimensional datasets are a challenge for existing learning methods. Irrelevant and redundant features in a dataset can degrade the performance of the models inferred from it, and in large datasets managing features by hand tends to be impractical. This framework removes redundant and irrelevant features from supervised datasets.

Compilation

To install the requirements and compile on a Debian-based platform, run the script "setup.sh":

$ ./setup.sh

On other Linux distributions, install the following packages:

  • g++
  • libboost-python-dev
  • python-dev
  • python-numpy
  • python-pandas
  • python-sklearn
  • python-matplotlib

Then execute in the terminal:

$ make
$ make wrapper

MICTools

MICTools identifies correlations between variables in a dataset. It can be used independently and compiled as a standalone application.

MICSelect

MICSelect performs the feature selection. It requires MICTools.
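
To illustrate what removing irrelevant and redundant features means in practice, here is a minimal sketch. It is not MICSelect's algorithm: it substitutes plain Pearson correlation for the correlation measure computed by MICTools, and the function names and thresholds (`select_features`, `min_relevance`, `max_redundancy`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Return the names of features that are relevant to the target and not redundant.

    columns: dict mapping a feature name to its list of values.
    target:  list of target values, same length as each feature.
    Both thresholds are illustrative assumptions, not MICSelect defaults.
    """
    # Score each feature by its absolute association with the target.
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    # Irrelevant features fall below the relevance threshold and are dropped.
    candidates = [name for name in columns if relevance[name] >= min_relevance]
    # Visit the most relevant features first.
    candidates.sort(key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in candidates:
        # Skip a feature that is nearly a copy of one already kept.
        if all(abs(pearson(columns[name], columns[kept])) < max_redundancy
               for kept in selected):
            selected.append(name)
    return selected
```

Calling `select_features` with a dictionary of feature columns and a target column returns the feature names to keep, ordered from most to least relevant.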

How to reproduce the results in the paper

The experiments were run on Ubuntu 18.04 using Python 3.6.

  1. Compile MICTools
  2. Create the folder "datasets-test" in the root folder of this project
  3. Create the folder "s10" inside the folder "datasets-test"
  4. Create the folder "x20" inside the folder "datasets-test"
  5. Download the datasets from https://u.pcloud.link/publink/show?code=XZT2pOkZgHaO7WBaWzVmGRmMdkdjLY39hK2V
  6. Run MICSelect on every dataset in "datasets-input" with the parameters -y target -s 10
  7. Move every output ("datasets-output") to the folder "datasets-test/s10"
  8. Run MICSelect on every dataset in "datasets-input" with the parameters -y target -x 20
  9. Move every output ("datasets-output") to the folder "datasets-test/x20"
  10. Run result.py
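
The folder bookkeeping in steps 2 to 4, 7 and 9 can be scripted. The sketch below is one possible helper, not part of this repository: the function names `prepare_layout` and `collect_outputs` are hypothetical, and it assumes the outputs of a run are plain files in a single directory. It does not invoke MICSelect itself; the `RUNS` table only records the parameters listed in steps 6 and 8.

```python
import shutil
from pathlib import Path

# Parameter sets from steps 6 and 8, keyed by the folder names from steps 3 and 4.
RUNS = {
    "s10": ["-y", "target", "-s", "10"],
    "x20": ["-y", "target", "-x", "20"],
}


def prepare_layout(root):
    """Create datasets-test/s10 and datasets-test/x20 under root (steps 2 to 4)."""
    test_dir = Path(root) / "datasets-test"
    for run_name in RUNS:
        (test_dir / run_name).mkdir(parents=True, exist_ok=True)
    return test_dir


def collect_outputs(output_dir, test_dir, run_name):
    """Move every file in output_dir into test_dir/run_name (steps 7 and 9)."""
    destination = Path(test_dir) / run_name
    moved = []
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file():
            shutil.move(str(path), str(destination / path.name))
            moved.append(path.name)
    return moved
```

A typical sequence would be to call `prepare_layout(".")` once from the project root, run MICSelect with the `s10` parameters, call `collect_outputs("datasets-output", "datasets-test", "s10")`, and then repeat for `x20` before running result.py.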
