Fast supervised feature selection framework

High-dimensional datasets are a challenge for existing learning methods. Irrelevant and redundant features in a dataset can degrade the performance of the models inferred from it. In large datasets, managing features manually tends to be impractical. This framework removes redundant and irrelevant features from supervised datasets.

Compilation

To install the requirements and compile on a Debian-based platform, run the script "setup.sh":

$ ./setup.sh

On other Linux distributions, install the following packages:

g++
libboost-python-dev
python-dev
python-numpy
python-pandas
python-sklearn
python-matplotlib

Then execute in the terminal:

$ make
$ make wrapper
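
If the wrapper fails to build or import, it may help to confirm that the Python packages listed above are importable. The following is a minimal sketch, assuming the import names numpy, pandas, sklearn, and matplotlib correspond to the listed python-* packages; the function name is illustrative and not part of this project.

```python
import importlib.util

# Import names assumed to correspond to the python-* packages listed above.
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_modules(names=REQUIRED):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules() returns an empty list when every package is available, and otherwise the names that still need to be installed.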

MICTools

MICTools identifies correlations between variables in a dataset. It can be used independently and compiled as a standalone application.

MICSelect

MICSelect performs the feature selection. It requires MICTools.

How to reproduce the results in the paper

The experiments were run on Ubuntu 18.04 using Python 3.6.

Compile MICTools
Create the folder "datasets-test" in the root folder of this project
Create the folder "s10" inside the folder "datasets-test"
Create the folder "x20" inside the folder "datasets-test"
Download the datasets from https://u.pcloud.link/publink/show?code=XZT2pOkZgHaO7WBaWzVmGRmMdkdjLY39hK2V
Run MICSelect on every dataset inside "datasets-input" with the parameters (-y target -s 10)
Move every output ("datasets-output") to the folder "datasets-test/s10"
Run MICSelect on every dataset inside "datasets-input" with the parameters (-y target -x 20)
Move every output ("datasets-output") to the folder "datasets-test/x20"
Run result.py
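
The folder setup and the two MICSelect passes above can be scripted. The following is a minimal sketch, not the project's own tooling. The folder names and the flags (-y target -s 10, -y target -x 20) come from the steps above. The executable path "./micselect", the location of "datasets-input" under the project root, and passing the dataset as a positional argument are all assumptions to be adapted to the actual MICSelect command line. The function only creates the folders and builds the command lists; it does not run anything.

```python
from pathlib import Path

# Parameter sets taken from the steps above; each key is the target folder name.
RUNS = {"s10": ["-s", "10"], "x20": ["-x", "20"]}


def prepare_runs(root, executable="./micselect"):
    """Create datasets-test/s10 and datasets-test/x20 and build the commands.

    `executable` is a hypothetical name; replace it with the real MICSelect
    entry point and its real way of receiving the input file.
    """
    root = Path(root)
    commands = []
    for folder, flags in RUNS.items():
        (root / "datasets-test" / folder).mkdir(parents=True, exist_ok=True)
        for dataset in sorted((root / "datasets-input").glob("*")):
            # Assumption: the dataset path is passed as a positional argument.
            commands.append([executable, str(dataset), "-y", "target", *flags])
    return commands
```

Each returned command could then be run with subprocess.run(cmd, check=True), moving the resulting output into the matching folder before running result.py.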
