
TINTO: Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks

TINTO Logo

TINTO is an engine that constructs Synthetic Images from Tidy Data (also known as Tabular Data).

Citing TINTO: If you use TINTO in your work, please cite the SoftwareX paper:

@article{softwarex_TINTO,
    title = {TINTO: Converting Tidy Data into Image for Classification with 2-Dimensional Convolutional Neural Networks},
    journal = {SoftwareX},
    author = {Manuel Castillo-Cara and Reewos Talla-Chumpitaz and Raúl García-Castro and Luis Orozco-Barbosa},
    volume = {22},
    pages = {101391},
    year = {2023},
    issn = {2352-7110},
    doi = {https://doi.org/10.1016/j.softx.2023.101391}
}

And the use case developed in the Information Fusion (INFFUS) paper:

@article{inffus_TINTO,
    title = {A novel deep learning approach using blurring image techniques for Bluetooth-based indoor localisation},
    journal = {Information Fusion},
    author = {Reewos Talla-Chumpitaz and Manuel Castillo-Cara and Luis Orozco-Barbosa and Raúl García-Castro},
    volume = {91},
    pages = {173-186},
    year = {2023},
    issn = {1566-2535},
    doi = {https://doi.org/10.1016/j.inffus.2022.10.011}
}

Description

The growing interest in machine learning algorithms for predictive tasks has produced a large and diverse set of algorithms. However, it is widely known that not all of these algorithms perform efficiently on certain datasets in tidy data format. For this reason, novel techniques are being developed to convert tidy data into images so that Convolutional Neural Networks (CNNs) can be applied. TINTO converts tidy data into images by representing characteristic pixels, obtained with two dimensionality reduction algorithms: PCA and t-SNE. Our proposal also includes a blurring technique, which adds more ordered information to the image and can improve the classification task in CNNs.
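
The general idea can be sketched as follows. This is a simplified illustration and not TINTO's exact implementation: each feature is mapped to a 2-D position with a dimensionality reduction algorithm such as PCA, and those positions become the characteristic pixels of the image. The random data, the transposition step and the 20x20 grid below are assumptions made for the sketch.

    # Simplified sketch of the idea (not TINTO's internal code): place each
    # feature on a 2-D grid using PCA from Scikit-learn.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(20)
    X = rng.random((150, 4))            # 150 samples, 4 numeric features (e.g. IRIS)

    # Treat each feature (column) as a point described by its values across
    # the samples, then reduce those points to 2-D coordinates.
    coords = PCA(n_components=2).fit_transform(X.T)   # shape: (4, 2)

    # Scale the coordinates onto a 20x20 pixel grid.
    coords -= coords.min(axis=0)
    coords /= coords.max(axis=0)
    pixels = np.round(coords * 19).astype(int)
    print(pixels)                        # one (row, col) position per feature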


Main Features

  • Supports any CSV dataset in Tidy Data format.
  • For now, the algorithm converts tabular data from binary and multi-class machine learning classification problems into images.
  • Input data formats:
    • Tabular files: the input data must be a CSV file following the Tidy Data format.
    • Tidy Data: the target (the variable to be predicted) must be the last column of the dataset; the preceding columns are the features.
    • All data must be numerical. TINTO does not accept strings or any other non-numeric format (see the sketch after this list).
  • Two dimensionality reduction algorithms are used for image creation: PCA and t-SNE, from the Scikit-learn Python library.
  • The synthetic images are created in black and white, i.e. with 1 channel.
  • The synthetic image dimensions can be set as a parameter when creating them.
  • The synthetic images can be created with characteristic pixels or with the blurring painting technique (resolving overlapping pixels with the maximum or the average value).
  • Runs on Linux, Windows and macOS systems.
  • Compatible with Python 3.7 or higher.
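
Since only numeric values are accepted, string columns (for example a text label) have to be encoded before exporting the CSV. A minimal sketch with pandas; the file and column names are illustrative and not part of TINTO:

    # Hypothetical preprocessing step (not part of TINTO): encode a string
    # target as integers and keep it as the last column.
    import pandas as pd

    df = pd.read_csv("raw_dataset.csv")
    # Encode a string target (e.g. "setosa", "versicolor", "virginica") as 1, 2, 3.
    df["target"] = pd.factorize(df["target"])[0] + 1
    # Features first, target as the last column.
    df = df[[c for c in df.columns if c != "target"] + ["target"]]
    df.to_csv("dataset_numeric.csv", index=False)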

Video Documentation

TINTO-short-withSound.mp4

Getting Started

TINTO is easy to use from the terminal.

First, install all the required libraries:

    pip install -r requirements.txt

To run the engine via the command line and see all the available arguments, execute the following:

    python tinto.py -h

TINTO Logo

The default parameters are the following:

  • Dimensionality reduction algorithm (-alg): selects the dimensionality reduction algorithm used for image creation, either PCA or t-SNE. By default, PCA is used.
  • Image size (-px): 20x20 pixels.
  • Blurring (-B): False by default, i.e., the blurring technique is not used and the images are created with characteristic pixels only.
  • Amplification (-aB): only used if Blurring is True. It is the blurring amplification; the default is the number pi, i.e., approximately 3.141592653589793.
  • Blurring distance (-dB): only used if Blurring is True. It is the blurring distance; the default is 0.1 (10%).
  • Blurring steps (-sB): only used if Blurring is True. It is the number of blurring steps; the default is 4, i.e., the blurring expands 4 pixels.
  • Blurring option (-oB): only used if Blurring is True. It is the blurring option; the default is mean, i.e., if two pixels overlap, the average of the two overlapping values is taken (see the sketch after this list).
  • Save configuration (-sC): saves the configuration in a pickle object. False by default.
  • Load configuration (-lC): loads the configuration from a pickle object. False by default.
  • Seed (-sd): sets the seed for the random numbers. 20 by default.
  • t-SNE times replication (-tt): only used when t-SNE is selected. It is the number of t-SNE replications; the default is 4.
  • Verbose (-v): shows the execution in the terminal. False by default.
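
As an illustration of the -oB option, overlapping pixel contributions can be combined by averaging them or by keeping the maximum. This is a toy sketch of the concept, not TINTO's internal code:

    # Toy illustration of the -oB overlap options (not TINTO's internal code).
    import numpy as np

    canvas_a = np.zeros((5, 5))
    canvas_b = np.zeros((5, 5))
    canvas_a[2, 2] = 0.8   # contribution of feature A at pixel (2, 2)
    canvas_b[2, 2] = 0.4   # contribution of feature B at the same pixel (overlap)

    mean_image = (canvas_a + canvas_b) / 2       # -oB mean: average the overlapping values
    max_image = np.maximum(canvas_a, canvas_b)   # -oB maximum: keep the larger value

    print(mean_image[2, 2])  # approximately 0.6
    print(max_image[2, 2])   # 0.8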

Previous considerations

Please note that the following considerations must be taken into account before running the script:

  • Data must be in CSV format with the default separator, i.e., commas.
  • Images should only be created for binary or multi-class classification problems.
  • The last column must be the target (the variable to predict).
  • The first columns will be the features.
  • All variables must be in numerical format.
  • By default, the script takes the first row as the name of each feature; therefore, the features must be named.
  • Each sample (row) of the dataset will correspond to one image.

For example, the following table shows how the classic IRIS CSV dataset should look for the run (a short validation sketch follows the table):

sepal length | sepal width | petal length | petal width | target
4.9          | 3.0         | 1.4          | 0.2         | 1
7.0          | 3.2         | 4.7          | 1.4         | 2
6.3          | 3.3         | 6.0          | 2.5         | 3
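
These considerations can be checked quickly with pandas before running TINTO (a hedged sketch; the file name matches the IRIS example above):

    # Quick check that the CSV follows the expected layout (illustrative only).
    import pandas as pd

    df = pd.read_csv("iris.csv")               # default separator: comma
    features, target = df.iloc[:, :-1], df.iloc[:, -1]

    # All feature columns and the target must be numeric.
    assert features.apply(pd.api.types.is_numeric_dtype).all()
    assert pd.api.types.is_numeric_dtype(target)

    print(df.head())                           # the first row supplies the feature names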

Simple example without Blurring

The following example shows how to create 20x20 images with characteristic pixels, i.e. without blurring.

    python tinto.py "iris.csv" "iris_images"

The images are created with the following considerations regarding the parameters used:

  • python: launches the Python interpreter.
  • tinto.py: the name of the script.
  • iris.csv: the dataset to use. In this example, the IRIS dataset is used.
  • iris_images: the folder where the images will be saved.

Also, as no other parameters are indicated, the script will use the following default values:

  • Image size: 20x20 pixels.
  • Blurring: no blurring will be used.
  • Seed: set to 20.

Within the folder named "iris_images/" we can find subfolders with numbers, where each number corresponds to a target value. For example, for the dataset iris.csv we will have three subfolders named "1/", "2/" and "3/". The following figure shows an image created according to this example.

TINTO characteristic pixel
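
After the run, the output folder can be inspected to confirm that there is one subfolder per class. A small sketch; the folder name comes from the example above and the image file format is whatever TINTO writes:

    # List how many images were generated per class subfolder (illustrative).
    from pathlib import Path

    output_dir = Path("iris_images")
    for class_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        n_images = sum(1 for f in class_dir.iterdir() if f.is_file())
        print(f"class {class_dir.name}: {n_images} images")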

More specific example

The following example shows how to create images with blurring, using more specific parameters.

    python tinto.py "iris.csv" "iris_images_tSNE" -B -alg t-SNE -oB maximum -px 30 -sB 5

The images are created with the following considerations regarding the parameters used:

  • Blurring (-B): creates the images with the blurring technique.
  • Dimensionality reduction algorithm (-alg): t-SNE is used.
  • Blurring option (-oB): creates the images taking the maximum value of overlapping pixels.
  • Image size (-px): 30x30 pixels.
  • Blurring steps (-sB): the blurring expands 5 pixels.

TINTO blurring

How to use in CNN

Once the images have been created by TINTO, they can be imported into any project using CNNs.

In order to facilitate their use, a Jupyter Notebook has been created in which you can see how the images are read and how they can be used as input in a CNN.

Click here for the TINTO crash course in Google Colab.
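
As a minimal sketch of what such a notebook does, the class subfolders produced by TINTO can be loaded into a small Keras CNN. This assumes TensorFlow is installed and reuses the 20x20 "iris_images" folder from the earlier example; it is not the notebook's exact code:

    # Minimal, illustrative CNN pipeline over TINTO's output folder.
    import tensorflow as tf

    # One subfolder per class, grayscale images.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "iris_images",                 # folder produced in the earlier example
        labels="inferred",             # class labels taken from the subfolder names
        color_mode="grayscale",        # TINTO images have a single channel
        image_size=(20, 20),           # must match the -px value used
        batch_size=32,
    )

    # A deliberately small CNN, just to show the images being consumed.
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255, input_shape=(20, 20, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(3, activation="softmax"),   # 3 classes in the IRIS example
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=10)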

License

TINTO is available under the Apache License 2.0.

Authors

Ontology Engineering Group, Universidad Politécnica de Madrid.

Contributors

See the full list of contributors here.

Ontology Engineering Group Universidad Politécnica de Madrid Universidad Nacional de Educación a Distancia Universidad de Castilla-La Mancha