Classification using Decision Tree From Scratch
Classifies the letter recognition dataset with a Decision Tree based on the ID3 algorithm and reports the confusion matrix. The project covers four parts:
1. The tree is trained based on the Information Gain (IG) criterion.
2. The tree is trained based on the Gini Index criterion (a sketch of both split criteria follows this list).
3. The two attributes with the highest IG are swapped and the tree is retrained.
4. Using a Random Forest, the attributes are clustered into K folds, K trees are trained (one per fold), and the K with the best accuracy is selected (see the forest sketch after this list).
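
As a rough illustration of the two split criteria used in parts 1 and 2, here is a minimal MATLAB sketch. It is not the repository code: the data is a random stand-in for the letter features, and all names are placeholders.

```matlab
function demo_split_criteria()
    % Toy data standing in for the letter-recognition set (placeholder only).
    rng(0);
    X = randi(16, 200, 16);   % 16 integer-valued features
    y = randi(26, 200, 1);    % 26 classes

    att = 1;                  % candidate attribute to split on
    fprintf('IG   of attribute %d: %.4f\n', att, info_gain(X, y, att));
    fprintf('Gini of attribute %d: %.4f\n', att, gini_index(X, y, att));
end

function H = entropy_of(y)
    % Shannon entropy of a discrete label vector.
    [~, ~, ic] = unique(y);
    p = accumarray(ic, 1) / numel(y);
    H = -sum(p .* log2(p));
end

function ig = info_gain(X, y, att)
    % ID3 Information Gain: H(y) - sum_v |y_v|/|y| * H(y_v),
    % splitting on each distinct value v of the attribute.
    ig = entropy_of(y);
    vals = unique(X(:, att));
    for v = vals'
        idx = X(:, att) == v;
        ig = ig - (sum(idx) / numel(y)) * entropy_of(y(idx));
    end
end

function g = gini_index(X, y, att)
    % Weighted Gini impurity of the children after the split (lower is better).
    g = 0;
    vals = unique(X(:, att));
    for v = vals'
        yv = y(X(:, att) == v);
        [~, ~, ic] = unique(yv);
        p = accumarray(ic, 1) / numel(yv);
        g = g + (numel(yv) / numel(y)) * (1 - sum(p .^ 2));
    end
end
```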
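For part 4, the attribute-fold forest could look roughly like the sketch below. This is an assumption-laden illustration, not the repository code: `fitctree` from the Statistics and Machine Learning Toolbox stands in for the from-scratch ID3 tree, and the K range and variable names are made up.

```matlab
function bestK = forest_over_attribute_folds(Xtrain, ytrain, Xtest, ytest)
    % Partition the attributes into K folds, train one tree per fold,
    % combine them by majority vote, and keep the K with the best accuracy.
    bestK = 2;
    bestAcc = -inf;
    for K = 2:8                                          % candidate K values (assumed range)
        folds = mod(randperm(size(Xtrain, 2)), K) + 1;   % assign each attribute to one of K folds
        preds = zeros(size(Xtest, 1), K);
        for k = 1:K
            atts = find(folds == k);                     % attribute subset for tree k
            tree = fitctree(Xtrain(:, atts), ytrain);    % stand-in for the from-scratch tree
            preds(:, k) = predict(tree, Xtest(:, atts));
        end
        acc = mean(mode(preds, 2) == ytest);             % majority vote over the K trees
        fprintf('K = %d: accuracy = %.3f\n', K, acc);
        if acc > bestAcc
            bestAcc = acc;
            bestK = K;
        end
    end
end
```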
Implemented with two approaches: one using cell arrays (memory-consuming) and the other using nested queries. Details are in the code!
The dataset is a .mat file, which can easily be read using the `load` command in MATLAB. It includes 4,000 test and 16,000 training handwritten black-and-white alphabet letters across 26 classes in total. Each instance has 16 features, such as the number of different pixels and the mean and variance of the black pixels measured in different ways.
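A minimal loading sketch is shown below; the file name and the variable names stored inside the .mat file are assumptions, so inspect the loaded struct to see what the file actually contains.

```matlab
% Load the dataset (file name is a placeholder) and list its variables.
S = load('letter_recognition.mat');
disp(fieldnames(S));
% Expected shapes: 16000 x 16 training features and 4000 x 16 test features,
% with matching label vectors over the 26 classes.
```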
- `code/initial_tree.m`: Execute this file to train the Decision Tree based on IG.
- `code/initial_tree_GINI.m`: Execute this file to train the Decision Tree based on the Gini index.
- `code/tree_changed_atts.m`: For the 3rd part (swapped attributes).
- `code/random_forest.m`: Contains the 4th part (Random Forest).