Traditional Artificial Intelligence (AI) algorithms are designed to solve specific problems. Deep Blue was purpose-built for playing chess. Game enemy AI is built to attack the player. Most traditional AI methods approximate a solution to one specific problem.
Machine learning, by contrast, tries to learn more general concepts and to work in changing, dynamic contexts.
There are various ways in which learning can happen:
- Supervised Learning: The algorithm is given inputs together with the expected outputs in a training set. The goal is to learn general rules that map inputs to the correct outputs for future, unseen inputs.
- Unsupervised Learning: The algorithm is given inputs without expected outputs. The goal is to discover hidden patterns that can be used for future learning.
- Reinforcement Learning: The algorithm works with inputs without expected outputs, but with a specific goal that it should aim to achieve. The algorithm must determine whether its actions bring it closer to that goal or not.

Common types of learning problems are:

- Classification: Inputs are categorised into two or more known groups. Classification generally uses supervised learning.
- Regression: Similar to classification, except the outputs are continuous values rather than discrete categories.
- Clustering: Inputs are categorised into groups, but the groups are not known beforehand. Clustering generally uses unsupervised learning (see the sketch after this list).
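The worked examples further down this page are both supervised classification. As a contrast, here is a minimal unsupervised clustering sketch using scikit-learn's KMeans: we give it fruit measurements but no labels, and it has to group the samples on its own. The sample values are made up purely for illustration.

from sklearn.cluster import KMeans

#Fruit measurements: [weight in grams, skin roughness score]
#No labels are provided - the algorithm has to find the groups itself.
samples = [[140, 1], [130, 1], [150, 0], [170, 0], [135, 1], [165, 0]]

#Ask KMeans to split the samples into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(samples)

#Each sample gets an arbitrary cluster id (0 or 1); unlike supervised labels,
#we only learn which samples group together, not what the groups mean.
print(kmeans.labels_)
print(kmeans.predict([[160, 0]]))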
Collecting and preparing data is one of the most important parts of machine learning. The attributes that describe an object are called FEATURES. The classification of that object is called a LABEL. When preparing the data:
- Remove redundant features. E.g. having both temperature in Fahrenheit and temperature in Celsius is redundant (see the sketch after this list).
- Remove features that don't add value to classifying the object.
- Ensure that no single feature automatically determines the label.
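As a small illustration of the redundant-feature rule, the sketch below (with made-up values) builds a feature matrix with NumPy and then drops the Fahrenheit column, since it carries exactly the same information as the Celsius column.

import numpy as np

#Each row describes one object: [temp Celsius, temp Fahrenheit, weight]
features = np.array([[20.0, 68.0, 140.0],
                     [25.0, 77.0, 130.0],
                     [30.0, 86.0, 170.0]])

#Fahrenheit is just Celsius rescaled, so it adds no new information.
#Drop column 1 and keep only the independent features.
features = np.delete(features, 1, axis=1)
print(features)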
Ensure that you have the software below installed.

Software you require:
- Python with scikit-learn and NumPy
- pydot and Graphviz (only needed to visualise the decision tree in the iris example)
In this Python example we provide a list of fruit and try to classify them as apples or oranges. For simplicity, we use the weight of the fruit and the roughness of its skin. If the skin is rough we are fairly sure it is an orange; if it is smooth we are fairly sure it is an apple.
from sklearn import tree
#The features of the fruit: [weight in grams, smooth(1) or rough(0)]
features = [[140, 1],[130, 1],[150, 0],[170, 0]]
#The labels for the fruit in the features list: Orange(1) or Apple(0)
labels = [0, 0, 1, 1]
#The classifier that we're using. A simple decision tree.
clf = tree.DecisionTreeClassifier()
#Training the classifier with the fruit and their labels.
clf = clf.fit(features, labels)
#Predicting a new fruit
print(clf.predict([[160, 0]]))
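Running this should print [1]: the tree learns from the training data that the heavier, rough-skinned fruit were oranges, so the new 160 g rough fruit is classified as an orange.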
This is an expansion of the hello world example, using the iris flower dataset. Here we try to predict the type of iris flower based on its petals and sepals. This example also demonstrates how to separate a dataset into training and testing data.
from sklearn.datasets import load_iris
from sklearn import tree
import numpy as np
iris = load_iris()
test_idx = [0, 50, 100]
#training data
training_target = np.delete(iris.target, test_idx)
training_data = np.delete(iris.data, test_idx, axis = 0)
#testing data
testing_target = iris.target[test_idx]
testing_data = iris.data[test_idx]
clf = tree.DecisionTreeClassifier()
clf.fit(training_data, training_target)
print(testing_target)
print(clf.predict(testing_data))
#vis code
from io import StringIO
import pydot
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
                     feature_names=iris.feature_names,
                     class_names=iris.target_names,
                     filled=True, rounded=True,
                     special_characters=True)
#graph_from_dot_data returns a list of graphs in recent pydot versions.
graph = pydot.graph_from_dot_data(dot_data.getvalue())[0]
graph.write_svg("iris.svg")
print(testing_data[1], testing_target[1])
print(iris.feature_names, iris.target_names)
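The iris example builds its test set by manually deleting three rows, one per class. A more common approach is scikit-learn's train_test_split, which shuffles and splits the dataset in one call. Below is a minimal sketch (assuming a scikit-learn version that provides the model_selection module), which also scores the predictions with accuracy_score.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree

iris = load_iris()

#Hold out 30% of the rows for testing; the rest is used for training.
train_data, test_data, train_target, test_target = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

clf = tree.DecisionTreeClassifier()
clf.fit(train_data, train_target)

#Compare the predictions on the held-out rows with the true labels.
predictions = clf.predict(test_data)
print(accuracy_score(test_target, predictions))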