diff --git a/open-machine-learning-jupyter-book/_toc.yml b/open-machine-learning-jupyter-book/_toc.yml index d00995ef7..c8ffe4fff 100644 --- a/open-machine-learning-jupyter-book/_toc.yml +++ b/open-machine-learning-jupyter-book/_toc.yml @@ -80,6 +80,7 @@ parts: sections: - file: ml-advanced/ensemble-learning/bagging - file: ml-advanced/ensemble-learning/random-forest + - file: ml-advanced/ensemble-learning/stacking.ipynb - file: ml-advanced/ensemble-learning/feature-importance - file: ml-advanced/gradient-boosting/introduction-to-gradient-boosting.ipynb sections: diff --git a/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.ipynb b/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.ipynb index bf2ddbdc6..d5c90276e 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/bagging.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "8aec88d2-8c79-452c-927b-17253668a176", "metadata": { "tags": [ @@ -20,6 +20,7 @@ }, { "cell_type": "markdown", + "id": "28bfa3a1", "metadata": { "tags": [ "remove-cell" @@ -43,42 +44,7 @@ "id": "f6ca5fc4", "metadata": {}, "source": [ - "# Bagging\n", - "\n", - "In previous sections, we explored different classification algorithms as well as techniques that can be used to properly validate and evaluate the quality of your models.\n", - "\n", - "Now, suppose that we have chosen the best possible model for a particular problem and are struggling to further improve its accuracy. In this case, we would need to apply some more advanced machine learning techniques that are collectively referred to as *ensembles*.\n", - "\n", - "An *ensemble* is a set of elements that collectively contribute to a whole. 
A familiar example is a musical ensemble, which blends the sounds of several musical instruments to create harmony, or architectural ensembles, which are a set of buildings designed as a unit. In ensembles, the (whole) harmonious outcome is more important than the performance of any individual part." - ] - }, - { - "cell_type": "markdown", - "id": "4ea30c2c", - "metadata": {}, - "source": [ - "## Ensembles\n", - "\n", - "[Condorcet's jury theorem](https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem) (1784) is about an ensemble in some sense. It states that, if each member of the jury makes an independent judgment and the probability of the correct decision by each juror is more than 0.5, then the probability of the correct decision by the whole jury increases with the total number of jurors and tends to one. On the other hand, if the probability of being right is less than 0.5 for each juror, then the probability of the correct decision by the whole jury decreases with the number of jurors and tends to zero. \n", - "\n", - "Let's write an analytic expression for this theorem:\n", - "\n", - "- $\\large N$ is the total number of jurors;\n", - "- $\\large m$ is a minimal number of jurors that would make a majority, that is $\\large m = floor(N/2) + 1$;\n", - "- $\\large {N \\choose i}$ is the number of $\\large i$-combinations from a set with $\\large N$ elements.\n", - "- $\\large p$ is the probability of the correct decision by a juror;\n", - "- $\\large \\mu$ is the probability of the correct decision by the whole jury.\n", - "\n", - "Then:\n", - "\n", - "$$ \\large \\mu = \\sum_{i=m}^{N}{N\\choose i}p^i(1-p)^{N-i} $$\n", - "\n", - "It can be seen that if $\\large p > 0.5$, then $\\large \\mu > p$. In addition, if $\\large N \\rightarrow \\infty $, then $\\large \\mu \\rightarrow 1$.\n", - "\n", - "Let's look at another example of ensembles: an observation known as [Wisdom of the crowd](https://en.wikipedia.org/wiki/Wisdom_of_the_crowd). 
In 1906, [Francis Galton](https://en.wikipedia.org/wiki/Francis_Galton) visited a country fair in Plymouth where he saw a contest being held for farmers. 800 participants tried to estimate the weight of a slaughtered bull. The real weight of the bull was 1198 pounds. Although none of the farmers could guess the exact weight of the animal, the average of their predictions was 1197 pounds.\n", - "\n", - "\n", - "A similar idea for error reduction was adopted in the field of Machine Learning." + "# Bagging" ] }, { @@ -400,7 +366,7 @@ { "data": { "text/plain": [ - "" + "" ] }, "execution_count": 5, @@ -727,7 +693,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.18" + "version": "3.9.19" } }, "nbformat": 4, diff --git a/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.ipynb b/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.ipynb index 6c0b29bca..3943217cc 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "b3953cf0-8228-4b81-b5d9-d3249228f011", "metadata": { "tags": [ @@ -20,6 +20,7 @@ }, { "cell_type": "markdown", + "id": "4eb207a7", "metadata": { "tags": [ "remove-cell" @@ -46,6 +47,54 @@ "# Getting started with ensemble learning" ] }, + { + "cell_type": "markdown", + "id": "239f071d", + "metadata": {}, + "source": [ + "In previous sections, we explored different classification algorithms as well as techniques that can be used to properly validate and evaluate the quality of your models.\n", + "\n", + "Now, suppose that we have chosen the best possible model for a particular problem and are struggling to further 
improve its accuracy. In this case, we would need to apply some more advanced machine learning techniques that are collectively referred to as *ensembles*.\n", + "\n", + "An *ensemble* is a set of elements that collectively contribute to a whole. A familiar example is a musical ensemble, which blends the sounds of several musical instruments to create harmony, or architectural ensembles, which are a set of buildings designed as a unit. In ensembles, the (whole) harmonious outcome is more important than the performance of any individual part." + ] + }, + { + "cell_type": "markdown", + "id": "2eff740a", + "metadata": {}, + "source": [ + "## Ensembles\n", + "\n", + "[Condorcet's jury theorem](https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem) (1784) is about an ensemble in some sense. It states that, if each member of the jury makes an independent judgment and the probability of the correct decision by each juror is more than 0.5, then the probability of the correct decision by the whole jury increases with the total number of jurors and tends to one. On the other hand, if the probability of being right is less than 0.5 for each juror, then the probability of the correct decision by the whole jury decreases with the number of jurors and tends to zero. \n", + "\n", + "Let's write an analytic expression for this theorem:\n", + "\n", + "- $\\large N$ is the total number of jurors;\n", + "- $\\large m$ is a minimal number of jurors that would make a majority, that is $\\large m = floor(N/2) + 1$;\n", + "- $\\large {N \\choose i}$ is the number of $\\large i$-combinations from a set with $\\large N$ elements.\n", + "- $\\large p$ is the probability of the correct decision by a juror;\n", + "- $\\large \\mu$ is the probability of the correct decision by the whole jury.\n", + "\n", + "Then:\n", + "\n", + "$$ \\large \\mu = \\sum_{i=m}^{N}{N\\choose i}p^i(1-p)^{N-i} $$\n", + "\n", + "It can be seen that if $\\large p > 0.5$, then $\\large \\mu > p$. 
 In addition, if $\\large N \\rightarrow \\infty $, then $\\large \\mu \\rightarrow 1$.\n", + "\n", + "$~~~~~~~$... whenever we are faced with making a decision that has some important consequence, we often seek the opinions of different “experts” \n", + "\n", + "$~~~~~~$ to help us make that decision ...\n", + "\n", + "\n", + "$~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~$ — Page 2, Ensemble Machine Learning, 2012.\n", + "\n", + "Let's look at another example of ensembles: an observation known as [Wisdom of the crowd](https://en.wikipedia.org/wiki/Wisdom_of_the_crowd). In 1906, [Francis Galton](https://en.wikipedia.org/wiki/Francis_Galton) visited a country fair in Plymouth where he saw a contest being held for farmers. 800 participants tried to estimate the weight of a slaughtered bull. The real weight of the bull was 1198 pounds. Although none of the farmers could guess the exact weight of the animal, the average of their predictions was 1197 pounds.\n", + "\n", + "\n", + "A similar idea for error reduction was adopted in the field of Machine Learning." + ] + }, { "cell_type": "code", "execution_count": 2, diff --git a/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/stacking.ipynb b/open-machine-learning-jupyter-book/ml-advanced/ensemble-learning/stacking.ipynb new file mode 100644 index 000000000..e69de29bb