add some to to the introduction part #757

Merged
merged 10 commits into from
May 1, 2024
3 changes: 1 addition & 2 deletions open-machine-learning-jupyter-book/_config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -90,5 +90,4 @@ sphinx:
- sphinxcontrib.tikz
- sphinxcontrib.blockdiag
- sphinxcontrib.drawio
- sphinxcontrib.quizdown

- sphinxcontrib.quizdown
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
},
{
"cell_type": "markdown",
"id": "28bfa3a1",
"metadata": {
"tags": [
"remove-cell"
Expand All @@ -43,42 +44,7 @@
"id": "f6ca5fc4",
"metadata": {},
"source": [
"# Bagging\n",
"\n",
"In previous sections, we explored different classification algorithms as well as techniques that can be used to properly validate and evaluate the quality of your models.\n",
"\n",
"Now, suppose that we have chosen the best possible model for a particular problem and are struggling to further improve its accuracy. In this case, we would need to apply some more advanced machine learning techniques that are collectively referred to as *ensembles*.\n",
"\n",
"An *ensemble* is a set of elements that collectively contribute to a whole. A familiar example is a musical ensemble, which blends the sounds of several musical instruments to create harmony, or architectural ensembles, which are a set of buildings designed as a unit. In ensembles, the (whole) harmonious outcome is more important than the performance of any individual part."
]
},
{
"cell_type": "markdown",
"id": "4ea30c2c",
"metadata": {},
"source": [
"## Ensembles\n",
"\n",
"[Condorcet's jury theorem](https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem) (1785) can be read as a statement about ensembles. It states that if each member of the jury makes an independent judgment and the probability of a correct decision by each juror is more than 0.5, then the probability of a correct decision by the whole jury increases with the total number of jurors and tends to one. On the other hand, if the probability of being right is less than 0.5 for each juror, then the probability of a correct decision by the whole jury decreases with the number of jurors and tends to zero.\n",
"\n",
"Let's write an analytic expression for this theorem:\n",
"\n",
"- $\\large N$ is the total number of jurors;\n",
"- $\\large m$ is the minimal number of jurors that make a majority, that is, $\\large m = \\lfloor N/2 \\rfloor + 1$;\n",
"- $\\large {N \\choose i}$ is the number of $\\large i$-combinations from a set with $\\large N$ elements;\n",
"- $\\large p$ is the probability of the correct decision by a juror;\n",
"- $\\large \\mu$ is the probability of the correct decision by the whole jury.\n",
"\n",
"Then:\n",
"\n",
"$$ \\large \\mu = \\sum_{i=m}^{N}{N\\choose i}p^i(1-p)^{N-i} $$\n",
"\n",
"It can be seen that if $\\large 0.5 < p < 1$ and $\\large N \\geq 3$ is odd, then $\\large \\mu > p$. In addition, as $\\large N \\rightarrow \\infty $, $\\large \\mu \\rightarrow 1$.\n",
"\n",
"Let's look at another example of ensembles: an observation known as [Wisdom of the crowd](https://en.wikipedia.org/wiki/Wisdom_of_the_crowd). <img src=\"https://habrastorage.org/webt/zg/hw/b7/zghwb7oztkmv840odqkjpink1vw.png\" align=\"right\" width=15% height=15%> In 1906, [Francis Galton](https://en.wikipedia.org/wiki/Francis_Galton) visited a country fair in Plymouth where he saw a contest being held for farmers. 800 participants tried to estimate the weight of a slaughtered bull. The real weight of the bull was 1198 pounds. Although none of the farmers could guess the exact weight of the animal, the average of their predictions was 1197 pounds.\n",
"\n",
"\n",
"A similar idea for error reduction was adopted in the field of Machine Learning."
"# Bagging"
]
},
{
Expand Down Expand Up @@ -727,7 +693,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.9.19"
}
},
"nbformat": 4,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "b3953cf0-8228-4b81-b5d9-d3249228f011",
"metadata": {
"tags": [
Expand All @@ -20,6 +20,7 @@
},
{
"cell_type": "markdown",
"id": "4eb207a7",
"metadata": {
"tags": [
"remove-cell"
Expand All @@ -46,6 +47,69 @@
"# Getting started with ensemble learning"
]
},
{
"cell_type": "markdown",
"id": "239f071d",
"metadata": {},
"source": [
"In previous sections, we explored different classification algorithms as well as techniques that can be used to properly validate and evaluate the quality of your models.\n",
"\n",
"Now, suppose that we have chosen the best possible model for a particular problem and are struggling to further improve its accuracy. In this case, we need to apply more advanced machine learning techniques that are collectively referred to as *ensembles*.\n",
"\n",
"An *ensemble* is a set of elements that collectively contribute to a whole. Familiar examples are a musical ensemble, which blends the sounds of several instruments to create harmony, and an architectural ensemble, which is a set of buildings designed as a unit. In an ensemble, the harmonious whole matters more than the performance of any individual part."
]
},
{
"cell_type": "markdown",
"id": "2eff740a",
"metadata": {},
"source": [
"***Ensembles***\n",
"\n",
"[Condorcet's jury theorem](https://en.wikipedia.org/wiki/Condorcet%27s_jury_theorem) (1785) can be read as a statement about ensembles. It states that if each member of the jury makes an independent judgment and the probability of a correct decision by each juror is more than 0.5, then the probability of a correct decision by the whole jury increases with the total number of jurors and tends to one. On the other hand, if the probability of being right is less than 0.5 for each juror, then the probability of a correct decision by the whole jury decreases with the number of jurors and tends to zero.\n",
"\n",
"Let's write an analytic expression for this theorem:\n",
"\n",
"- $\\large N$ is the total number of jurors;\n",
"- $\\large m$ is the minimal number of jurors that make a majority, that is, $\\large m = \\lfloor N/2 \\rfloor + 1$;\n",
"- $\\large {N \\choose i}$ is the number of $\\large i$-combinations from a set with $\\large N$ elements;\n",
"- $\\large p$ is the probability of the correct decision by a juror;\n",
"- $\\large \\mu$ is the probability of the correct decision by the whole jury.\n",
"\n",
"Then:\n",
"\n",
"$$ \\large \\mu = \\sum_{i=m}^{N}{N\\choose i}p^i(1-p)^{N-i} $$\n",
"\n",
"It can be seen that if $\\large 0.5 < p < 1$ and $\\large N \\geq 3$ is odd, then $\\large \\mu > p$. In addition, as $\\large N \\rightarrow \\infty $, $\\large \\mu \\rightarrow 1$.\n",
"\n",
"> ... whenever we are faced with making a decision that has some important consequence, we often seek the opinions of different “experts” to help us make that decision ...\n",
">\n",
"> Page 2, Ensemble Machine Learning, 2012."
]
},
{
"cell_type": "markdown",
"id": "5ab68564",
"metadata": {},
"source": [
"Let's look at another example of ensembles: an observation known as [Wisdom of the crowd](https://en.wikipedia.org/wiki/Wisdom_of_the_crowd). <img src=\"https://habrastorage.org/webt/zg/hw/b7/zghwb7oztkmv840odqkjpink1vw.png\" align=\"right\" width=15% height=15%> In 1906, [Francis Galton](https://en.wikipedia.org/wiki/Francis_Galton) visited a country fair in Plymouth where he saw a contest being held for farmers. 800 participants tried to estimate the weight of a slaughtered bull. The real weight of the bull was 1198 pounds. Although none of the farmers could guess the exact weight of the animal, the average of their predictions was 1197 pounds.\n",
"\n",
"\n",
"A similar idea for error reduction, known as voting, has been adopted in ensemble learning."
]
},
{
"cell_type": "markdown",
"id": "065ec775",
"metadata": {},
"source": [
"The video below explains what ensemble learning is and introduces several ensemble learning methods."
]
},
{
"cell_type": "code",
"execution_count": 2,
Expand Down Expand Up @@ -89,11 +153,72 @@
},
{
"cell_type": "markdown",
"id": "41e99f6f",
"id": "286ec2c0",
"metadata": {},
"source": [
"***Intuition for classification ensembles***\n"
]
},
{
"cell_type": "markdown",
"id": "251f688c",
"metadata": {},
"source": [
"A model that learns to classify points in effect draws lines in the feature space to\n",
"separate examples. We can sample points in the feature space on a grid and map the class\n",
"label that the model assigns to each of them. The separation of examples in\n",
"the feature space by the model is called the decision boundary, and a plot of the grid or map of\n",
"how the model classifies points in the feature space is called a decision boundary plot. Now\n",
"consider an ensemble where each model has a different mapping of inputs to outputs. In effect,\n",
"each model has a different decision boundary, or a different idea of how to split up the feature\n",
"space by class label. Each model will draw the lines differently and make different errors.\n",
"When we combine the predictions from these different models, we are in effect\n",
"averaging the decision boundaries. We are defining a new decision boundary that attempts to\n",
"learn from all the different views of the feature space learned by the contributing members."
]
},
{
"cell_type": "markdown",
"id": "ababe0fd",
"metadata": {},
"source": [
"```{tableofcontents}\n",
"```"
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/ml-advanced/ensemble-learning/getting-started-with-ensemble-learning/Example_of%20Combining_Decision_Boundaries_Using_an_Ensemble.png\n",
"---\n",
"name: ' Example of Combining Decision Boundaries Using an Ensemble'\n",
"width: 90%\n",
"---\n",
"Example of Combining Decision Boundaries Using an Ensemble. Taken from\n",
"Ensemble Machine Learning, 2012\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "bb489924",
"metadata": {},
"source": [
"We can see the contributing members along the top, each with a different decision boundary\n",
"in the feature space. The bottom-left panel draws all of the decision boundaries on the same plot, showing how they differ and make different errors. Finally, we can combine these boundaries to\n",
"in effect create a new, generalized decision boundary in the bottom-right panel that better captures\n",
"the true but unknown division of the feature space, resulting in better predictive performance.\n"
]
},
{
"cell_type": "markdown",
"id": "20481b8f",
"metadata": {},
"source": [
"We have learned what ensemble learning is. In the next chapter, we will learn some specific ensemble learning algorithms."
]
},
{
"cell_type": "markdown",
"id": "32240923",
"metadata": {},
"source": [
"## Acknowledgments\n",
"\n",
"Thanks to the book [Ensemble Machine Learning: Methods and Applications 2012th Edition](https://amzn.to/2C7syo5), edited by Cha Zhang and Yunqian Ma, which inspired the majority of the content in this chapter."
]
}
],
Expand All @@ -113,7 +238,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.9.19"
}
},
"nbformat": 4,
Expand Down
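Note on the getting-started notebook in this diff: the jury-theorem expression for $\mu$ in the Ensembles cell can be checked numerically. The block below is a minimal sketch and is not part of this PR. The function names `jury_accuracy` and `simulate_jury` are hypothetical, and the simulation assumes fully independent jurors with the same accuracy `p`, as the theorem does.

```python
import math
import random


def jury_accuracy(n_jurors: int, p: float) -> float:
    """Probability that a strict majority of n independent jurors is correct.

    Implements mu = sum_{i=m}^{N} C(N, i) * p^i * (1 - p)^(N - i)
    with m = floor(N / 2) + 1.
    """
    m = n_jurors // 2 + 1  # minimal number of jurors that make a majority
    return sum(
        math.comb(n_jurors, i) * p**i * (1 - p) ** (n_jurors - i)
        for i in range(m, n_jurors + 1)
    )


def simulate_jury(n_jurors: int, p: float, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability using simulated majority votes."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    m = n_jurors // 2 + 1
    wins = 0
    for _ in range(trials):
        # Each juror is independently correct with probability p.
        correct_votes = sum(rng.random() < p for _ in range(n_jurors))
        wins += correct_votes >= m
    return wins / trials


if __name__ == "__main__":
    # Assumed juror accuracy of 0.6, chosen only for illustration.
    for n in (1, 3, 11, 101):
        print(n, round(jury_accuracy(n, 0.6), 4), round(simulate_jury(n, 0.6), 4))
```

With `p = 0.6`, three jurors give `jury_accuracy(3, 0.6) = 0.648`, which is already above the individual accuracy. Under this strict-majority rule an even jury size can fall below `p` (for example, `N = 2` gives `p**2 = 0.36`), so comparisons with `p` are cleanest for odd `N`.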
Original file line number Diff line number Diff line change
Expand Up @@ -1371,7 +1371,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.9.19"
}
},
"nbformat": 4,
Expand Down