Update model-selection.ipynb
JERRYenSHU503 committed Apr 28, 2024
1 parent ded42d3 commit 3a4afc0
Showing 1 changed file with 33 additions and 15 deletions.
@@ -129,7 +129,7 @@
"|:--:|:--:|\n",
"| Over-fitting-train-ms |  Over-fitting-test-ms |\n",
"\n",
"As we can see, over-fitting model fits very well on training data, but Over-fitting model fits poorly on test data. \n",
"As we can see, over-fitting model fits very well on training data, but over-fitting model fits poorly on test data. \n",
"\n",
"**Under-fitting model**\n",
"\n",
@@ -147,9 +147,9 @@
"|:--:|:--:|\n",
"| Perfect-fitting-train-ms |  Perfect-fitting-test-ms |\n",
"\n",
"Perfect-fitting model fits well on training data on training data and test data!\n",
"Perfect-fitting model fits well on training data and test data!\n",
"\n",
"When overfitting occurs, the model demonstrates high accuracy or low error on the training data but performs poorly on the testing data or new data in practical applications. In contrast, underfitting indicates that the model is unable to capture the complex relationships or patterns within the data."
"When over-fitting occurs, the model demonstrates high accuracy or low error on the training data but performs poorly on the testing data or new data in practical applications. In contrast, under-fitting indicates that the model is unable to capture the complex relationships or patterns within the data."
]
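To make the contrast concrete, here is a minimal sketch (not from the notebook; the synthetic data and polynomial degrees are assumptions for illustration) that fits polynomials of increasing degree to noisy data and compares training and test error:

```python
# A minimal over-/under-fitting demo on synthetic data (illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):  # under-fit, reasonable fit, over-fit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}  "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

The degree-15 model typically drives the training error toward zero while the test error grows, matching the over-fitting plots above; the degree-1 model shows the under-fitting pattern, with high error on both sets.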
},
{
@@ -276,9 +276,13 @@
"Of course, here we are just demonstrating how to output the confusion matrix to understand its meaning after obtaining these two sets of data. In the subsequent experiment, we will explain how to obtain the desired confusion matrix through code.\n",
"\n",
"There are four values in the matrix their meanings are as follows:\n",
"\n",
"**True Positive (TP)**: The number of positive instances correctly predicted as positive by the model.\n",
"\n",
"**False Negative (FN)**: The number of positive instances incorrectly predicted as negative by the model.\n",
"\n",
"**False Positive (FP)**: The number of negative instances incorrectly predicted as positive by the model.\n",
"\n",
"**True Negative (TN)**: The number of negative instances correctly predicted as negative by the model.\n",
"\n",
"As for the matrix we have above, TP is where we predicted as 1 and actually it is 1. FN is the acount that we predicted as 0 but actually it is 1. FP is predicted as 1 but actually it's 0. TN is we predicted as 0 and it's actually 0.\n",
@@ -287,19 +291,19 @@
"\n",
"**Accuracy**: The ratio of the number of correctly predicted samples to the total number of samples.\n",
"\n",
"**Accuracy = (TP + TN) / (TP + TN + FP + FN)**\n",
"$$Accuracy = \\frac{TP + TN}{TP + TN + FP + FN}$$\n",
"\n",
"**Precision**: The proportion of true positive predictions among the predicted positive instances, measuring the prediction accuracy of the model.\n",
"\n",
"**Precision = TP / (TP + FP)**\n",
"$$Precision = \\frac{TP}{TP + FP}$$\n",
"\n",
"**Recall**: The proportion of true positive predictions among the actual positive instances, measuring the model's ability to identify positives.\n",
"\n",
"**Recall = TP / (TP + FN)**\n",
"$$Recall = \\frac{TP}{TP + FN}$$\n",
"\n",
"**F1 Score**: The harmonic mean of precision and recall, considering both the accuracy and the identification ability of the model.\n",
"\n",
"**F1 Score = 2 * (Precision * Recall) / (Precision + Recall)**\n",
"$$F_1 \\text{ Score} = \\frac{2 \\cdot (Precision \\cdot Recall)}{Precision + Recall}$$\n",
"\n",
"When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n"
]
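The formulas above translate directly into code. A minimal sketch, assuming hypothetical confusion-matrix counts:

```python
# Computing the four metrics from assumed confusion-matrix counts,
# mirroring the formulas above (the numbers are for illustration only).
TP, FN, FP, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

If you prefer not to compute them by hand, scikit-learn provides the same metrics as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.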
@@ -368,7 +372,7 @@
"\n",
"Bootstrapping, also known as resampling or sampling with replacement, is a technique where each time a copy of a sample is selected from a dataset containing m samples and added to the resulting dataset. This process is repeated m times, resulting in a dataset with m samples. (Some samples may appear multiple times in the resulting dataset.) This resulting dataset is then used as the training set.\n",
"\n",
"Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is [(1-1/m)^m]. As m approaches infinity, i.e., m→∞, the limit of this probability is 1/e, where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to 1/e.\n"
"Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is $ [(1-\\frac{1}{m})^m] $. As m approaches infinity, $ lim_{m \\to \\infty} (1 - \\frac{1}{m})^m = \\frac{1}{e} $ the limit of this probability is $1/e$ , where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to $\\frac{1}{e}$ ≈ 0.36787944117$ .\n"
]
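This limit is easy to check empirically. A minimal sketch (the value of m and the seed are arbitrary choices):

```python
# Empirical check of the 1/e result: draw m indices with replacement
# from m items and count how many items were never drawn.
import numpy as np

m = 10_000
rng = np.random.default_rng(0)
sample = rng.integers(0, m, size=m)        # bootstrap sample indices
never_picked = m - len(np.unique(sample))  # items never drawn
print(never_picked / m, 1 / np.e)          # both close to ~0.368
```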
},
{
@@ -523,17 +527,17 @@
"\n",
"Let's consider a target function with a regularization term, which can be represented as:\n",
"\n",
"J(θ) = L(θ) + λR(θ)\n",
"$$J(\\theta) = L(\\theta) + \\lambda R(\\theta)$$\n",
"\n",
"Here, J(θ) is the target function, θ represents the model's parameters, L(θ) is the loss function (typically the model's error on the training data), R(θ) is the regularization term, and λ is the regularization parameter.\n",
"Here, $J(\\theta)$ is the target function, $\\theta$ represents the model's parameters, $L(\\theta)$ is the loss function (typically the model's error on the training data), $R(\\theta)$ is the regularization term, and \\lambda is the regularization parameter.\n",
"\n",
"The loss function L(θ) measures how well the model fits the training data, and our goal is to minimize it. The regularization term R(θ) constrains or penalizes the values of the model's parameters, and it controls the complexity of the model.\n",
"The loss function $L(\\theta)$ measures how well the model fits the training data, and our goal is to minimize it. The regularization term $R(\\theta)$ constrains or penalizes the values of the model's parameters, and it controls the complexity of the model.\n",
"\n",
"The regularization parameter λ determines the weight of the regularization term in the target function. When λ approaches 0, the impact of the regularization term becomes negligible, and the model's objective is primarily to minimize the loss function. On the other hand, when λ approaches infinity, the regularization term's impact becomes significant, and the model's objective is to minimize the regularization term as much as possible, leading to parameter values tending towards zero.\n",
"The regularization parameter $\\lambda$ determines the weight of the regularization term in the target function. When $\\lambda$ approaches $\\theta$, the impact of the regularization term becomes negligible, and the model's objective is primarily to minimize the loss function. On the other hand, when $\\lambda$ approaches infinity, the regularization term's impact becomes significant, and the model's objective is to minimize the regularization term as much as possible, leading to parameter values tending towards zero.\n",
"\n",
"There are two forms of this cost: L1 regularization (also known as Lasso regression) with the regularization term R(θ) represented as the sum of the absolute values of the parameters θ: R(θ) = ||θ||₁. L1 regularization can induce certain parameters of the model to become zero, thereby achieving feature selection and sparsity.\n",
"There are two forms of this cost: L1 regularization (also known as Lasso regression) with the regularization term $R(\\theta)$ represented as the sum of the absolute values of the parameters $\\theta$: $R(\\theta) = ||\\theta||_1$. L1 regularization can induce certain parameters of the model to become zero, thereby achieving feature selection and sparsity.\n",
"\n",
"L2 regularization (also known as Ridge regression) with the regularization term R(θ) represented as the square root of the sum of the squares of the parameters θ: R(θ) = ||θ||₂. L2 regularization encourages the parameter values of the model to gradually approach zero but not exactly become zero, hence it does not possess the ability for feature selection.\n",
"L2 regularization (also known as Ridge regression) with the regularization term $R(\\theta)$ represented as the square root of the sum of the squares of the parameters $\\theta$: $R(\\theta) = ||\\theta||_2$. L2 regularization encourages the parameter values of the model to gradually approach zero but not exactly become zero, hence it does not possess the ability for feature selection.\n",
"\n",
"In `tf.keras`, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now.\n",
"\n",
@@ -698,7 +702,21 @@
"## Your turn! 🚀\n",
"\n",
"Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. Please complete the following tasks:\n",
"[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)"
"[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n",
"\n",
"If you would like to learn more about open-source projects related to model selection.\n",
"\n",
"Here are some recommended open-source and free model selection projects on GitHub!\n",
"\n",
"[Model Zoo](https://github.com/modzo/model-zoo)\n",
"\n",
"[AutoML](https://github.com/automl/auto-sklearn)\n",
"\n",
"[ModelHub](https://github.com/modelhub-ai/modelhub)\n",
"\n",
"[Hugging Face Models](https://github.com/huggingface/models)\n",
"\n",
"These projects are open-source and provide rich documentation and example code. You can choose the appropriate model selection project based on your needs and explore them. "
]
}
],