diff --git a/deep_learning_from_scratch.ipynb b/deep_learning_from_scratch.ipynb index 088deee..0256e41 100644 --- a/deep_learning_from_scratch.ipynb +++ b/deep_learning_from_scratch.ipynb @@ -6356,7 +6356,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Hyper-parameters" + "### Hyper-parameters\n", + "Common methods for hyper-parameter search: \n", + "* Manual Search\n", + "* Grid Search\n", + "* Random Search\n", + "* Bayesian Optimization\n", + "* Randomized Grid Search\n", + "* Genetic Algorithm\n", + "\n", + "Tools for hyper-parameter optimization: \n", + "\n", + "* Optuna\n", + "* Hyperopt\n", + "* Scikit-learn: GridSearchCV and RandomizedSearchCV\n", + "* Ray Tune" ] }, { @@ -6366,6 +6380,91 @@ "## CNN" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Overall structure\n", + "In the previous sections, every neuron in one layer is connected to every neuron in the adjacent layer; such a layer is called **fully-connected**, also known as an affine layer. \n", + "Affine-based neural network: \n", + "![Affine based neural networks](./figures/dlscratch_affine.png) \n", + "A convolutional neural network (CNN) adds convolution layers and pooling layers to this structure. \n", + "CNN-based neural network: \n", + "![CNN based neural networks](./figures/dlscratch_cnn.png)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Convolutional layer\n", + "* Problem with the affine layer: **The fully connected layer ignores the spatial structure of the input data.** Specifically, image data usually has a three-dimensional shape (height, width, and number of channels) that carries important spatial information, such as the similarity between neighboring pixels, the correlation between RGB channels, and the independence of pixels that are far apart. In the fully connected layer, however, these three-dimensional data are flattened into a one-dimensional input vector, losing the original spatial structure.
As a result, the fully connected layer cannot fully exploit the spatial patterns and features in the input data.\n", + "* Convolution: \n", + "![Convolution](./figures/dlscratch_convolution.png) \n", + "The process of convolution: \n", + "![convolutionprocess](./figures/dlscratch_convolutionprocess.png) \n", + "Convolution with a bias: \n", + "![Convolution with bias](./figures/dlscratch_convolutionbias.png) \n", + "* Padding: mainly used to adjust the size of the output. \n", + "![padding](./figures/dlscratch_padding.png) \n", + "* Stride: the step size by which the filter moves each time. In the convolution process above, the stride is 1; in the figure below, the stride is 2. \n", + "![Stride](./figures/dlscratch_stride.png) \n", + "The relation between input size $(H,W)$, filter size $(FH,FW)$, padding $(P)$, stride $(S)$, and output size $(OH,OW)$ is as follows:\n", + "$$OH = \\frac{H+2P-FH}{S}+1\\\n", + " OW = \\frac{W+2P-FW}{S}+1$$\n", + "Therefore, when you design a convolution layer, make sure both divisions yield integers; otherwise the output size is ill-defined. \n", + "* A convolution over 3D data sums the outputs along the channel direction (e.g. RGB); note that the number of filter channels must equal the number of input channels. \n", + "![3D convolution](./figures/dlscratch_convolution3channel.png) \n", + "The process of 3D data convolution: \n", + "![convolutionprocess3d](./figures/dlscratch_convolutionprocess3channel.png) \n", + "* Feature maps \n", + "In the figures above, the output is a single **feature map**; we can also produce multiple feature maps along the channel direction.
\n", + "One feature map \n", + "![featuremap](./figures/dlscratch_convolution1featuremap.png) \n", + "Multiple feature maps \n", + "![featuremaps](./figures/dlscratch_convolutionfeaturemaps.png) \n", + "Multiple feature maps with bias (using NumPy broadcasting) \n", + "![featuremaps with bias](./figures/dlscratch_convolutionfeaturemapsbias.png) \n", + "* Batch processing \n", + "The data is 4D (batch_num, channel, height, width) \n", + "![batch processing](./figures/dlscratch_convolutionfeaturemapsbiasbatch.png)\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pooling layer" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Implementation of convolution and pooling layers" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Implementation of a CNN" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualization of CNN" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Classic CNNs" ] }, { "cell_type": "markdown", "metadata": {}, @@ -6397,7 +6496,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.5" + "version": "3.11.6" }, "vscode": { "interpreter": { diff --git a/figures/dlscratch_affine.png b/figures/dlscratch_affine.png new file mode 100644 index 0000000..3b96a89 Binary files /dev/null and b/figures/dlscratch_affine.png differ diff --git a/figures/dlscratch_cnn.png b/figures/dlscratch_cnn.png new file mode 100644 index 0000000..a7965e0 Binary files /dev/null and b/figures/dlscratch_cnn.png differ diff --git a/figures/dlscratch_convolution.png b/figures/dlscratch_convolution.png new file mode 100644 index 0000000..db57b2a Binary files /dev/null and b/figures/dlscratch_convolution.png differ diff --git a/figures/dlscratch_convolution1featuremap.png b/figures/dlscratch_convolution1featuremap.png new file mode 100644 index 0000000..c3fdfe8 Binary files
/dev/null and b/figures/dlscratch_convolution1featuremap.png differ diff --git a/figures/dlscratch_convolution3channel.png b/figures/dlscratch_convolution3channel.png new file mode 100644 index 0000000..77c123b Binary files /dev/null and b/figures/dlscratch_convolution3channel.png differ diff --git a/figures/dlscratch_convolutionbias.png b/figures/dlscratch_convolutionbias.png new file mode 100644 index 0000000..281c92f Binary files /dev/null and b/figures/dlscratch_convolutionbias.png differ diff --git a/figures/dlscratch_convolutionfeaturemaps.png b/figures/dlscratch_convolutionfeaturemaps.png new file mode 100644 index 0000000..85cfed2 Binary files /dev/null and b/figures/dlscratch_convolutionfeaturemaps.png differ diff --git a/figures/dlscratch_convolutionfeaturemapsbias.png b/figures/dlscratch_convolutionfeaturemapsbias.png new file mode 100644 index 0000000..35ad40b Binary files /dev/null and b/figures/dlscratch_convolutionfeaturemapsbias.png differ diff --git a/figures/dlscratch_convolutionfeaturemapsbiasbatch.png b/figures/dlscratch_convolutionfeaturemapsbiasbatch.png new file mode 100644 index 0000000..1020892 Binary files /dev/null and b/figures/dlscratch_convolutionfeaturemapsbiasbatch.png differ diff --git a/figures/dlscratch_convolutionprocess.png b/figures/dlscratch_convolutionprocess.png new file mode 100644 index 0000000..25171ad Binary files /dev/null and b/figures/dlscratch_convolutionprocess.png differ diff --git a/figures/dlscratch_convolutionprocess3channel.png b/figures/dlscratch_convolutionprocess3channel.png new file mode 100644 index 0000000..bcd52ed Binary files /dev/null and b/figures/dlscratch_convolutionprocess3channel.png differ diff --git a/figures/dlscratch_padding.png b/figures/dlscratch_padding.png new file mode 100644 index 0000000..2db7a04 Binary files /dev/null and b/figures/dlscratch_padding.png differ diff --git a/figures/dlscratch_stride.png b/figures/dlscratch_stride.png new file mode 100644 index 0000000..15d1b78 Binary 
files /dev/null and b/figures/dlscratch_stride.png differ
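The output-size rule $OH = (H+2P-FH)/S+1$, $OW = (W+2P-FW)/S+1$ from the convolutional-layer cell can be checked with a small script. A minimal sketch, assuming a single-channel input and a naive Python/NumPy loop (the function names `conv_output_size` and `conv2d` are illustrative, not from the notebook):

```python
import numpy as np

def conv_output_size(H, W, FH, FW, P=0, S=1):
    # OH = (H + 2P - FH)/S + 1 and OW = (W + 2P - FW)/S + 1;
    # both divisions must yield integers for a valid layer design.
    OH = (H + 2 * P - FH) // S + 1
    OW = (W + 2 * P - FW) // S + 1
    return OH, OW

def conv2d(x, w, b=0.0, pad=0, stride=1):
    """Naive single-channel 2D convolution (cross-correlation), for illustration only."""
    H, W = x.shape
    FH, FW = w.shape
    OH, OW = conv_output_size(H, W, FH, FW, pad, stride)
    xp = np.pad(x, pad)                      # zero-padding on all sides
    out = np.empty((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # Slide the filter by `stride` and sum the element-wise products.
            patch = xp[i * stride:i * stride + FH, j * stride:j * stride + FW]
            out[i, j] = np.sum(patch * w) + b
    return out

x = np.arange(16.0).reshape(4, 4)            # H = W = 4
w = np.ones((3, 3))                          # FH = FW = 3
out = conv2d(x, w, pad=1, stride=2)
print(out.shape)                             # (2, 2): (4 + 2*1 - 3)//2 + 1 = 2
```

The loop indices make the formula concrete: the filter's top-left corner visits positions `0, S, 2S, ...` inside the padded input of height `H + 2P`, and it must stop `FH - 1` short of the bottom edge, which gives exactly `(H + 2P - FH)/S + 1` valid positions.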
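Of the search methods listed in the Hyper-parameters cell, random search is the simplest to sketch. A minimal illustration, assuming a log-uniform range for the learning rate and a placeholder `evaluate` function; in practice `evaluate` would train a small network for a few epochs and return validation accuracy (all names and ranges here are illustrative, not from the notebook):

```python
import random

def sample_params():
    # Draw one hyper-parameter setting from hand-picked ranges.
    # Learning rate and weight decay are sampled log-uniformly,
    # which is the usual choice for scale parameters.
    return {
        "lr": 10 ** random.uniform(-6, -2),
        "weight_decay": 10 ** random.uniform(-8, -4),
        "hidden_size": random.choice([50, 100, 200]),
    }

def evaluate(params):
    # Placeholder objective: a toy score that peaks near lr = 1e-3.
    # Replace with a real train-and-validate loop.
    return -abs(params["lr"] - 1e-3)

def random_search(n_trials=100, seed=0):
    # Keep the best-scoring setting seen over n_trials random draws.
    random.seed(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = sample_params()
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

best_params, best_score = random_search()
print(best_params)
```

Grid search would replace `sample_params` with nested loops over fixed value lists; tools such as Optuna or scikit-learn's `RandomizedSearchCV` automate this sampling-and-scoring loop and add smarter samplers on top.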