Update reading-svm.md

nsanthan committed Apr 10, 2024 (commit 5228f04, 1 parent 28c86e7)
Showing 1 changed file with 14 additions and 24 deletions: morea/kernels/reading-svm.md

$$\gamma(\w,b) = \min_{1\le i\le n} y_i(\w^T \z_i - b) \tag*{(1)}$$

so that the smallest distance between the examples and the hyperplane
is \\(\gamma(\w,b)/||\w||\\). This is called the _margin_ of the classifier
\\(\w^T\x-b=0\\).
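
To make the definitions concrete, here is a minimal numpy sketch that computes \\(\gamma(\w,b)\\) as the smallest value of \\(y_i(\w^T \z_i - b)\\) over the examples, and then the margin \\(\gamma(\w,b)/||\w||\\). The toy points, labels, and the candidate \\(\w\\), \\(b\\) are made up purely for illustration.

```python
import numpy as np

# Toy labeled examples z_i with labels y_i in {-1, +1} (illustrative values only).
Z = np.array([[2.0, 2.0],
              [3.0, 1.0],
              [-1.0, -2.0],
              [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# A candidate separating hyperplane w^T x - b = 0 (also just an assumption).
w = np.array([1.0, 1.0])
b = 0.0

# gamma(w, b): the smallest value of y_i (w^T z_i - b) over the n examples.
functional_margins = y * (Z @ w - b)
gamma = functional_margins.min()

# The margin of the classifier w^T x - b = 0 is gamma(w, b) / ||w||.
margin = gamma / np.linalg.norm(w)
print(gamma, margin)   # for these numbers: 3.0 and about 2.12
```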

From our training data, we want to obtain that plane \\(\w^T\x-b=0\\) which
is some complicated function of \\(\w\\) and \\(b\\). Different values of
\\(\w\\) and \\(b\\) yield potentially different orientations and
intercepts of the separating hyperplane, and their margin is
determined by different examples (i.e., the minimizer
in (1) is different). Even though we may not have
\\(\gamma(\w,b)\\) in a simple form, we can still ask for


conceptual way around it. Suppose \\(\w\\) and \\(b\\) classified all examples
such that every example \\(\z_1\upto \z_n\\) satisfied

$$ y_i(\w^T \z_i -b) \ge \nu, \qquad 1\le i\le n. \tag*{(2)}$$

For a given \\(\w\\) and \\(b\\), since \\(\gamma(\w,b)/||\w||\\) happens to be the
distance of the closest point to the plane \\(\w^T \x -b =0\\), we could
satisfy all \\(n\\) constraints in (2) above for every value of \\(\nu\\) in the range \\(0 \le \nu \le \gamma(\w,b)\\), and for no
other value.
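
Continuing the numpy sketch above (same toy `Z`, `y`, `w`, `b`, and `gamma`; all illustrative assumptions), one can check numerically that the constraints in (2) hold for every \\(\nu\\) up to \\(\gamma(\w,b)\\) and fail for anything larger:

```python
def satisfies_constraints(nu):
    # True exactly when y_i (w^T z_i - b) >= nu for every example.
    return bool(np.all(y * (Z @ w - b) >= nu))

print(satisfies_constraints(gamma))         # True: nu = gamma(w, b) still works
print(satisfies_constraints(gamma + 1e-6))  # False: any larger nu violates a constraint
```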

Therefore, we ask to find the maximum number \\(\nu\\) such that all the
constraints in (2) are satisfied.
Note the shift now: we treat \\(\nu\\) as just a number (not a
function of \\(\w\\) and \\(b\\)) and ask for the largest combination
of the number \\(\nu\\), the vector \\(\w\\), and the number \\(b\\) that satisfies
maximizing \\(1/||\w||\\) is the same as minimizing \\(||\w||\\), which is in turn
the same as minimizing \\(\half ||\w||^2\\). We can therefore write our training
objective as obtaining the hyperplane \\( {\w^*}^T \x-b^*=0\\), where

$$ \w^*,b^* = \arg\min_{\w,b} \half{||\w||^2} \tag*{(3)}$$
subject to \\(y_i(\w^T \z_i -{b}) \ge 1 \\) for all \\(1\le i\le n.\\)
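
As one illustration of how (3) can be handed to an off-the-shelf solver, here is a sketch using cvxpy on toy separable data; the data and the choice of cvxpy are assumptions made for this example, not part of the reading, and any quadratic-programming solver would do.

```python
import numpy as np
import cvxpy as cp

# Toy linearly separable data (illustrative only).
Z = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Objective in (3): minimize (1/2) ||w||^2 ...
objective = cp.Minimize(0.5 * cp.sum_squares(w))
# ... subject to y_i (w^T z_i - b) >= 1 for every example.
constraints = [cp.multiply(y, Z @ w - b) >= 1]

cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```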

You may wonder why we transformed maximizing \\(1/||\w||\\) to minimizing
\\(\half ||\w||^2\\). The reason is that we want our objectives and
constraints to be _convex_ functions. We will have a little
digression here to define convex functions and sets, but practically
every large constrained optimization we can solve is convex (or we
just fake the steps of a convex optimization if we are stuck with
non-convex optimization). Often, even convex optimization does not
look that way to begin with---we need to tweak the formulation as
above to get to the correct form.
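
As a quick check that the pieces of (3) really are convex (a standard fact, restated here for convenience): each constraint \\(y_i(\w^T \z_i - b) \ge 1\\) is affine in \\((\w, b)\\), so the feasible set is convex, and the objective satisfies

$$\half||\theta \w_1 + (1-\theta)\w_2||^2 \;\le\; \theta\,\half||\w_1||^2 + (1-\theta)\,\half||\w_2||^2 \qquad \text{for } 0\le\theta\le 1,$$

which follows from the triangle inequality together with the convexity of \\(t\mapsto t^2\\).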

#### Lagrangian for the SVM problem
To write the Lagrangian for this problem, we rewrite each inequality in (3) as
$$1-y_i(\w^T \z_i -{b}) \le 0.$$
Each inequality gets its own Lagrange multiplier \\(\lambda_i\\), so our
Lagrangian is (letting \\(\Lambda = (\lambda_1\upto \lambda_n)\\))

$$\cL(\w,b, \Lambda) = \half{||\w||^2} + \sum_{i=1}^n \lambda_i (1-y_i(\w^T \z_i -{b})).$$
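
In code, the Lagrangian is just this expression evaluated at a particular \\((\w, b, \Lambda)\\). A minimal sketch, reusing numpy and the kind of hypothetical arrays from the earlier examples:

```python
def lagrangian(w, b, Lambda, Z, y):
    # L(w, b, Lambda) = 0.5 ||w||^2 + sum_i lambda_i * (1 - y_i (w^T z_i - b))
    slack = 1.0 - y * (Z @ w - b)
    return 0.5 * np.dot(w, w) + np.dot(Lambda, slack)
```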

Now consider the following problem for a specific choice of \\(\w\\) and \\(b\\): maximize \\(\cL(\w,b,\Lambda)\\) over \\(\Lambda \ge 0\\). If \\(\w\\) and \\(b\\) satisfy all the constraints, then for every \\(i\\),

$$1-y_i(\w^T \z_i -{b}) \le 0,$$

therefore

$$\cL(\w,b, \Lambda) = \half{||\w||^2} + \sum_{i=1}^n \lambda_i (1-y_i(\w^T \z_i -{b})) \le \half{||\w||^2}, $$

with equality in the second equation iff we choose \\(\lambda_i=0\\) for all \\(i\\). Therefore, for any \\(\w\\) and \\(b\\) satisfying all constraints,

every constraint given.
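
As a quick numeric check of the inequality \\(\cL(\w,b,\Lambda) \le \half||\w||^2\\) for a feasible \\((\w,b)\\), reusing the toy `Z`, `y`, and the `lagrangian` sketch above (all hypothetical): no choice of \\(\Lambda \ge 0\\) pushes the Lagrangian above \\(\half||\w||^2\\), and \\(\Lambda = 0\\) attains it.

```python
# A (w, b) chosen so that y_i (w^T z_i - b) >= 1 for all i (assumed feasible point).
w_feas = np.array([0.5, 0.5])
b_feas = 0.0
assert np.all(y * (Z @ w_feas - b_feas) >= 1)

best = 0.5 * np.dot(w_feas, w_feas)               # Lagrangian value at Lambda = 0
for _ in range(1000):
    Lambda = 10.0 * np.random.rand(len(y))        # random nonnegative multipliers
    assert lagrangian(w_feas, b_feas, Lambda, Z, y) <= best + 1e-12
```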

Therefore, we can pose (3) as follows:
$$ \w^*,b^* = \arg\min_{\w,b} \max_{\Lambda \ge 0} \cL(\w,b, \Lambda). \tag*{(4)}$$

We will call the above the _primal_ formulation of the
constrained optimization problem (3): we write the Lagrangian,
and observe that the solution of (3) is
obtained by the minmax formulation (4).
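
Spelling out the observation behind (4) in one line (a summary of the preceding discussion: if some constraint is violated, the corresponding \\(\lambda_i\\) can be taken arbitrarily large, so the maximum is infinite):

$$\max_{\Lambda \ge 0} \cL(\w,b,\Lambda) \;=\; \begin{cases} \half||\w||^2 & \text{if } y_i(\w^T \z_i - b) \ge 1 \text{ for all } i,\\ \infty & \text{otherwise,}\end{cases}$$

so the outer minimization over \\(\w\\) and \\(b\\) in (4) automatically rules out any pair that violates a constraint, and agrees with (3) on the feasible pairs.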


As the name "primal" suggests, we will also have a _dual_
formulation of the optimization problem in (3). But before
we get into the dual formulation, we have a little segue into elementary
game theory.
