---
title: "Support Vector Machines"
published: true
morea_id: reading-svm
morea_summary: "Support Vector Machines"
morea_summary: "Primal formulation"
morea_type: reading
morea_labels:
---

We will formulate an optimization problem for training that not only
tries to get a separating hyperplane, but also one that will ensure
that the examples are as far away from it as possible.


#### Training for maximum margin
Note the shift now---we treat \\(\nu\\) as just a number (not a
function of \\(\w\\) and \\(b\\)) and see which is the largest combination
of the number \\(\nu\\), the vector \\(\w\\) and \\(b\\) that satisfies

$$\w^*,b^*,\nu^* = \arg\max_{\nu,\w,b} \frac{\nu}{||\w||}$$
subject to \\(y_i(\w^T \z_i -b) \ge \nu \text{ for all } 1\le i\le n.\\)
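
To make these quantities concrete, here is a small numerical sketch (the toy data, the chosen hyperplane, and all variable names are our own, and it assumes `numpy` is available): for a fixed separating hyperplane \\(\w^T \x - b = 0\\), the largest \\(\nu\\) satisfying every constraint is \\(\min_i y_i(\w^T \z_i - b)\\), and \\(\nu/||\w||\\) is then the distance from the closest example to the hyperplane.

```python
# Sketch only: toy data and names are ours, not from the reading.
import numpy as np

Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # examples z_i
y = np.array([1.0, 1.0, -1.0, -1.0])                                # labels y_i

w = np.array([1.0, 1.0])   # some hyperplane w^T x - b = 0 that separates the data
b = 0.0

# Largest nu satisfying y_i (w^T z_i - b) >= nu for every i:
nu = np.min(y * (Z @ w - b))
print("nu =", nu, " nu/||w|| =", nu / np.linalg.norm(w))

# For a separating hyperplane this matches the distance of the closest example:
distances = np.abs(Z @ w - b) / np.linalg.norm(w)
print("closest distance =", distances.min())
```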

We can make one more simplification. There is no distinction between
and the margin of the hyperplane \\(\tilde{\w}^T
\x-\tilde{b}=0\\) is \\(1/||\tilde{\w}||\\).
So we can rewrite our training goal to be the optimization

$$\w^*,b^*,\nu^* = \arg\max_{\nu,\tilde{\w},b} \frac{1}{||\tilde{\w}||}$$
subject to \\(y_i(\tilde{\w}^T \z_i -\tilde{b}) \ge 1\\) for all \\(1\le i\le n.\\)

Clearly, the \\(\nu\\)'s are now superfluous---they don't exist in either the
maximizing \\(1/||\w||\\) is the same as minimizing \\(||\w||\\), which is in turn
the same as minimizing \\(\half ||\w||^2\\). We can therefore write our training
objective as obtaining the hyperplane \\((\w^*)^T \x-b^*=0\\), where

$$ \w^*,b^* = \arg\min_{\w,b} \half{||\w||^2} \tag*{(2)}$$
subject to \\(y_i(\w^T \z_i -{b}) \ge 1 \\) for all \\(1\le i\le n.\\)

You may wonder why we transformed maximizing \\(1/||\w||\\) to minimizing
just fake the steps of a convex optimization if we are stuck with
non-convex optimization). Often, even convex optimization does not
look that way to begin with---we need to tweak the formulation as
above to get to the correct form.
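
As a quick numerical sanity check of the constrained problem (2) (not part of the derivation), here is a minimal sketch that solves it with the `cvxpy` package on a toy linearly separable dataset; the data, the variable names, and the choice of solver library are our own assumptions.

```python
# Minimal sketch of solving (2) numerically; assumes numpy and cvxpy are installed.
import numpy as np
import cvxpy as cp

Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # examples z_i
y = np.array([1.0, 1.0, -1.0, -1.0])                                # labels y_i

w = cp.Variable(2)
b = cp.Variable()

# minimize (1/2) ||w||^2   subject to   y_i (w^T z_i - b) >= 1 for all i
constraints = [cp.multiply(y, Z @ w - b) >= 1]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
problem.solve()

margins = y * (Z @ w.value - b.value)
print("w* =", w.value, " b* =", b.value)
print("min_i y_i (w^T z_i - b) =", margins.min())     # ~1: some constraint is tight
print("geometric margin 1/||w*|| =", 1.0 / np.linalg.norm(w.value))
```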

#### Lagrangian for the SVM problem
To write the Lagrangian for this problem, we rewrite each inequality
constraint above so that it looks like \\(f_i(\w,b) \le 0\\), namely

$$1-y_i(\w^T \z_i -{b}) \le 0.$$

Each inequality gets its own Lagrange multiplier \\(\lambda_i\\), so our Lagrangian
is (letting \\(\Lambda = (\lambda_1\upto \lambda_n)\\))

$$\cL(\w,b, \Lambda) =
\half{||\w||^2} + \sum_{i=1}^n \lambda_i \Paren{1-y_i(\w^T \z_i -{b})}.$$
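
For reference, here is a direct transcription of this Lagrangian into code (a sketch only; the function name, the argument names, and the use of `numpy` are our own choices).

```python
# L(w, b, Lambda) = 0.5 ||w||^2 + sum_i lam_i * (1 - y_i (w^T z_i - b))
import numpy as np

def lagrangian(w, b, lam, Z, y):
    slack = 1.0 - y * (Z @ w - b)    # the constraints rewritten as f_i(w, b) <= 0
    return 0.5 * np.dot(w, w) + np.dot(lam, slack)
```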

Now consider the following problem for a specific choice of \\(\w\\) and \\(b\\),

$$\max_{\Lambda \ge 0}\cL(\w,b, \Lambda) ,$$

where \\(\Lambda\ge 0\\) is shorthand for \\(\lambda_1\ge 0, \lambda_2\ge
0,\cdots,\lambda_n\ge 0\\). Now if \\(\w\\) and \\(b\\) satisfy all constraints,
we will have for all \\(1\le i\le n\\) that

$$1-y_i(\w^T \z_i -{b}) \le 0,$$

therefore

$$\cL(\w,b, \Lambda)
=
\half{||\w||^2} + \sum_{i=1}^n \lambda_i \Paren{1-y_i(\w^T \z_i -{b})}\\
\le
\half{||\w||^2},
$$

with equality in the last step if we choose \\(\lambda_i=0\\) for all \\(i\\). Therefore, for any \\(\w\\) and \\(b\\) satisfying all constraints,

$$\max_{\Lambda \ge 0}\cL(\w,b, \Lambda) = \half ||\w||^2.$$
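
A quick numerical illustration of this feasible case (toy data and names are ours; assumes `numpy`): when every constraint holds, each term \\(1-y_i(\w^T \z_i -b)\\) is nonpositive, so any \\(\lambda_i>0\\) can only pull the Lagrangian down, and the maximum over \\(\Lambda \ge 0\\) is attained at \\(\Lambda = 0\\).

```python
# Sketch only: (w, b) below satisfies all constraints on this toy data.
import numpy as np

Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0

slack = 1.0 - y * (Z @ w - b)
assert np.all(slack <= 0)                  # feasible: every constraint is satisfied

def L(lam):                                # the Lagrangian as a function of Lambda only
    return 0.5 * np.dot(w, w) + np.dot(lam, slack)

rng = np.random.default_rng(0)
print("L at Lambda = 0:      ", L(np.zeros(4)))            # equals 0.5 ||w||^2 = 1.0
print("L at a random Lambda: ", L(rng.uniform(0, 10, 4)))  # never larger than that
```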

On the other hand, suppose \\(\w\\) and \\(b\\) violate even a single
constraint, that is, for some \\(j\\),

$$1-y_j(\w^T \z_j -{b}) > 0.$$

Then to maximize \\(\cL(\w,b, \Lambda)\\) we can let \\(\lambda_j\to \infty\\),
so that

$$\lambda_j(1-y_j(\w^T \z_j -{b})) \to +\infty,$$

and therefore

$$\max_{\Lambda \ge 0}\cL(\w,b, \Lambda) = \infty.$$

Putting it together,

$$
\max_{\Lambda \ge 0}\cL(\w,b, \Lambda)
=
\begin{cases}
\half ||\w||^2 & \w, b \text{ satisfy all $n$ constraints}\\
\infty & \text{else.}
\end{cases}
$$
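
The case analysis translates directly into code. The sketch below (our own names and toy data, assuming `numpy`) implements \\(g(\w,b)\\) and also shows, for an infeasible choice of \\(\w,b\\), how pushing a single multiplier up makes the Lagrangian arbitrarily large.

```python
# Sketch of g(w, b) = max over Lambda >= 0 of L(w, b, Lambda), via the case analysis.
import numpy as np

Z = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def g(w, b):
    slack = 1.0 - y * (Z @ w - b)
    if np.all(slack <= 0):                 # all n constraints satisfied
        return 0.5 * np.dot(w, w)
    return np.inf                          # some constraint violated

print(g(np.array([1.0, 1.0]), 0.0))        # feasible:   0.5 ||w||^2 = 1.0
print(g(np.array([0.1, 0.1]), 0.0))        # infeasible: inf

# Why "inf": for the infeasible (w, b), constraint 0 is violated (slack > 0),
# so increasing lambda_0 alone drives the Lagrangian to +infinity.
w_bad, b_bad = np.array([0.1, 0.1]), 0.0
slack_bad = 1.0 - y * (Z @ w_bad - b_bad)
for lam0 in [1.0, 1e3, 1e6]:
    lam = np.zeros(4)
    lam[0] = lam0
    print(0.5 * np.dot(w_bad, w_bad) + np.dot(lam, slack_bad))
```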

Let us call

$$g(\w,b) \ed \max_{\Lambda \ge 0}\cL(\w,b, \Lambda)$$

for convenience. Now there is at least one \\(\w\\), \\(b\\) that
satisfies all constraints (since the points are linearly
separable), so the smallest value \\(g(\w,b)\\) can take is
not infinity (any \\(\w,b\\) that violates a constraint
can never minimize \\(g(\w,b)\\)). That means that if we look for

$$\arg \min_{\w,b} g(\w,b),$$

the solution must be \\(\w^*,b^*\\) from (2), since we are
minimizing \\(\half ||\w||^2\\), but only over those \\(\w,b\\) that satisfy
every constraint.

Therefore, we can pose (2) as follows:

$$\w^*,b^* = \arg\min_{\w,b} \max_{\Lambda \ge 0} \cL(\w,b, \Lambda). \tag*{(3)}$$

We will call the above the *primal* formulation of the
constrained optimization problem in (2)---where we write the
Lagrangian, and observe that the solution of (2) is obtained
by the minmax formulation in (3).
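
As a small brute-force check of this minmax formulation (our own 1-D toy data and grid; this is only an illustration, not an efficient method), minimizing \\(g(\w,b)\\) over a grid of candidate \\((\w,b)\\) lands on the same maximum-margin hyperplane that the constrained problem (2) gives.

```python
# Sketch only: brute-force minimization of g(w, b) on 1-D toy data.
import numpy as np

z = np.array([-2.0, -1.0, 1.0, 2.0])       # 1-D examples
y = np.array([-1.0, -1.0, 1.0, 1.0])       # labels

def g(w, b):                               # max over Lambda >= 0 of the Lagrangian
    slack = 1.0 - y * (w * z - b)
    return 0.5 * w * w if np.all(slack <= 0) else np.inf

best = min((g(w, b), w, b)
           for w in np.linspace(0.0, 2.0, 201)
           for b in np.linspace(-1.0, 1.0, 201))
print(best)   # roughly (0.5, 1.0, 0.0): w* = 1, b* = 0, objective 1/2

# For this data the constraints at z = -1 and z = +1 read w + b >= 1 and
# w - b >= 1, which force w >= 1; so the constrained minimum of 0.5 w^2
# is 0.5, attained at w = 1, b = 0: exactly what the brute-force search finds.
```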

As the name "primal" suggests, we will also have a *dual*
formulation of the optimization problem in (2). But before
we get to the dual formulation, we take a little segue into elementary
game theory.
