---
title: "03: Tuning & SVMs"
date: "`r Sys.time()`"
output:
  html_notebook:
    toc: yes
    theme: flatly
    number_sections: yes
editor_options:
  chunk_output_type: console
---
```{r setup}
library(mlr3verse) # All the mlr3 things
library(ggplot2) # For plotting
# Spam Task setup
spam_task <- tsk("spam")
set.seed(26)
# train/test split
spam_split <- partition(spam_task, ratio = 2/3)
```
Goals of this part:
1. Introduce hyperparameter tuning
2. Experiment with tuning different learners
3. Introduce SVMs
4. Tune an SVM with a more complex setup
# Hyperparameter Tuning
So far we've seen four learners:
1. kNN via `{kknn}`
2. Decision Trees via `{rpart}`
3. Random Forest via `{ranger}`
4. Gradient Boosting via `{xgboost}`
We've gotten to know the first two a little, and now we'll also take a closer look at the latter two.
First we'll start doing some tuning with `{mlr3}` based on the **kNN** learner because it's nice and simple.
We saw that `k` is an important hyperparameter, and that it's an integer, typically greater than 1. To tune it, we have to make a few other decisions as well, using what we learned about (nested) resampling:
- What's our resampling strategy?
- What measure do we tune on?
- What does the parameter search space look like?
- How long do we tune? What's our *budget*?
- What's our tuning strategy?
We'll use `{mlr3}`'s [`auto_tuner`](https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html) for this because it's just so convenient:
```{r knn-tuning-setup}
# Defining a search space: k is an integer, we look in the range 3 to 51
search_space_knn <- ps(
  k = p_int(lower = 3, upper = 51)
)

tuned_knn <- auto_tuner(
  # The base learner we want to tune, optionally setting other parameters
  learner = lrn("classif.kknn", predict_type = "prob"),
  # Resampling strategy for the tuning (inner resampling)
  resampling = rsmp("cv", folds = 3),
  # Tuning measure: the binary Brier score (lower is better)
  measure = msr("classif.bbrier"),
  # Setting the search space we defined above
  search_space = search_space_knn,
  # Budget: Try n_evals different values
  terminator = trm("evals", n_evals = 20),
  # Strategy: Randomly try parameter values in the space
  # (not ideal in this case but in general a good place to start)
  tuner = tnr("random_search")
)

# Take a look at the new tuning learner
tuned_knn
```
Now `tuned_knn` behaves the same way as any other Learner that has not been trained on any data yet --- first we have to train (and tune!) it on our spam training data. The result will be the best hyperparameter configuration of those we tried:
```{r tune-knn}
# If you're on your own machine with multiple cores available, you can parallelize:
future::plan("multisession")
# Setting a seed so we get the same results -- train on train set only!
set.seed(2398)
tuned_knn$train(spam_task, row_ids = spam_split$train)
```
In this case, we get a `k` of 9, with an accuracy of about 90%.
(Side task: Try the same but with the AUC measure --- do you get the same `k`?)
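For the side task, a minimal sketch of the same setup with the AUC as the tuning measure might look like this (not evaluated here; the object name `tuned_knn_auc` is just for illustration):

```{r knn-tuning-auc-sketch, eval=FALSE}
# Same setup as above, but tuning on the AUC instead of the Brier score
tuned_knn_auc <- auto_tuner(
  learner = lrn("classif.kknn", predict_type = "prob"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.auc"),
  search_space = search_space_knn,
  terminator = trm("evals", n_evals = 20),
  tuner = tnr("random_search")
)

set.seed(2398)
tuned_knn_auc$train(spam_task, row_ids = spam_split$train)
tuned_knn_auc$tuning_result
```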
We can visualize the performance across all values of `k` we tried by accessing the tuning instance now included in the `tuned_knn` learner object:
```{r plot-knn-tuning-result}
autoplot(tuned_knn$tuning_instance)
tuned_knn$tuning_instance
```
(See also docs at `?mlr3viz:::autoplot.TuningInstanceSingleCrit`)
And we can get the hyperparameter results that worked best in the end:
```{r knn-tuning-param-results}
tuned_knn$tuning_result
```
Now that we've tuned on the training set, it's time to evaluate on the test set:
```{r knn-eval}
tuned_knn_pred <- tuned_knn$predict(spam_task, row_ids = spam_split$test)
# Accuracy and AUC
tuned_knn_pred$score(msrs(c("classif.acc", "classif.auc", "classif.bbrier")))
```
That was basically the manual way of doing nested resampling, with an inner resampling strategy (CV) and a holdout outer resampling strategy (the `spam_split$train` and `spam_split$test` sets).
To be more robust we can use CV in the outer resampling as well:
```{r knn-nested-resampling}
# Reset the learner so we start fresh
tuned_knn$reset()

# Set up resampling with the ready-to-be-tuned learner and outer resampling: CV
knn_nested_tune <- resample(
  task = spam_task,
  learner = tuned_knn,
  resampling = rsmp("cv", folds = 5), # this is effectively the outer resampling
  store_models = TRUE
)

# Extract inner tuning results, since we now tuned in multiple CV folds.
# Different folds may arrive at a different optimal k;
# the tuning measure is averaged over the inner folds.
extract_inner_tuning_results(knn_nested_tune)

# The individual results combined above are also accessible via e.g.
knn_nested_tune$learners[[2]]$tuning_result

# Plots of the inner folds
autoplot(knn_nested_tune$learners[[1]]$tuning_instance)
autoplot(knn_nested_tune$learners[[2]]$tuning_instance)
autoplot(knn_nested_tune$learners[[3]]$tuning_instance)

# Get AUC and accuracy for the outer folds
knn_nested_tune$score(msrs(c("classif.acc", "classif.auc")))[, .(iteration, classif.auc, classif.acc)]

# Average over the outer folds --- our "final result" performance for kNN
knn_nested_tune$aggregate(msrs(c("classif.acc", "classif.auc")))
```
Seems like a decent result?
Let's try to beat it with some other learner!
## Your Turn!
Above you have a boilerplate to tune your own learner.
Start with any of the other three learners we've seen, pick one or two hyperparameters to tune with a reasonable budget (note we have limited time and resources), tune on the training set, and evaluate via the AUC on the test set (one possible setup for the decision tree is sketched further below).
Some pointers:
- Consult the Learner docs to see tuning-worthy parameters:
- `lrn("classif.xgboost")$help()` links to the `xgboost` help
- `lrn("classif.rpart")$help()` analogously for the decision tree
- You can also see the documentation online, e.g. <https://mlr3learners.mlr-org.com/reference/mlr_learners_classif.xgboost.html>
- Parameter search spaces in `ps()` have different types, see the help at `?paradox::Domain`
- Use `p_int()` for integers, `p_dbl()` for real-valued params etc.
- If you don't know which parameter to tune, try the following:
- `classif.xgboost`:
- Important: `nrounds` (integer) (>= 1)
- Important: `eta` (double) (0 < eta < 1)
- Maybe: `max_depth` (integer) (< 10)
- `classif.rpart`:
- `cp` (double)
- Maybe: `maxdepth` (integer) (< 30)
- `classif.ranger`:
- `num.trees` (integer) (default is 500)
- `max.depth` (integer) (ranger has no limit here aside from the data itself :)
Note: Instead of randomly picking parameters from the search space, we can also generate a grid of parameters and try those.
We won't do that here for now, but you can read up on how to do it at `?mlr_tuners_grid_search`.
-> `generate_design_grid(search_space, resolution = 5)`
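As a short, hedged sketch (reusing the kNN search space defined earlier), generating such a grid and switching to a grid-search tuner could look like this:

```{r grid-search-sketch, eval=FALSE}
# Generate a grid over the kNN search space defined above
generate_design_grid(search_space_knn, resolution = 5)

# Alternatively, swap the tuner inside auto_tuner() for a grid search:
# tuner = tnr("grid_search", resolution = 5)
```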
Also note that the cool thing about the `auto_tuner()` is that it behaves just like any other `mlr3` learner, but it automatically tunes itself. You can plug it into `resample()` or `benchmark_grid()` just fine!
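If you want a starting point for the task above, here is one possible sketch for the decision tree, tuning `cp` and `maxdepth`. The object names and parameter ranges are only suggestions, not the "official" solution:

```{r rpart-tuning-sketch, eval=FALSE}
# Suggested search space for the decision tree (ranges are just a guess)
search_space_rpart <- ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  maxdepth = p_int(lower = 1, upper = 30) # rpart caps tree depth at 30
)

tuned_rpart <- auto_tuner(
  learner = lrn("classif.rpart", predict_type = "prob"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.auc"),
  search_space = search_space_rpart,
  terminator = trm("evals", n_evals = 20),
  tuner = tnr("random_search")
)

# Tune on the training set, evaluate on the test set
tuned_rpart$train(spam_task, row_ids = spam_split$train)
tuned_rpart$predict(spam_task, row_ids = spam_split$test)$score(msr("classif.auc"))
```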
# Support Vector Machines
Let's circle back to new learners and explore SVMs a little by trying out different kernels, using the penguin dataset from the beginning as an example:
```{r penguins}
penguins <- na.omit(palmerpenguins::penguins)

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  labs(
    title = "Palmer Penguins",
    x = "Flipper Length [mm]", y = "Body Mass [g]",
    color = "Species"
  ) +
  theme_minimal()
```
Since we don't care about prediction accuracy for now, we'll use the whole dataset for training and prediction. Please only do this with toy data 🙃.
For the SVM algorithm itself, we use the `svm` learner from the `e1071` package (great name, I know), but once again via `{mlr3}`'s interface.
According to the docs (`?e1071::svm`) we have the choice of the following kernels:
- `"linear"`: $u'v$
- `"polynomial"`: $(\mathtt{gamma} \cdot u' \cdot v + \mathtt{coef0})^\mathtt{degree}$
- `"radial"`: $\exp(-\mathtt{gamma} \cdot |u-v|^2)$
- `"sigmoid"`: $\tanh(\mathtt{gamma} \cdot u'v + \mathtt{coef0})$
Where `gamma`, `degree`, and `coef0` are further hyperparameters.
```{r svm-learner-default}
svm_learner <- lrn("classif.svm")
# What parameters do we have?
svm_learner$param_set$ids()
# Default kernel
svm_learner$param_set$default$kernel
```
## Your Turn!
Below you have a boilerplate for
a) creating an SVM learner and training it on the penguin dataset with 2 predictors
b) plotting decision boundaries with it (using the `{mlr3viz}` helper function `plot_learner_prediction()`)
Run the code below once to see what linear decision boundaries look like, then pick different kernels from the list above and run it again.
- What kernel would you pick just by the looks of the boundaries?
- How do the boundaries change if you also adjust the other hyperparameters?
- Try picking any other two variables as features (`penguin_task$col_info`; one example is sketched after the next chunk)
```{r penguin-task-2-predictors}
penguin_task <- TaskClassif$new(
  id = "penguins",
  backend = na.omit(palmerpenguins::penguins),
  target = "species"
)

penguin_task$col_roles$feature <- c("body_mass_g", "flipper_length_mm")
```
```{r svm-decision-boundaries}
# Create the learner, picking a kernel and/or other hyperparameters
# (edit / uncomment to try different kernels)
# svm_learner <- lrn("classif.svm", predict_type = "prob", kernel = "polynomial", degree = 3)
svm_learner <- lrn("classif.svm", predict_type = "prob", kernel = "linear")

# Train the learner
svm_learner$train(penguin_task)

# Plot decision boundaries
plot_learner_prediction(
  learner = svm_learner,
  task = penguin_task
)
```
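For the last bullet point above, here is a quick sketch of swapping in two other features, here the bill measurements from `palmerpenguins` (the kernel choice is arbitrary):

```{r svm-other-features-sketch, eval=FALSE}
# Use bill length and depth as features instead
penguin_task$col_roles$feature <- c("bill_length_mm", "bill_depth_mm")

svm_learner <- lrn("classif.svm", predict_type = "prob", kernel = "radial")
svm_learner$train(penguin_task)

plot_learner_prediction(learner = svm_learner, task = penguin_task)
```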
## SVM-Tuning
Let's try a more complex tuning experiment, based on the spam task from before.
We'll create a new SVM learner object and this time explicitly tell it which type of classification to do. "C-classification" is the default anyway, but `{mlr3}` wants us to be explicit here for tuning:
```{r svm-learner}
svm_learner <- lrn("classif.svm", predict_type = "prob", type = "C-classification")
```
First up we'll define our search space, meaning the range of parameters we want to test out. Since `kernel` is a categorical parameter (i.e. no numbers, just names of kernels), we'll define the search space for that parameter by just passing the names of the kernels to the `p_fct()` helper function that defines `factor`-parameters in `{mlr3}`.
The interesting thing here is that some parameters are only relevant for some kernels, which we can declare via a `depends` argument:
```{r svm-search-space-short}
search_space_svm <- ps(
  kernel = p_fct(c("linear", "polynomial", "radial", "sigmoid")),
  # degree is only relevant if "kernel" is "polynomial"
  degree = p_int(lower = 1, upper = 7, depends = kernel == "polynomial")
)
```
```
We can create an example design grid to inspect our setup and see that `degree` is `NA` for cases where `kernel` is not `"polynomial"`, just as we expected:
```{r svm-search-space-inspect}
generate_design_grid(search_space_svm, resolution = 3)
```
## Your Turn!
The above should get you started to...
1. Create a `search_space_svm` like above, tuning...
- `cost` from 0.1 to 1 (hint: `trafo = function(x) 10^x`)
- `kernel` (like the example above)
- `degree`, as above, **only if** `kernel == "polynomial"`
- `gamma`, from e.g. 0.01 to 0.2, **only if** `kernel` is polynomial, radial, or sigmoid (hint: you can't use `kernel != "linear"` unfortunately, but `kernel %in% c(...)` works)
2. Use the `auto_tuner` function as previously seen with
- `svm_learner` (see above)
- A resampling strategy (use `"holdout"` if runtime is an issue)
- A measure (e.g. `classif.acc` or `classif.auc`)
- The search space you created in 1.
- A termination criterion (e.g. 40 evaluations)
- Random search as your tuning strategy
3. Train the auto-tuned learner and evaluate on the test set
What parameter settings worked best?
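One possible setup, as a sketch only (the cost range follows the hint above via the log-scale trafo; the `holdout` inner resampling and the ranges for `gamma` are just suggestions):

```{r svm-tuning-sketch, eval=FALSE}
search_space_svm <- ps(
  # cost is tuned on a log scale: 10^x for x in [-1, 0] covers 0.1 to 1
  cost = p_dbl(lower = -1, upper = 0, trafo = function(x) 10^x),
  kernel = p_fct(c("linear", "polynomial", "radial", "sigmoid")),
  degree = p_int(lower = 1, upper = 7, depends = kernel == "polynomial"),
  gamma = p_dbl(
    lower = 0.01, upper = 0.2,
    depends = kernel %in% c("polynomial", "radial", "sigmoid")
  )
)

tuned_svm <- auto_tuner(
  learner = svm_learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.auc"),
  search_space = search_space_svm,
  terminator = trm("evals", n_evals = 40),
  tuner = tnr("random_search")
)

# Tune on the training set, inspect the best configuration,
# then evaluate on the test set
tuned_svm$train(spam_task, row_ids = spam_split$train)
tuned_svm$tuning_result
tuned_svm$predict(spam_task, row_ids = spam_split$test)$score(msr("classif.auc"))
```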