-
Notifications
You must be signed in to change notification settings - Fork 4
/
08-feature_engineering_with_recipes.Rmd
35 lines (29 loc) · 2.34 KB
/
08-feature_engineering_with_recipes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# Feature engineering with recipes
**Learning objectives:**
- Define **feature engineering.**
- List **reasons** that feature engineering might be **beneficial.**
- Use the {recipes} package to **create a simple feature engineering recipe.**
- Use selectors from the {recipes} package to **apply transformations to specific types of columns.**
- List some **advantages of using a recipe** for feature engineering.
- Describe **what happens when a recipe is prepared** with `recipes::prep()`.
- Use `recipes::bake()` to **process a dataset.**
- Recognize how to use `recipes::step_unknown()`, `recipes::step_novel()`, `recipes::step_other()` to **prepare factor variables.**
- Explain how `recipes::step_dummy()` **encodes qualitative data in a numeric format.**
- Recognize techniques for dealing with large numbers of categories, such as feature hashing or encoding using the {embed} package (as described in [this talk by Alan Feder at rstudio::global(2021)](https://rstudio.com/resources/rstudioglobal-2021/categorical-embeddings-new-ways-to-simplify-complex-data/)).
- Recognize methods for **encoding ordered factors.**
- Use `recipes::step_interact()` to add **interaction terms** to a recipe.
- Understand why **some steps might only be applicable to training data.**
- Recognize the **functions from `{recipes}` and `{themis}`** that are **only applied to training data** by default.
- Recognize that `{recipes}` includes functions for **creating spline terms,** such as `step_ns()`.
- Recognize that `{recipes}` includes functions for **feature extraction,** such as `step_pca()`.
- Use `themis::step_downsample()` to **downsample** data.
- Recognize other **row-sampling steps** from the `{recipes}` package.
- Use `recipes::step_mutate()` and `recipes::step_mutate_at()` for general `{dplyr}`-like transformations.
- Recall that the `{textrecipes}` package exists for **text-specific feature-engineering steps.**
- Understand that the functions of the `{recipes}` package **use training data** for all preprocessing and feature engineering steps to prevent leakage.
- Use `{recipes}` to **prepare data for traditional modeling functions.**
- Use `tidy()` to **examine a recipe** and its steps.
- Refer to columns with **roles** other than `"predictor"` or `"outcome"`.
## Videos de las reuniones
### Cohorte 1
`r knitr::include_url("https://www.youtube.com/embed/xTabMY-H_DI")`