# (PART) Dataset Level {-}
```{r, echo=FALSE, warning = FALSE}
source("code_snippets/ema_init.R")
```
# Introduction to Dataset-level Exploration {#modelLevelExploration}

In Part II, we focused on instance-level explainers, which help to understand how a model yields a prediction for a single observation (instance).

In Part III, we concentrate on dataset-level explainers, which help to understand how a model's predictions perform overall, for an entire set of observations. Assuming that the observations form a representative sample from a general population, dataset-level explainers can provide information about the quality of the model's predictions for that population.

The following examples illustrate situations in which dataset-level explainers may be useful (a code sketch illustrating these tasks follows the list):

* We may want to learn which explanatory variables are "important" in the model. For instance, we may be interested in predicting the risk of heart attack based on explanatory variables derived from the results of medical examinations. If some of the variables do not influence predictions, we could simplify the model by removing them.
* We may want to understand how a selected variable influences the model's predictions. For instance, we may be interested in predicting prices of apartments. An apartment's location is an important factor, but we may want to know which locations lead to higher prices.
* We may want to discover whether there are any observations for which the model yields wrong predictions. For instance, for a model predicting the probability of survival after a risky treatment, we might want to know whether there are patients for whom the model's predictions are extremely wrong. Identifying such a group of patients might point to, for instance, an incorrect form of an explanatory variable, or even an omitted variable.
* We may be interested in the overall "performance" of the model. For instance, we may want to compare two models in terms of the average accuracy of their predictions.
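
To make these tasks concrete, the sketch below shows how each of them could be addressed with a dataset-level explainer from the `DALEX` package. It is only a minimal sketch, assuming `DALEX`'s built-in `apartments` and `apartments_test` data and a simple linear model fitted purely for illustration; the chapters that follow develop each method in detail.

```{r, eval=FALSE}
library("DALEX")

# fit a simple model predicting the price per square meter (illustration only)
model_apartments <- lm(m2.price ~ ., data = apartments)

# wrap the model, validation data, and true values in an explainer object
explainer <- explain(model_apartments,
                     data = apartments_test,
                     y    = apartments_test$m2.price)

model_performance(explainer)                      # overall model performance
model_parts(explainer)                            # variable-importance measures
model_profile(explainer, variables = "district")  # effect of a selected variable
model_diagnostics(explainer)                      # residual diagnostics
```

Each call returns an object with a `plot()` method, so a visual summary of the corresponding aspect of the model can be obtained directly.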
<!---
Model-level explainers focus on four main aspects of a model:
* Variable's importance: which explanatory variables are ''important'', and which are not?
* Variable's effect: how does a variable influence the average model's predictions?
* Model's performance: how ''good'' is the model? Is one model ''better'' than another?
* Model's fit: which observations are misfitted by the model, i.e., where are the residuals the largest?
---->

In all cases, measures capturing the particular aspect of interest of a model have to be defined. We discuss them in subsequent chapters of this part of the book. In particular, in Chapter \@ref(modelPerformance), we discuss measures that are useful for the evaluation of the overall performance of a model. In Chapter \@ref(featureImportance), we focus on methods that allow assessment of a variable's importance for a model's predictions. Chapters \@ref(partialDependenceProfiles) and \@ref(accumulatedLocalProfiles) focus on the exploration of the effect of selected variables on predictions. Chapter \@ref(residualDiagnostic) presents an overview of the classical residual-diagnostics tools. Finally, in Chapter \@ref(summaryModelLevel), we present an overview of all dataset-level explainers introduced in the book.