-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
85 lines (51 loc) · 5.15 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
---
title: Tools to Support Relative Importance Analysis
output: github_document
---
# domir <img src="man/figures/logo.png" align="right" height="139"/>
[![Stable version](http://www.r-pkg.org/badges/version-last-release/domir)](https://cran.r-project.org/package=domir) [![downloads](http://cranlogs.r-pkg.org/badges/grand-total/domir)](https://cran.r-project.org/package=domir)
```{r setup, include = FALSE}
devtools::load_all()
```
# Overview
The **domir** package supports determining the relative importance of inputs (i.e., independent variables, predictors, or features referred to as *names* in the package) in a user's statistical or machine learning model.
The intention of this package is to provide a flexible user interface to *Dominance Analysis*---a relatively assumption-free methodology for comparing the predictive value, usefulness, or importance associated with of model inputs/names.
Dominance analysis resolves the indeterminancy of ascribing the value returned by a predictive modeling function to inputs/names when it is not possible to do so analytically. The most common use case for the application of dominance analysis is in comparing inputs/names in terms of their contribution to a predictive model's fit statistic or metric.
# Installation
To install the most recent version of **domir** from CRAN use:
`install.packages("domir")`
**domir** is also used as the computational engine underlying the [`dominance_analysis()`](https://easystats.github.io/parameters/reference/dominance_analysis.html) function for the [**parameters**](https://easystats.github.io/parameters/) package in the [**easystats**](https://easystats.github.io/easystats/) framework/collection.
# What **domir** Does
`domir` computes three different sets of results based on a set of inputs/names and the values returned from a function like this linear regression model.
`lm(mpg ~ am + vs + cyl, data = mtcars)`
Using the variance explained $R^2$ as fit statistic as implemented by `lm`'s `summary` method as the returned value, `domir` can implement a 'classic' dominance analysis[^1] as:
[^1]: see this [vignette](https://CRAN.R-project.org/package=domir/vignettes/domir_basics.html) for a conceptual discussion of dominance analysis
```{r init_domin}
lm_wrapper <-
function(formula, data) {
lm(formula, data = data) |>
summary() |>
_[["r.squared"]]
}
domir(mpg ~ am + vs + cyl, lm_wrapper, data = mtcars)
```
`domir` requires the set of inputs/names, submitted as a `formula` or a specialized [`formula_list`](https://jluchman.github.io/domir/reference/formula_list.html) object, and a function that accepts the input/names and returns a single, numeric value.
Note the use of a wrapper function, `lm_wrapper`, that accepts a `formula` and returns the $R^2$. These 'analysis pipeline' wrapper functions are necessary for the effective use of `domir` and the ability to use them to adapt predictive models to the computational engine used by `domir` makes this package able to apply to almost any model.
`domir` by default reports on complete dominance proportions, conditional dominance values, and general dominance values.
Complete dominance proportions are the proportion of subsets of inputs/names where the name in the row obtains a bigger value than the name in the column.
Conditional dominance values are the average value associated with the name when included sequentially at each possible position in the sequence of name slots.
General dominance values are the average value associated with the name across all possible ways of including the name in the sequence of all names. These values are also equivalent to the [Shapley Value](https://en.wikipedia.org/wiki/Shapley_value) for each name.
# Comparison with Existing Relative Importance Packages
Several other relative importance packages can produce results identical to `domir` under specific circumstances. I will focus on discussing two of the most relevant comparison packages below.
The `calc.relimpo` function in the **relaimpo** package with `type = "lmg"` produces the general dominance values for `lm` as in the example below:
```{r init_relaimpo}
relaimpo::calc.relimp(mpg ~ am + vs + cyl, data = mtcars, type = "lmg")
```
**relaimpo** is for importance analysis with linear regression with variance explained $R^2$ as a fit statistic and is optimized to analyze that model-fit statistic pairing across multiple ways of submitting data (i.e., correlation matrices, fitted `lm` object, a `data.frame`).
The `dominanceAnalysis` function in **dominanceAnalysis** produces many of the same statistics as `domir` as in the example below:
```{r init_da}
dominanceanalysis::dominanceAnalysis(lm(mpg ~ am + vs + cyl, data = mtcars))
```
**dominanceAnalysis** is for the relative importance of specific model-fit statistic pairs as it is implemented using S3 methods focused on model types to implement similar to how `parameters::dominance_analysis` works but using a custom implementation not dependent on the **insight** package to parse model components and implement the methodology.
# Further Examples
Further examples of `domir`s functionality will be populated on the [**domir** wiki](https://github.com/jluchman/domir/wiki).