effects ="integrateoutRE")bm$ContrastSummary|>data.frame()#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label
-#> 1 0.09898374 0.09694253 0.04870074 0.1611877 NA NA 0.95 ETI <NA> <NA> AME x
+#> 1 0.09864399 0.09651684 0.04835076 0.1610664 NA NA 0.95 ETI <NA> <NA> AME x
The results are close to those that we obtained with comparisons(), but the confidence interval differs slightly because of the difference between bootstrapping and the delta method.
19.1 tidymodels

marginaleffects also supports the tidymodels machine learning framework. When the underlying engine used by tidymodels to train the model is itself supported as a standalone package by marginaleffects, we can obtain both estimates and their standard errors:
library(tidymodels)

penguins <- modeldata::penguins |>
  na.omit() |>
  select(sex, island, species, bill_length_mm)

mod <- linear_reg(mode = "regression") |>
  set_engine("lm") |>
  fit(bill_length_mm ~ ., data = penguins)

avg_comparisons(mod, type = "numeric", newdata = penguins)
    Term              Contrast Estimate Std. Error     z Pr(>|z|)     S  2.5 % 97.5 %
  island        Dream - Biscoe   -0.489      0.470 -1.04    0.299   1.7 -1.410  0.433
  island    Torgersen - Biscoe    0.103      0.488  0.21    0.833   0.3 -0.853  1.059
     sex         male - female    3.697      0.255 14.51   <0.001 156.0  3.198  4.197
 species    Chinstrap - Adelie   10.347      0.422 24.54   <0.001 439.4  9.521 11.174
 species       Gentoo - Adelie    8.546      0.410 20.83   <0.001 317.8  7.742  9.350

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: numeric
avg_predictions(mod, type = "numeric", newdata = penguins, by = "island")
When the underlying engine that tidymodels uses to fit the model is not supported by marginaleffects as a standalone model, we can still obtain correct estimates, but no uncertainty estimates. Here is a random forest model:
mod <- rand_forest(mode = "regression") |>
  set_engine("ranger") |>
  fit(bill_length_mm ~ ., data = penguins)

avg_comparisons(mod, newdata = penguins, type = "numeric")
           Term              Contrast Estimate
 bill_length_mm                    +1    0.000
         island        Dream - Biscoe    0.244
         island    Torgersen - Biscoe   -2.059
            sex         male - female    2.711
        species    Chinstrap - Adelie    5.915
        species       Gentoo - Adelie    5.975

Columns: term, contrast, estimate
Type: numeric
19.1.1 Workflows
tidymodels “workflows” are a convenient way to train a model while applying a series of pre-processing steps to the data. marginaleffects supports workflows out of the box. First, let’s consider a simple regression task:
penguins <- modeldata::penguins |>
  na.omit() |>
  select(sex, island, species, bill_length_mm)

mod <- penguins |>
  recipe(bill_length_mm ~ island + species + sex, data = _) |>
  step_dummy(all_nominal_predictors()) |>
  workflow(spec = linear_reg(mode = "regression", engine = "glm")) |>
  fit(penguins)

avg_comparisons(mod, newdata = penguins, type = "numeric")
           Term              Contrast Estimate Std. Error     z Pr(>|z|)     S  2.5 % 97.5 %
 bill_length_mm                    +1    0.000         NA    NA       NA    NA     NA     NA
         island        Dream - Biscoe   -0.489      0.470 -1.04    0.299   1.7 -1.410  0.433
         island    Torgersen - Biscoe    0.103      0.488  0.21    0.833   0.3 -0.853  1.059
            sex         male - female    3.697      0.255 14.51   <0.001 156.0  3.198  4.197
        species    Chinstrap - Adelie   10.347      0.422 24.54   <0.001 439.4  9.521 11.174
        species       Gentoo - Adelie    8.546      0.410 20.83   <0.001 317.8  7.742  9.350

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: numeric
Now, we run a classification task instead, and plot the predicted probabilities:
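Here is a sketch of what such a workflow might look like; the logistic_reg() specification and the "prob" prediction type are assumptions based on standard tidymodels usage, not the original chunk:

# Sketch: a classification workflow predicting penguin sex, followed by a
# plot of the predicted class probabilities across bill lengths.
mod <- penguins |>
  recipe(sex ~ island + species + bill_length_mm, data = _) |>
  step_dummy(all_nominal_predictors()) |>
  workflow(spec = logistic_reg(mode = "classification", engine = "glm")) |>
  fit(penguins)

plot_predictions(mod, condition = "bill_length_mm", type = "prob", newdata = penguins)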
19.2 mlr3

mlr3 is a machine learning framework for R. It lets users train a wide range of models, including linear models, random forests, gradient boosting machines, and neural networks.
In this example, we use the bikes dataset supplied by the fmeffects package to train a random forest model predicting the number of bikes rented per hour. We then use marginaleffects to interpret the results of the model.
data("bikes", package ="fmeffects")task<-as_task_regr(x =bikes, id ="bikes", target ="count")forest<-lrn("regr.ranger")$train(task)
As described in other vignettes, we can use the avg_comparisons() function to compute the average change in predicted outcome that is associated with a change in each feature:
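A sketch of such a call:

# Sketch: with no `variables` argument, avg_comparisons() covers every
# feature; mlr3 models require an explicit `newdata` argument.
avg_comparisons(forest, newdata = bikes)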
As the code above makes clear, avg_comparisons() computes the effect of a "centered" change on the outcome. If we want to compute a "Forward Marginal Effect" instead, we can call:
fmeffects::fme(
  model = forest,
  data = bikes,
  target = "count",
  feature = "temp",
  step.size = 1)$ame

[1] 2.245841
With marginaleffects::avg_comparisons(), we can also compute the average effect of a simultaneous change in multiple predictors, using the variables and cross arguments. In this example, we see what happens (on average) to the predicted outcome when the temp, season, and weather predictors all change together:
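Judging from the output below, the call was presumably along these lines (a sketch):

# Sketch: cross = TRUE estimates the effect of a simultaneous change in
# all listed variables, rather than one variable at a time.
avg_comparisons(
  forest,
  variables = c("temp", "season", "weather"),
  cross = TRUE,
  newdata = bikes)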
 Estimate     C: season C: temp    C: weather
   -38.49 spring - fall      +1 misty - clear
   -81.69 spring - fall      +1  rain - clear
   -14.95 summer - fall      +1 misty - clear
   -64.69 summer - fall      +1  rain - clear
    -4.33 winter - fall      +1 misty - clear
   -58.45 winter - fall      +1  rain - clear
Columns: term, contrast_season, contrast_temp, contrast_weather, estimate
Type: response
19.3 Plots
We can plot the results using the standard marginaleffects helpers. For example, to plot predictions, we can do:
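A sketch of such a call (the original chunk is not reproduced here):

plot_predictions(forest, condition = "temp", newdata = bikes)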
As documented in ?plot_predictions, using condition="temp" is equivalent to creating an equally-spaced grid of temp values, and holding all other predictors at their means or modes. In other words, it is equivalent to:
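A hand-built equivalent might look like this sketch, where the grid size of 100 is an arbitrary assumption:

# Equally-spaced grid of temp values; datagrid() holds all the other
# predictors at their means or modes by default.
d <- datagrid(
  temp = seq(min(bikes$temp), max(bikes$temp), length.out = 100),
  newdata = bikes)
plot_predictions(forest, by = "temp", newdata = d)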
Alternatively, we could plot "marginal" predictions, where we replicate the full dataset once for every value of temp, and then average the predicted values over each value of the x-axis:
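One way to build such a counterfactual grid is with datagridcf(); the d below is a sketch of the grid used in the next chunk:

# Replicate every row of bikes once per unique value of temp.
d <- datagridcf(temp = unique(bikes$temp), newdata = bikes)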
plot_predictions(forest, by = "temp", newdata = d) +
  geom_point(data = bikes, aes(x = temp, y = count), alpha = 0.1) +
  geom_smooth(data = bikes, aes(x = temp, y = count), se = FALSE, color = "orange") +
  labs(x = "Temperature (Celsius)",
       y = "Predicted number of bikes rented per hour")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The (simulated) data that we will use is stored in an R data frame called survey. We can use the nrow() function to confirm the sample size, and the datasummary_df() function from the modelsummary package to display the first few rows of data:
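A sketch of those calls, assuming the survey data frame is already loaded:

library(modelsummary)

# Confirm the sample size, then display the first few rows.
nrow(survey)
datasummary_df(head(survey))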
This table shows the list of 88 supported model types. There are three main alternative software packages to compute such slopes: (1) Stata's margins command, (2) R's margins::margins() function, and (3) R's emmeans::emtrends() function. The test suite hosted on Github compares the numerical equivalence of results produced by marginaleffects::slopes() to those produced by all three alternative software packages:
✓: a green check means that the results of at least one model are equal to a reasonable tolerance.
✖: a red cross means that the results are not identical; extra caution is warranted.
To build the full matrix, we would simply iterate through the coefficients, incrementing them one after the other. Finally, we get standard errors via:
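Concretely, with the Jacobian assembled that way and the coefficient covariance matrix from vcov(), a sketch of the delta-method computation is:

# Sketch: delta-method standard errors. `J` is the Jacobian of the estimates
# with respect to the coefficients, built by the iteration described above.
se <- sqrt(diag(J %*% vcov(mod) %*% t(J)))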
## On average CIs around unit-level estimates are:
cmp1 <- comparisons(mod, variables = "hp")
mean(cmp1$conf.high - cmp1$conf.low)
#> [1] 0.05572725
## The CI of the average estimate is:
cmp2 <- comparisons(mod, variables = "hp", comparison = "differenceavg")
cmp2$conf.high - cmp2$conf.low
#> [1] 0.04920523
The terminology to describe these estimands is not standardized, and varies tremendously across disciplines.
Modeling packages in R and Python produce inconsistent objects which require users to write custom (and error-prone) code to interpret statistical results.
The "Marginal Effects Zoo" book and the marginaleffects packages for R and Python are designed to help analysts overcome these challenges. The free online book provides a unified framework to describe and compute a wide range of estimands. The marginaleffects package implements this framework and offers a consistent interface to interpret the estimates from over 88 classes of statistical models.
2 What?
The marginaleffects package allows R and Python users to compute and plot three principal quantities of interest: (1) predictions, (2) comparisons (contrasts, risk ratios, odds, lift, etc.), and (3) slopes. In addition, the package includes a convenience function to compute a fourth estimand, "marginal means", which is a special case of averaged predictions. marginaleffects can also average (or "marginalize") unit-level (or "conditional") estimates of all those quantities. Finally, marginaleffects can also conduct hypothesis and equivalence tests on coefficient estimates and on any of the quantities generated by the package.
The advantages of marginaleffects include:
Powerful: It can compute predictions, comparisons (contrasts, risk ratios, etc.), slopes, and conduct hypothesis tests for 88 different classes of models in R and Python.
Simple: All functions share a simple and unified interface.
Warning in citation("marginaleffects"): no date field in DESCRIPTION file of package 'marginaleffects'

To cite package 'marginaleffects' in publications use:

  Arel-Bundock V (2023). _marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests_. R package version 0.15.1.9011, <https://marginaleffects.com/>.
A BibTeX entry for LaTeX users is
  @Manual{,
    title = {marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests},
    author = {Vincent Arel-Bundock},
    year = {2023},
    note = {R package version 0.15.1.9011},
    url = {https://marginaleffects.com/},
  }
diff --git a/docs/search.json b/docs/search.json
index ca038a880..a073229b4 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -4,7 +4,7 @@
"href": "index.html",
"title": "The Marginal Effects Zoo (0.15.1)",
"section": "",
- "text": "🚨🚨🚨 This November, the marginaleffects author (Vincent) will be giving an online seminar called: “Interpreting and Communicating Statistical Results with R.” If you want to become a marginaleffects expert and support the development of the package, sign up at Code Horizons! 🚨🚨🚨\n\n1 Why?\nInterpreting the parameters estimated by complex statistical models is often challenging. Many applied researchers are keen to report simple quantities that carry clear scientific meaning but, in doing so, they face three primary obstacles:\n\nIntuitive estimands—and their standard errors—are often tedious to compute.\nThe terminology to describe these estimands is not standardized, and varies tremendously across disciplines.\nModeling packages in R and Python produce inconsistent objects which require users to write custom (and error-prone) code to interpret statistical results.\n\nThe “Marginal Effects Zoo” book and the marginaleffects packages for R and Python are designed to help analysts overcome these challenges. The free online book provides a unified framework to describe and compute a wide range of estimands. The marginaleffects package implements this framework and offers a consistent interface to interpret the estimates from over 87 classes of statistical models.\n\n2 What?\nThe marginaleffects package allows R and Python users to compute and plot three principal quantities of interest: (1) predictions, (2) comparisons (contrasts, risk ratios, odds, lift, etc.), and (3) slopes. In addition, the package includes a convenience function to compute a fourth estimand, “marginal means”, which is a special case of averaged predictions. marginaleffects can also average (or “marginalize”) unit-level (or “conditional”) estimates of all those quantities. Finally, marginaleffects can also conduct hypothesis and eequivalence tests on coefficient estimates and on any of the quantities generated by the package.\nPredictions:\n\nThe outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels. a.k.a. Fitted values, adjusted predictions. predictions(), avg_predictions(), plot_predictions().\n\nComparisons:\n\nCompare the predictions made by a model for different regressor values (e.g., college graduates vs. others): contrasts, differences, risk ratios, odds, lift, etc. comparisons(), avg_comparisons(), plot_comparisons().\n\nSlopes:\n\nPartial derivative of the regression equation with respect to a regressor of interest. a.k.a. Marginal effects, trends. slopes(), avg_slopes(), plot_slopes().\n\nMarginal Means:\n\nPredictions of a model, averaged across a “reference grid” of categorical predictors. marginalmeans().\n\nHypothesis and Equivalence Tests:\n\nHypothesis and equivalence tests can be conducted on linear or non-linear functions of model coefficients, or on any of the quantities computed by the marginaleffects packages (predictions, slopes, comparisons, marginal means, etc.). 
Uncertainy estimates can be obtained via the delta method (with or without robust standard errors), bootstrap, or simulation.\n\n\n\n\n\nGoal\nFunction\n\n\n\nPredictions\npredictions()\n\n\n\navg_predictions()\n\n\n\nplot_predictions()\n\n\nComparisons\ncomparisons()\n\n\n\navg_comparisons()\n\n\n\nplot_comparisons()\n\n\nSlopes\nslopes()\n\n\n\navg_slopes()\n\n\n\nplot_slopes()\n\n\nMarginal Means\nmarginal_means()\n\n\nGrids\ndatagrid()\n\n\n\ndatagridcf()\n\n\nHypothesis & Equivalence\nhypotheses()\n\n\nBayes, Bootstrap, Simulation\nposterior_draws()\n\n\n\ninferences()\n\n\n\n\n\n\n3 Benefits\nThe advantages of marginaleffects include:\n\n\nPowerful: It can compute predictions, comparisons (contrasts, risk ratios, etc.), slopes, and conduct hypothesis tests for 87 different classes of models in R and Python.\n\nSimple: All functions share a simple and unified interface.\n\nDocumented: Each function is thoroughly documented with abundant examples. The website includes 20,000+ words of vignettes and case studies.\n\nEfficient: Some operations are orders of magnitude faster than with the margins package, and the memory footprint is much smaller.\n\nThin: Few dependencies.\n\nStandards-compliant: marginaleffects follows “tidy” principles and returns objects that work with standard functions like summary(), head(), tidy(), and glance(). These objects are easy to program with and feed to other packages like modelsummary.\n\n\nValid: When possible, numerical results are checked against alternative software like Stata or other R packages. Unfortunately, it is not possible to test every model type, so users are still strongly encouraged to cross-check their results.\n\nExtensible: Adding support for new models is very easy, often requiring less than 10 lines of new code. Please submit feature requests on Github.\n\n\nActive development: Bugs are fixed promptly.\n\n4 License and Citation\nThe marginaleffects package is licensed under the GNU General Public License v3.0. The content of this website/book is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).\n\nTo cite package 'marginaleffects' in publications use:\n\n Arel-Bundock V (2023). _marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests_. R package version 0.15.1.9002, <https://marginaleffects.com/>.\n\nA BibTeX entry for LaTeX users is\n\n @Manual{,\n title = {marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis\nTests},\n author = {Vincent Arel-Bundock},\n year = {2023},\n note = {R package version 0.15.1.9002},\n url = {https://marginaleffects.com/},\n }"
+ "text": "🚨🚨🚨 This November, the marginaleffects author (Vincent) will be giving an online seminar called: “Interpreting and Communicating Statistical Results with R.” If you want to become a marginaleffects expert and support the development of the package, sign up at Code Horizons! 🚨🚨🚨\n\n1 Why?\nInterpreting the parameters estimated by complex statistical models is often challenging. Many applied researchers are keen to report simple quantities that carry clear scientific meaning but, in doing so, they face three primary obstacles:\n\nIntuitive estimands—and their standard errors—are often tedious to compute.\nThe terminology to describe these estimands is not standardized, and varies tremendously across disciplines.\nModeling packages in R and Python produce inconsistent objects which require users to write custom (and error-prone) code to interpret statistical results.\n\nThe “Marginal Effects Zoo” book and the marginaleffects packages for R and Python are designed to help analysts overcome these challenges. The free online book provides a unified framework to describe and compute a wide range of estimands. The marginaleffects package implements this framework and offers a consistent interface to interpret the estimates from over 88 classes of statistical models.\n\n2 What?\nThe marginaleffects package allows R and Python users to compute and plot three principal quantities of interest: (1) predictions, (2) comparisons (contrasts, risk ratios, odds, lift, etc.), and (3) slopes. In addition, the package includes a convenience function to compute a fourth estimand, “marginal means”, which is a special case of averaged predictions. marginaleffects can also average (or “marginalize”) unit-level (or “conditional”) estimates of all those quantities. Finally, marginaleffects can also conduct hypothesis and eequivalence tests on coefficient estimates and on any of the quantities generated by the package.\nPredictions:\n\nThe outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels. a.k.a. Fitted values, adjusted predictions. predictions(), avg_predictions(), plot_predictions().\n\nComparisons:\n\nCompare the predictions made by a model for different regressor values (e.g., college graduates vs. others): contrasts, differences, risk ratios, odds, lift, etc. comparisons(), avg_comparisons(), plot_comparisons().\n\nSlopes:\n\nPartial derivative of the regression equation with respect to a regressor of interest. a.k.a. Marginal effects, trends. slopes(), avg_slopes(), plot_slopes().\n\nMarginal Means:\n\nPredictions of a model, averaged across a “reference grid” of categorical predictors. marginalmeans().\n\nHypothesis and Equivalence Tests:\n\nHypothesis and equivalence tests can be conducted on linear or non-linear functions of model coefficients, or on any of the quantities computed by the marginaleffects packages (predictions, slopes, comparisons, marginal means, etc.). 
Uncertainy estimates can be obtained via the delta method (with or without robust standard errors), bootstrap, or simulation.\n\n\n\n\n\nGoal\nFunction\n\n\n\nPredictions\npredictions()\n\n\n\navg_predictions()\n\n\n\nplot_predictions()\n\n\nComparisons\ncomparisons()\n\n\n\navg_comparisons()\n\n\n\nplot_comparisons()\n\n\nSlopes\nslopes()\n\n\n\navg_slopes()\n\n\n\nplot_slopes()\n\n\nMarginal Means\nmarginal_means()\n\n\nGrids\ndatagrid()\n\n\n\ndatagridcf()\n\n\nHypothesis & Equivalence\nhypotheses()\n\n\nBayes, Bootstrap, Simulation\nposterior_draws()\n\n\n\ninferences()\n\n\n\n\n\n\n3 Benefits\nThe advantages of marginaleffects include:\n\n\nPowerful: It can compute predictions, comparisons (contrasts, risk ratios, etc.), slopes, and conduct hypothesis tests for 88 different classes of models in R and Python.\n\nSimple: All functions share a simple and unified interface.\n\nDocumented: Each function is thoroughly documented with abundant examples. The website includes 20,000+ words of vignettes and case studies.\n\nEfficient: Some operations are orders of magnitude faster than with the margins package, and the memory footprint is much smaller.\n\nThin: Few dependencies.\n\nStandards-compliant: marginaleffects follows “tidy” principles and returns objects that work with standard functions like summary(), head(), tidy(), and glance(). These objects are easy to program with and feed to other packages like modelsummary.\n\n\nValid: When possible, numerical results are checked against alternative software like Stata or other R packages. Unfortunately, it is not possible to test every model type, so users are still strongly encouraged to cross-check their results.\n\nExtensible: Adding support for new models is very easy, often requiring less than 10 lines of new code. Please submit feature requests on Github.\n\n\nActive development: Bugs are fixed promptly.\n\n4 License and Citation\nThe marginaleffects package is licensed under the GNU General Public License v3.0. The content of this website/book is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).\n\nWarning in citation(\"marginaleffects\"): no date field in DESCRIPTION file of package 'marginaleffects'\n\nTo cite package 'marginaleffects' in publications use:\n\n Arel-Bundock V (2023). _marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests_. R package version 0.15.1.9011, <https://marginaleffects.com/>.\n\nA BibTeX entry for LaTeX users is\n\n @Manual{,\n title = {marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis\nTests},\n author = {Vincent Arel-Bundock},\n year = {2023},\n note = {R package version 0.15.1.9011},\n url = {https://marginaleffects.com/},\n }"
},
{
"objectID": "articles/marginaleffects.html#installation",
@@ -18,28 +18,28 @@
"href": "articles/marginaleffects.html#estimands-predictions-comparisons-and-slopes",
"title": "\n1 Get Started\n",
"section": "\n1.2 Estimands: Predictions, Comparisons, and Slopes",
- "text": "1.2 Estimands: Predictions, Comparisons, and Slopes\nThe marginaleffects package allows R users to compute and plot three principal quantities of interest: (1) predictions, (2) comparisons, and (3) slopes. In addition, the package includes a convenience function to compute a fourth estimand, “marginal means”, which is a special case of averaged predictions. marginaleffects can also average (or “marginalize”) unit-level (or “conditional”) estimates of all those quantities, and conduct hypothesis tests on them.\nPredictions:\n\nThe outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels. a.k.a. Fitted values, adjusted predictions. predictions(), avg_predictions(), plot_predictions().\n\nComparisons:\n\nCompare the predictions made by a model for different regressor values (e.g., college graduates vs. others): contrasts, differences, risk ratios, odds, etc. comparisons(), avg_comparisons(), plot_comparisons().\n\nSlopes:\n\nPartial derivative of the regression equation with respect to a regressor of interest. a.k.a. Marginal effects, trends. slopes(), avg_slopes(), plot_slopes().\n\nMarginal Means:\n\nPredictions of a model, averaged across a “reference grid” of categorical predictors. marginalmeans().\n\nHypothesis and Equivalence Tests:\n\nHypothesis and equivalence tests can be conducted on linear or non-linear functions of model coefficients, or on any of the quantities computed by the marginaleffects packages (predictions, slopes, comparisons, marginal means, etc.). Uncertainy estimates can be obtained via the delta method (with or without robust standard errors), bootstrap, or simulation.\n\nPredictions, comparisons, and slopes are fundamentally unit-level (or “conditional”) quantities. Except in the simplest linear case, estimates will typically vary based on the values of all the regressors in a model. Each of the observations in a dataset is thus associated with its own prediction, comparison, and slope estimates. Below, we will see that it can be useful to marginalize (or “average over”) unit-level estimates to report an “average prediction”, “average comparison”, or “average slope”.\nOne ambiguous aspect of the definitions above is that the word “marginal” comes up in two different and opposite ways:\n\nIn “marginal effects,” we refer to the effect of a tiny (marginal) change in the regressor on the outcome. This is a slope, or derivative.\nIn “marginal means,” we refer to the process of marginalizing across rows of a prediction grid. This is an average, or integral.\n\nOn this website and in this package, we reserve the expression “marginal effect” to mean a “slope” or “partial derivative”.\nThe marginaleffects package includes functions to estimate, average, plot, and summarize all of the estimands described above. The objects produced by marginaleffects are “tidy”: they produce simple data frames in “long” format. They are also “standards-compliant” and work seamlessly with standard functions like summary(), head(), tidy(), and glance(), as well with external packages like modelsummary or ggplot2.\nWe now apply marginaleffects functions to compute each of the estimands described above. 
First, we fit a linear regression model with multiplicative interactions:\n\n\nR\nPython\n\n\n\n\nlibrary(marginaleffects)\n\nmod <- lm(mpg ~ hp * wt * am, data = mtcars)\n\n\n\n\nimport polars as pl\nimport numpy as np\nimport statsmodels.formula.api as smf\nfrom marginaleffects import *\n\nmtcars = pl.read_csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv\")\n\nmod = smf.ols(\"mpg ~ hp * wt * am\", data = mtcars).fit()\n\n\n\n\nThen, we call the predictions() function. As noted above, predictions are unit-level estimates, so there is one specific prediction per observation. By default, the predictions() function makes one prediction per observation in the dataset that was used to fit the original model. Since mtcars has 32 rows, the predictions() outcome also has 32 rows:\n\n\nR\nPython\n\n\n\n\npre <- predictions(mod)\n\nnrow(mtcars)\n\n[1] 32\n\nnrow(pre)\n\n[1] 32\n\npre\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n 22.5 0.884 25.44 <0.001 471.7 20.8 24.2\n 20.8 1.194 17.42 <0.001 223.3 18.5 23.1\n 25.3 0.709 35.66 <0.001 922.7 23.9 26.7\n 20.3 0.704 28.75 <0.001 601.5 18.9 21.6\n 17.0 0.712 23.88 <0.001 416.2 15.6 18.4\n--- 22 rows omitted. See ?avg_predictions and ?print.marginaleffects --- \n 29.6 1.874 15.80 <0.001 184.3 25.9 33.3\n 15.9 1.311 12.13 <0.001 110.0 13.3 18.5\n 19.4 1.145 16.95 <0.001 211.6 17.2 21.7\n 14.8 2.017 7.33 <0.001 42.0 10.8 18.7\n 21.5 1.072 20.02 <0.001 293.8 19.4 23.6\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\n\n\n\n\npre = predictions(mod)\n\nmtcars.shape\n\n(32, 12)\n\npre.shape\n\n(32, 20)\n\nprint(pre)\n\nshape: (32, 7)\n┌──────────┬───────────┬──────┬──────────┬──────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪══════════╪══════╪══════╪═══════╡\n│ 22.5 ┆ 0.884 ┆ 25.4 ┆ 0 ┆ inf ┆ 20.7 ┆ 24.3 │\n│ 20.8 ┆ 1.19 ┆ 17.4 ┆ 4e-15 ┆ 47.8 ┆ 18.3 ┆ 23.3 │\n│ 25.3 ┆ 0.709 ┆ 35.7 ┆ 0 ┆ inf ┆ 23.8 ┆ 26.7 │\n│ 20.3 ┆ 0.704 ┆ 28.8 ┆ 0 ┆ inf ┆ 18.8 ┆ 21.7 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ 15.9 ┆ 1.31 ┆ 12.1 ┆ 1e-11 ┆ 36.5 ┆ 13.2 ┆ 18.6 │\n│ 19.4 ┆ 1.15 ┆ 16.9 ┆ 7.33e-15 ┆ 47 ┆ 17 ┆ 21.8 │\n│ 14.8 ┆ 2.02 ┆ 7.33 ┆ 1.43e-07 ┆ 22.7 ┆ 10.6 ┆ 19 │\n│ 21.5 ┆ 1.07 ┆ 20 ┆ 2.22e-16 ┆ 52 ┆ 19.2 ┆ 23.7 │\n└──────────┴───────────┴──────┴──────────┴──────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nNow, we use the comparisons() function to compute the difference in predicted outcome when each of the predictors is incremented by 1 unit (one predictor at a time, holding all others constant). Once again, comparisons are unit-level quantities. And since there are 3 predictors in the model and our data has 32 rows, we obtain 96 comparisons:\n\n\nR\nPython\n\n\n\n\ncmp <- comparisons(mod)\n\nnrow(cmp)\n\n[1] 96\n\ncmp\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 0.325 1.68 0.193 0.8467 0.2 -2.97 3.622\n am 1 - 0 -0.544 1.57 -0.347 0.7287 0.5 -3.62 2.530\n am 1 - 0 1.201 2.35 0.511 0.6090 0.7 -3.40 5.802\n am 1 - 0 -1.703 1.87 -0.912 0.3618 1.5 -5.36 1.957\n am 1 - 0 -0.615 1.68 -0.366 0.7146 0.5 -3.91 2.680\n--- 86 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n wt +1 -6.518 1.88 -3.462 <0.001 10.9 -10.21 -2.828\n wt +1 -1.653 3.74 -0.442 0.6588 0.6 -8.99 5.683\n wt +1 -4.520 2.47 -1.830 0.0672 3.9 -9.36 0.321\n wt +1 0.635 4.89 0.130 0.8966 0.2 -8.95 10.216\n wt +1 -6.647 1.86 -3.572 <0.001 11.5 -10.29 -2.999\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod)\n\ncmp.shape\n\n(96, 25)\n\nprint(cmp)\n\nshape: (96, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬───────┬───────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═══════╪═══════╡\n│ am ┆ 1 - 0 ┆ 0.325 ┆ 1.68 ┆ … ┆ 0.848 ┆ 0.237 ┆ -3.15 ┆ 3.8 │\n│ am ┆ 1 - 0 ┆ -0.544 ┆ 1.57 ┆ … ┆ 0.732 ┆ 0.451 ┆ -3.78 ┆ 2.69 │\n│ am ┆ 1 - 0 ┆ 1.2 ┆ 2.35 ┆ … ┆ 0.614 ┆ 0.704 ┆ -3.64 ┆ 6.05 │\n│ am ┆ 1 - 0 ┆ -1.7 ┆ 1.87 ┆ … ┆ 0.371 ┆ 1.43 ┆ -5.56 ┆ 2.15 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ wt ┆ +1 ┆ -1.65 ┆ 3.74 ┆ … ┆ 0.663 ┆ 0.593 ┆ -9.38 ┆ 6.07 │\n│ wt ┆ +1 ┆ -4.52 ┆ 2.47 ┆ … ┆ 0.0797 ┆ 3.65 ┆ -9.62 ┆ 0.577 │\n│ wt ┆ +1 ┆ 0.635 ┆ 4.89 ┆ … ┆ 0.898 ┆ 0.156 ┆ -9.45 ┆ 10.7 │\n│ wt ┆ +1 ┆ -6.65 ┆ 1.86 ┆ … ┆ 0.00154 ┆ 9.34 ┆ -10.5 ┆ -2.81 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴───────┴───────┴───────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe comparisons() function allows customized queries. For example, what happens to the predicted outcome when the hp variable increases from 100 to 120?\n\n\nR\nPython\n\n\n\n\ncomparisons(mod, variables = list(hp = c(120, 100)))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 120 - 100 -0.738 0.370 -1.995 0.04607 4.4 -1.463 -0.0129\n hp 120 - 100 -0.574 0.313 -1.836 0.06640 3.9 -1.186 0.0388\n hp 120 - 100 -0.931 0.452 -2.062 0.03922 4.7 -1.817 -0.0460\n hp 120 - 100 -0.845 0.266 -3.182 0.00146 9.4 -1.366 -0.3248\n hp 120 - 100 -0.780 0.268 -2.909 0.00362 8.1 -1.306 -0.2547\n--- 22 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n hp 120 - 100 -1.451 0.705 -2.058 0.03958 4.7 -2.834 -0.0692\n hp 120 - 100 -0.384 0.270 -1.422 0.15498 2.7 -0.912 0.1451\n hp 120 - 100 -0.641 0.334 -1.918 0.05513 4.2 -1.297 0.0141\n hp 120 - 100 -0.126 0.272 -0.463 0.64360 0.6 -0.659 0.4075\n hp 120 - 100 -0.635 0.332 -1.911 0.05598 4.2 -1.286 0.0162\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod, variables = {\"hp\": [120, 100]})\nprint(cmp)\n\nshape: (32, 9)\n┌──────┬───────────┬──────────┬───────────┬───┬─────────┬───────┬───────────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═══════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═══════════╪═══════╡\n│ hp ┆ 100 - 120 ┆ 0.738 ┆ 0.37 ┆ … ┆ 0.0576 ┆ 4.12 ┆ -0.0256 ┆ 1.5 │\n│ hp ┆ 100 - 120 ┆ 0.574 ┆ 0.313 ┆ … ┆ 0.0788 ┆ 3.67 ┆ -0.0713 ┆ 1.22 │\n│ hp ┆ 100 - 120 ┆ 0.931 ┆ 0.452 ┆ … ┆ 0.0502 ┆ 4.32 ┆ -0.000919 ┆ 1.86 │\n│ hp ┆ 100 - 120 ┆ 0.845 ┆ 0.266 ┆ … ┆ 0.00401 ┆ 7.96 ┆ 0.297 ┆ 1.39 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ hp ┆ 100 - 120 ┆ 0.384 ┆ 0.27 ┆ … ┆ 0.168 ┆ 2.57 ┆ -0.173 ┆ 0.941 │\n│ hp ┆ 100 - 120 ┆ 0.641 ┆ 0.334 ┆ … ┆ 0.0671 ┆ 3.9 ┆ -0.0488 ┆ 1.33 │\n│ hp ┆ 100 - 120 ┆ 0.126 ┆ 0.272 ┆ … ┆ 0.648 ┆ 0.626 ┆ -0.436 ┆ 0.688 │\n│ hp ┆ 100 - 120 ┆ 0.635 ┆ 0.332 ┆ … ┆ 0.068 ┆ 3.88 ┆ -0.0507 ┆ 1.32 │\n└──────┴───────────┴──────────┴───────────┴───┴─────────┴───────┴───────────┴───────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nWhat happens to the predicted outcome when the wt variable increases by 1 standard deviation about its mean?\n\n\nR\nPython\n\n\n\n\ncomparisons(mod, variables = list(hp = \"sd\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp (x + sd/2) - (x - sd/2) -2.530 1.269 -1.995 0.04607 4.4 -5.02 -0.0441\n hp (x + sd/2) - (x - sd/2) -1.967 1.072 -1.836 0.06640 3.9 -4.07 0.1332\n hp (x + sd/2) - (x - sd/2) -3.193 1.549 -2.062 0.03922 4.7 -6.23 -0.1578\n hp (x + sd/2) - (x - sd/2) -2.898 0.911 -3.182 0.00146 9.4 -4.68 -1.1133\n hp (x + sd/2) - (x - sd/2) -2.675 0.919 -2.909 0.00362 8.1 -4.48 -0.8731\n--- 22 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n hp (x + sd/2) - (x - sd/2) -4.976 2.418 -2.058 0.03958 4.7 -9.71 -0.2373\n hp (x + sd/2) - (x - sd/2) -1.315 0.925 -1.422 0.15498 2.7 -3.13 0.4974\n hp (x + sd/2) - (x - sd/2) -2.199 1.147 -1.918 0.05513 4.2 -4.45 0.0483\n hp (x + sd/2) - (x - sd/2) -0.432 0.933 -0.463 0.64360 0.6 -2.26 1.3970\n hp (x + sd/2) - (x - sd/2) -2.177 1.139 -1.911 0.05598 4.2 -4.41 0.0556\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod, variables = {\"hp\": \"sd\"})\nprint(cmp)\n\nshape: (32, 9)\n┌──────┬────────────────────┬──────────┬───────────┬───┬─────────┬───────┬───────┬─────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪════════════════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═══════╪═════════╡\n│ hp ┆ +68.56286848932059 ┆ -2.53 ┆ 1.27 ┆ … ┆ 0.0576 ┆ 4.12 ┆ -5.15 ┆ 0.0878 │\n│ hp ┆ +68.56286848932059 ┆ -1.97 ┆ 1.07 ┆ … ┆ 0.0788 ┆ 3.67 ┆ -4.18 ┆ 0.245 │\n│ hp ┆ +68.56286848932059 ┆ -3.19 ┆ 1.55 ┆ … ┆ 0.0502 ┆ 4.32 ┆ -6.39 ┆ 0.00315 │\n│ hp ┆ +68.56286848932059 ┆ -2.9 ┆ 0.911 ┆ … ┆ 0.00401 ┆ 7.96 ┆ -4.78 ┆ -1.02 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ hp ┆ +68.56286848932059 ┆ -1.32 ┆ 0.925 ┆ … ┆ 0.168 ┆ 2.57 ┆ -3.22 ┆ 0.594 │\n│ hp ┆ +68.56286848932059 ┆ -2.2 ┆ 1.15 ┆ … ┆ 0.0671 ┆ 3.9 ┆ -4.57 ┆ 0.167 │\n│ hp ┆ +68.56286848932059 ┆ -0.432 ┆ 0.933 ┆ … ┆ 0.648 ┆ 0.626 ┆ -2.36 ┆ 1.49 │\n│ hp ┆ +68.56286848932059 ┆ -2.18 ┆ 1.14 ┆ … ┆ 0.068 ┆ 3.88 ┆ -4.53 ┆ 0.174 │\n└──────┴────────────────────┴──────────┴───────────┴───┴─────────┴───────┴───────┴─────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe comparisons() function also allows users to specify arbitrary functions of predictions, with the comparison argument. For example, what is the average ratio between predicted Miles per Gallon after an increase of 50 units in Horsepower?\n\n\nR\nPython\n\n\n\n\ncomparisons(\n mod,\n variables = list(hp = 50),\n comparison = \"ratioavg\")\n\n\n Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(+50) 0.91 0.0291 31.3 <0.001 711.9 0.853 0.966\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n\n\n\n\ncmp = comparisons(\n mod,\n variables = {\"hp\": 50},\n comparison = \"ratioavg\")\nprint(cmp)\n\nshape: (1, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬─────┬──────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪═════╪══════╪═══════╡\n│ hp ┆ +50 ┆ 0.91 ┆ 0.0291 ┆ … ┆ 0 ┆ inf ┆ 0.85 ┆ 0.97 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴─────┴──────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nSee the Comparisons vignette for detailed explanations and more options.\nThe slopes() function allows us to compute the partial derivative of the outcome equation with respect to each of the predictors. Once again, we obtain a data frame with 96 rows:\n\n\nR\nPython\n\n\n\n\nmfx <- slopes(mod)\n\nnrow(mfx)\n\n[1] 96\n\nmfx\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 0.325 1.68 0.193 0.8467 0.2 -2.97 3.622\n am 1 - 0 -0.544 1.57 -0.347 0.7287 0.5 -3.62 2.530\n am 1 - 0 1.201 2.35 0.511 0.6090 0.7 -3.40 5.802\n am 1 - 0 -1.703 1.87 -0.912 0.3618 1.5 -5.36 1.957\n am 1 - 0 -0.615 1.68 -0.366 0.7146 0.5 -3.91 2.680\n--- 86 rows omitted. See ?avg_slopes and ?print.marginaleffects --- \n wt dY/dX -6.518 1.88 -3.462 <0.001 10.9 -10.21 -2.828\n wt dY/dX -1.653 3.74 -0.442 0.6588 0.6 -8.99 5.682\n wt dY/dX -4.520 2.47 -1.830 0.0672 3.9 -9.36 0.321\n wt dY/dX 0.635 4.89 0.130 0.8966 0.2 -8.95 10.216\n wt dY/dX -6.647 1.86 -3.572 <0.001 11.5 -10.29 -2.999\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\nmfx = slopes(mod)\n\nmfx.shape\n\n(96, 25)\n\nprint(mfx)\n\nshape: (96, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬───────┬─────────┬──────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═════════╪══════════╡\n│ hp ┆ dY/dX ┆ -0.0369 ┆ 0.019 ┆ … ┆ 0.0639 ┆ 3.97 ┆ -0.0761 ┆ 0.00231 │\n│ hp ┆ dY/dX ┆ -0.0287 ┆ 0.018 ┆ … ┆ 0.124 ┆ 3.01 ┆ -0.0659 ┆ 0.00851 │\n│ hp ┆ dY/dX ┆ -0.0466 ┆ 0.0208 ┆ … ┆ 0.0343 ┆ 4.86 ┆ -0.0894 ┆ -0.00374 │\n│ hp ┆ dY/dX ┆ -0.0423 ┆ 0.0134 ┆ … ┆ 0.00432 ┆ 7.86 ┆ -0.07 ┆ -0.0146 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ wt ┆ dY/dX ┆ -1.65 ┆ 3.74 ┆ … ┆ 0.662 ┆ 0.594 ┆ -9.37 ┆ 6.06 │\n│ wt ┆ dY/dX ┆ -4.52 ┆ 2.47 ┆ … ┆ 0.0795 ┆ 3.65 ┆ -9.61 ┆ 0.574 │\n│ wt ┆ dY/dX ┆ 0.635 ┆ 4.89 ┆ … ┆ 0.898 ┆ 0.156 ┆ -9.45 ┆ 10.7 │\n│ wt ┆ dY/dX ┆ -6.65 ┆ 1.86 ┆ … ┆ 0.00155 ┆ 9.33 ┆ -10.5 ┆ -2.8 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴───────┴─────────┴──────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb"
+ "text": "1.2 Estimands: Predictions, Comparisons, and Slopes\nThe marginaleffects package allows R users to compute and plot three principal quantities of interest: (1) predictions, (2) comparisons, and (3) slopes. In addition, the package includes a convenience function to compute a fourth estimand, “marginal means”, which is a special case of averaged predictions. marginaleffects can also average (or “marginalize”) unit-level (or “conditional”) estimates of all those quantities, and conduct hypothesis tests on them.\nPredictions:\n\nThe outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels. a.k.a. Fitted values, adjusted predictions. predictions(), avg_predictions(), plot_predictions().\n\nComparisons:\n\nCompare the predictions made by a model for different regressor values (e.g., college graduates vs. others): contrasts, differences, risk ratios, odds, etc. comparisons(), avg_comparisons(), plot_comparisons().\n\nSlopes:\n\nPartial derivative of the regression equation with respect to a regressor of interest. a.k.a. Marginal effects, trends. slopes(), avg_slopes(), plot_slopes().\n\nMarginal Means:\n\nPredictions of a model, averaged across a “reference grid” of categorical predictors. marginalmeans().\n\nHypothesis and Equivalence Tests:\n\nHypothesis and equivalence tests can be conducted on linear or non-linear functions of model coefficients, or on any of the quantities computed by the marginaleffects packages (predictions, slopes, comparisons, marginal means, etc.). Uncertainy estimates can be obtained via the delta method (with or without robust standard errors), bootstrap, or simulation.\n\nPredictions, comparisons, and slopes are fundamentally unit-level (or “conditional”) quantities. Except in the simplest linear case, estimates will typically vary based on the values of all the regressors in a model. Each of the observations in a dataset is thus associated with its own prediction, comparison, and slope estimates. Below, we will see that it can be useful to marginalize (or “average over”) unit-level estimates to report an “average prediction”, “average comparison”, or “average slope”.\nOne ambiguous aspect of the definitions above is that the word “marginal” comes up in two different and opposite ways:\n\nIn “marginal effects,” we refer to the effect of a tiny (marginal) change in the regressor on the outcome. This is a slope, or derivative.\nIn “marginal means,” we refer to the process of marginalizing across rows of a prediction grid. This is an average, or integral.\n\nOn this website and in this package, we reserve the expression “marginal effect” to mean a “slope” or “partial derivative”.\nThe marginaleffects package includes functions to estimate, average, plot, and summarize all of the estimands described above. The objects produced by marginaleffects are “tidy”: they produce simple data frames in “long” format. They are also “standards-compliant” and work seamlessly with standard functions like summary(), head(), tidy(), and glance(), as well with external packages like modelsummary or ggplot2.\nWe now apply marginaleffects functions to compute each of the estimands described above. 
First, we fit a linear regression model with multiplicative interactions:\n\n\nR\nPython\n\n\n\n\nlibrary(marginaleffects)\n\nmod <- lm(mpg ~ hp * wt * am, data = mtcars)\n\n\n\n\nimport polars as pl\nimport numpy as np\nimport statsmodels.formula.api as smf\nfrom marginaleffects import *\n\nmtcars = pl.read_csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv\")\n\nmod = smf.ols(\"mpg ~ hp * wt * am\", data = mtcars).fit()\n\n\n\n\nThen, we call the predictions() function. As noted above, predictions are unit-level estimates, so there is one specific prediction per observation. By default, the predictions() function makes one prediction per observation in the dataset that was used to fit the original model. Since mtcars has 32 rows, the predictions() outcome also has 32 rows:\n\n\nR\nPython\n\n\n\n\npre <- predictions(mod)\n\nnrow(mtcars)\n\n[1] 32\n\nnrow(pre)\n\n[1] 32\n\npre\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n 22.5 0.884 25.44 <0.001 471.7 20.8 24.2\n 20.8 1.194 17.42 <0.001 223.3 18.5 23.1\n 25.3 0.709 35.66 <0.001 922.7 23.9 26.7\n 20.3 0.704 28.75 <0.001 601.5 18.9 21.6\n 17.0 0.712 23.88 <0.001 416.2 15.6 18.4\n--- 22 rows omitted. See ?avg_predictions and ?print.marginaleffects --- \n 29.6 1.874 15.80 <0.001 184.3 25.9 33.3\n 15.9 1.311 12.13 <0.001 110.0 13.3 18.5\n 19.4 1.145 16.95 <0.001 211.6 17.2 21.7\n 14.8 2.017 7.33 <0.001 42.0 10.8 18.7\n 21.5 1.072 20.02 <0.001 293.8 19.4 23.6\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\n\n\n\n\npre = predictions(mod)\n\nmtcars.shape\n\n(32, 12)\n\npre.shape\n\n(32, 20)\n\nprint(pre)\n\nshape: (32, 7)\n┌──────────┬───────────┬──────┬──────────┬──────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪══════════╪══════╪══════╪═══════╡\n│ 22.5 ┆ 0.884 ┆ 25.4 ┆ 0 ┆ inf ┆ 20.7 ┆ 24.3 │\n│ 20.8 ┆ 1.19 ┆ 17.4 ┆ 4e-15 ┆ 47.8 ┆ 18.3 ┆ 23.3 │\n│ 25.3 ┆ 0.709 ┆ 35.7 ┆ 0 ┆ inf ┆ 23.8 ┆ 26.7 │\n│ 20.3 ┆ 0.704 ┆ 28.8 ┆ 0 ┆ inf ┆ 18.8 ┆ 21.7 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ 15.9 ┆ 1.31 ┆ 12.1 ┆ 1e-11 ┆ 36.5 ┆ 13.2 ┆ 18.6 │\n│ 19.4 ┆ 1.15 ┆ 16.9 ┆ 7.33e-15 ┆ 47 ┆ 17 ┆ 21.8 │\n│ 14.8 ┆ 2.02 ┆ 7.33 ┆ 1.43e-07 ┆ 22.7 ┆ 10.6 ┆ 19 │\n│ 21.5 ┆ 1.07 ┆ 20 ┆ 2.22e-16 ┆ 52 ┆ 19.2 ┆ 23.7 │\n└──────────┴───────────┴──────┴──────────┴──────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nNow, we use the comparisons() function to compute the difference in predicted outcome when each of the predictors is incremented by 1 unit (one predictor at a time, holding all others constant). Once again, comparisons are unit-level quantities. And since there are 3 predictors in the model and our data has 32 rows, we obtain 96 comparisons:\n\n\nR\nPython\n\n\n\n\ncmp <- comparisons(mod)\n\nnrow(cmp)\n\n[1] 96\n\ncmp\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 0.325 1.68 0.193 0.8467 0.2 -2.97 3.622\n am 1 - 0 -0.544 1.57 -0.347 0.7287 0.5 -3.62 2.530\n am 1 - 0 1.201 2.35 0.511 0.6090 0.7 -3.40 5.802\n am 1 - 0 -1.703 1.87 -0.912 0.3618 1.5 -5.36 1.957\n am 1 - 0 -0.615 1.68 -0.366 0.7146 0.5 -3.91 2.680\n--- 86 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n wt +1 -6.518 1.88 -3.462 <0.001 10.9 -10.21 -2.828\n wt +1 -1.653 3.74 -0.442 0.6588 0.6 -8.99 5.683\n wt +1 -4.520 2.47 -1.830 0.0672 3.9 -9.36 0.321\n wt +1 0.635 4.89 0.130 0.8966 0.2 -8.95 10.216\n wt +1 -6.647 1.86 -3.572 <0.001 11.5 -10.29 -2.999\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod)\n\ncmp.shape\n\n(96, 25)\n\nprint(cmp)\n\nshape: (96, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬───────┬─────────┬─────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═════════╪═════════╡\n│ hp ┆ +1 ┆ -0.0369 ┆ 0.0185 ┆ … ┆ 0.0575 ┆ 4.12 ┆ -0.0751 ┆ 0.00128 │\n│ hp ┆ +1 ┆ -0.0287 ┆ 0.0156 ┆ … ┆ 0.0788 ┆ 3.67 ┆ -0.0609 ┆ 0.00357 │\n│ hp ┆ +1 ┆ -0.0466 ┆ 0.0226 ┆ … ┆ 0.0502 ┆ 4.32 ┆ -0.0932 ┆ 4.6e-05 │\n│ hp ┆ +1 ┆ -0.0423 ┆ 0.0133 ┆ … ┆ 0.00401 ┆ 7.96 ┆ -0.0697 ┆ -0.0149 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ am ┆ 1 - 0 ┆ 2.11 ┆ 2.29 ┆ … ┆ 0.367 ┆ 1.45 ┆ -2.62 ┆ 6.83 │\n│ am ┆ 1 - 0 ┆ 0.895 ┆ 1.64 ┆ … ┆ 0.591 ┆ 0.758 ┆ -2.5 ┆ 4.29 │\n│ am ┆ 1 - 0 ┆ 4.03 ┆ 3.24 ┆ … ┆ 0.226 ┆ 2.15 ┆ -2.66 ┆ 10.7 │\n│ am ┆ 1 - 0 ┆ -0.237 ┆ 1.59 ┆ … ┆ 0.883 ┆ 0.18 ┆ -3.51 ┆ 3.04 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴───────┴─────────┴─────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe comparisons() function allows customized queries. For example, what happens to the predicted outcome when the hp variable increases from 100 to 120?\n\n\nR\nPython\n\n\n\n\ncomparisons(mod, variables = list(hp = c(120, 100)))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 120 - 100 -0.738 0.370 -1.995 0.04607 4.4 -1.463 -0.0129\n hp 120 - 100 -0.574 0.313 -1.836 0.06640 3.9 -1.186 0.0388\n hp 120 - 100 -0.931 0.452 -2.062 0.03922 4.7 -1.817 -0.0460\n hp 120 - 100 -0.845 0.266 -3.182 0.00146 9.4 -1.366 -0.3248\n hp 120 - 100 -0.780 0.268 -2.909 0.00362 8.1 -1.306 -0.2547\n--- 22 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n hp 120 - 100 -1.451 0.705 -2.058 0.03958 4.7 -2.834 -0.0692\n hp 120 - 100 -0.384 0.270 -1.422 0.15498 2.7 -0.912 0.1451\n hp 120 - 100 -0.641 0.334 -1.918 0.05513 4.2 -1.297 0.0141\n hp 120 - 100 -0.126 0.272 -0.463 0.64360 0.6 -0.659 0.4075\n hp 120 - 100 -0.635 0.332 -1.911 0.05598 4.2 -1.286 0.0162\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod, variables = {\"hp\": [120, 100]})\nprint(cmp)\n\nshape: (32, 9)\n┌──────┬───────────┬──────────┬───────────┬───┬─────────┬───────┬───────────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═══════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═══════════╪═══════╡\n│ hp ┆ 100 - 120 ┆ 0.738 ┆ 0.37 ┆ … ┆ 0.0576 ┆ 4.12 ┆ -0.0256 ┆ 1.5 │\n│ hp ┆ 100 - 120 ┆ 0.574 ┆ 0.313 ┆ … ┆ 0.0788 ┆ 3.67 ┆ -0.0713 ┆ 1.22 │\n│ hp ┆ 100 - 120 ┆ 0.931 ┆ 0.452 ┆ … ┆ 0.0502 ┆ 4.32 ┆ -0.000918 ┆ 1.86 │\n│ hp ┆ 100 - 120 ┆ 0.845 ┆ 0.266 ┆ … ┆ 0.00401 ┆ 7.96 ┆ 0.297 ┆ 1.39 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ hp ┆ 100 - 120 ┆ 0.384 ┆ 0.27 ┆ … ┆ 0.168 ┆ 2.57 ┆ -0.173 ┆ 0.941 │\n│ hp ┆ 100 - 120 ┆ 0.641 ┆ 0.334 ┆ … ┆ 0.0671 ┆ 3.9 ┆ -0.0488 ┆ 1.33 │\n│ hp ┆ 100 - 120 ┆ 0.126 ┆ 0.272 ┆ … ┆ 0.648 ┆ 0.626 ┆ -0.436 ┆ 0.688 │\n│ hp ┆ 100 - 120 ┆ 0.635 ┆ 0.332 ┆ … ┆ 0.068 ┆ 3.88 ┆ -0.0507 ┆ 1.32 │\n└──────┴───────────┴──────────┴───────────┴───┴─────────┴───────┴───────────┴───────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nWhat happens to the predicted outcome when the wt variable increases by 1 standard deviation about its mean?\n\n\nR\nPython\n\n\n\n\ncomparisons(mod, variables = list(hp = \"sd\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp (x + sd/2) - (x - sd/2) -2.530 1.269 -1.995 0.04607 4.4 -5.02 -0.0441\n hp (x + sd/2) - (x - sd/2) -1.967 1.072 -1.836 0.06640 3.9 -4.07 0.1332\n hp (x + sd/2) - (x - sd/2) -3.193 1.549 -2.062 0.03922 4.7 -6.23 -0.1578\n hp (x + sd/2) - (x - sd/2) -2.898 0.911 -3.182 0.00146 9.4 -4.68 -1.1133\n hp (x + sd/2) - (x - sd/2) -2.675 0.919 -2.909 0.00362 8.1 -4.48 -0.8731\n--- 22 rows omitted. 
See ?avg_comparisons and ?print.marginaleffects --- \n hp (x + sd/2) - (x - sd/2) -4.976 2.418 -2.058 0.03958 4.7 -9.71 -0.2373\n hp (x + sd/2) - (x - sd/2) -1.315 0.925 -1.422 0.15498 2.7 -3.13 0.4974\n hp (x + sd/2) - (x - sd/2) -2.199 1.147 -1.918 0.05513 4.2 -4.45 0.0483\n hp (x + sd/2) - (x - sd/2) -0.432 0.933 -0.463 0.64360 0.6 -2.26 1.3970\n hp (x + sd/2) - (x - sd/2) -2.177 1.139 -1.911 0.05598 4.2 -4.41 0.0556\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\ncmp = comparisons(mod, variables = {\"hp\": \"sd\"})\nprint(cmp)\n\nshape: (32, 9)\n┌──────┬────────────────────┬──────────┬───────────┬───┬─────────┬───────┬───────┬─────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪════════════════════╪══════════╪═══════════╪═══╪═════════╪═══════╪═══════╪═════════╡\n│ hp ┆ +68.56286848932059 ┆ -2.53 ┆ 1.27 ┆ … ┆ 0.0576 ┆ 4.12 ┆ -5.15 ┆ 0.0878 │\n│ hp ┆ +68.56286848932059 ┆ -1.97 ┆ 1.07 ┆ … ┆ 0.0788 ┆ 3.67 ┆ -4.18 ┆ 0.245 │\n│ hp ┆ +68.56286848932059 ┆ -3.19 ┆ 1.55 ┆ … ┆ 0.0502 ┆ 4.32 ┆ -6.39 ┆ 0.00315 │\n│ hp ┆ +68.56286848932059 ┆ -2.9 ┆ 0.911 ┆ … ┆ 0.00401 ┆ 7.96 ┆ -4.78 ┆ -1.02 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ hp ┆ +68.56286848932059 ┆ -1.32 ┆ 0.925 ┆ … ┆ 0.168 ┆ 2.57 ┆ -3.22 ┆ 0.594 │\n│ hp ┆ +68.56286848932059 ┆ -2.2 ┆ 1.15 ┆ … ┆ 0.0671 ┆ 3.9 ┆ -4.57 ┆ 0.167 │\n│ hp ┆ +68.56286848932059 ┆ -0.432 ┆ 0.933 ┆ … ┆ 0.648 ┆ 0.626 ┆ -2.36 ┆ 1.49 │\n│ hp ┆ +68.56286848932059 ┆ -2.18 ┆ 1.14 ┆ … ┆ 0.068 ┆ 3.88 ┆ -4.53 ┆ 0.174 │\n└──────┴────────────────────┴──────────┴───────────┴───┴─────────┴───────┴───────┴─────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe comparisons() function also allows users to specify arbitrary functions of predictions, with the comparison argument. For example, what is the average ratio between predicted Miles per Gallon after an increase of 50 units in Horsepower?\n\n\nR\nPython\n\n\n\n\ncomparisons(\n mod,\n variables = list(hp = 50),\n comparison = \"ratioavg\")\n\n\n Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(+50) 0.91 0.0291 31.3 <0.001 711.9 0.853 0.966\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n\n\n\n\ncmp = comparisons(\n mod,\n variables = {\"hp\": 50},\n comparison = \"ratioavg\")\nprint(cmp)\n\nshape: (1, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬─────┬──────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪═════╪══════╪═══════╡\n│ hp ┆ +50 ┆ 0.91 ┆ 0.0291 ┆ … ┆ 0 ┆ inf ┆ 0.85 ┆ 0.97 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴─────┴──────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nSee the Comparisons vignette for detailed explanations and more options.\nThe slopes() function allows us to compute the partial derivative of the outcome equation with respect to each of the predictors. Once again, we obtain a data frame with 96 rows:\n\n\nR\nPython\n\n\n\n\nmfx <- slopes(mod)\n\nnrow(mfx)\n\n[1] 96\n\nmfx\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 0.325 1.68 0.193 0.8467 0.2 -2.97 3.622\n am 1 - 0 -0.544 1.57 -0.347 0.7287 0.5 -3.62 2.530\n am 1 - 0 1.201 2.35 0.511 0.6090 0.7 -3.40 5.802\n am 1 - 0 -1.703 1.87 -0.912 0.3618 1.5 -5.36 1.957\n am 1 - 0 -0.615 1.68 -0.366 0.7146 0.5 -3.91 2.680\n--- 86 rows omitted. See ?avg_slopes and ?print.marginaleffects --- \n wt dY/dX -6.518 1.88 -3.462 <0.001 10.9 -10.21 -2.827\n wt dY/dX -1.653 3.74 -0.442 0.6588 0.6 -8.99 5.683\n wt dY/dX -4.520 2.47 -1.830 0.0673 3.9 -9.36 0.321\n wt dY/dX 0.635 4.89 0.130 0.8966 0.2 -8.95 10.215\n wt dY/dX -6.647 1.86 -3.571 <0.001 11.5 -10.29 -2.999\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, wt, am \nType: response \n\n\n\n\n\nmfx = slopes(mod)\n\nmfx.shape\n\n(96, 25)\n\nprint(mfx)\n\nshape: (96, 9)\n┌──────┬──────────┬───────────┬───────────┬───┬──────────┬───────┬───────────┬──────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪═══════════╪═══════════╪═══╪══════════╪═══════╪═══════════╪══════════╡\n│ wt ┆ dY/dX ┆ -6.61 ┆ 1.87 ┆ … ┆ 0.00164 ┆ 9.25 ┆ -10.5 ┆ -2.76 │\n│ wt ┆ dY/dX ┆ -6.61 ┆ 1.87 ┆ … ┆ 0.00165 ┆ 9.25 ┆ -10.5 ┆ -2.76 │\n│ wt ┆ dY/dX ┆ -7.16 ┆ 1.8 ┆ … ┆ 0.000558 ┆ 10.8 ┆ -10.9 ┆ -3.45 │\n│ wt ┆ dY/dX ┆ -3.21 ┆ 2.01 ┆ … ┆ 0.123 ┆ 3.02 ┆ -7.35 ┆ 0.939 │\n│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n│ am ┆ dY/dX ┆ 2.11e+04 ┆ 2.29e+04 ┆ … ┆ 0.367 ┆ 1.45 ┆ -2.62e+04 ┆ 6.83e+04 │\n│ am ┆ dY/dX ┆ 8.95e+03 ┆ 1.64e+04 ┆ … ┆ 0.591 ┆ 0.758 ┆ -2.5e+04 ┆ 4.29e+04 │\n│ am ┆ dY/dX ┆ 4.03e+04 ┆ 3.24e+04 ┆ … ┆ 0.226 ┆ 2.15 ┆ -2.66e+04 ┆ 1.07e+05 │\n│ am ┆ dY/dX ┆ -2.37e+03 ┆ 1.59e+04 ┆ … ┆ 0.883 ┆ 0.18 ┆ -3.51e+04 ┆ 3.04e+04 │\n└──────┴──────────┴───────────┴───────────┴───┴──────────┴───────┴───────────┴──────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb"
},
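The "ratioavg" shortcut used above is shorthand for a function of the two counterfactual prediction vectors. A minimal sketch of the equivalent user-supplied function, assuming the Get Started model used throughout this vignette (mpg ~ hp * wt * am fit on mtcars):

```r
library(marginaleffects)

# Assumption: the model used earlier in this vignette.
mod <- lm(mpg ~ hp * wt * am, data = mtcars)

# "ratioavg" computes mean(hi) / mean(lo), where `hi` and `lo` hold the
# predictions after and before the +50 change in hp. Supplying the function
# directly yields the same single-row summary:
comparisons(
  mod,
  variables  = list(hp = 50),
  comparison = function(hi, lo) mean(hi) / mean(lo))
```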
{
"objectID": "articles/marginaleffects.html#grid",
"href": "articles/marginaleffects.html#grid",
"title": "\n1 Get Started\n",
"section": "\n1.3 Grid",
- "text": "1.3 Grid\nPredictions, comparisons, and slopes are typically “conditional” quantities which depend on the values of all the predictors in the model. By default, marginaleffects functions estimate quantities of interest for the empirical distribution of the data (i.e., for each row of the original dataset). However, users can specify the exact values of the predictors they want to investigate by using the newdata argument.\nnewdata accepts data frames, shortcut strings, or a call to the datagrid() function. For example, to compute the predicted outcome for a hypothetical car with all predictors equal to the sample mean or median, we can do:\n\n\nR\nPython\n\n\n\n\npredictions(mod, newdata = \"mean\")\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp wt am\n 18.4 0.68 27 <0.001 531.7 17 19.7 147 3.22 0.406\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\npredictions(mod, newdata = \"median\")\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp wt am\n 19.4 0.646 30 <0.001 653.2 18.1 20.6 123 3.33 0\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\n\n\n\n\np = predictions(mod, newdata = \"mean\")\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬─────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪═════╪═════════╪═════╪══════╪═══════╡\n│ 18.4 ┆ 0.68 ┆ 27 ┆ 0 ┆ inf ┆ 17 ┆ 19.8 │\n└──────────┴───────────┴─────┴─────────┴─────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\np = predictions(mod, newdata = \"median\")\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬─────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪═════╪═════════╪═════╪══════╪═══════╡\n│ 19.4 ┆ 0.646 ┆ 30 ┆ 0 ┆ inf ┆ 18 ┆ 20.7 │\n└──────────┴───────────┴─────┴─────────┴─────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe datagrid function gives us a powerful way to define a grid of predictors. All the variables not mentioned explicitly in datagrid() are fixed to their mean or mode:\n\n\nR\nPython\n\n\n\n\npredictions(\n mod,\n newdata = datagrid(\n am = c(0, 1),\n wt = range))\n\n\n am wt Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 % hp\n 0 1.51 23.3 2.71 8.60 <0.001 56.7 17.96 28.6 147\n 0 5.42 12.8 2.98 4.30 <0.001 15.8 6.96 18.6 147\n 1 1.51 27.1 2.85 9.52 <0.001 69.0 21.56 32.7 147\n 1 5.42 5.9 5.81 1.01 0.31 1.7 -5.50 17.3 147\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, am, wt \nType: response \n\n\n\n\n\np = predictions(\n mod,\n newdata = datagrid(\n mod,\n am = [0, 1],\n wt = [mtcars[\"wt\"].min(), mtcars[\"wt\"].max()]))\nprint(p)\n\nshape: (4, 7)\n┌──────────┬───────────┬──────┬──────────┬──────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪══════════╪══════╪══════╪═══════╡\n│ 23.3 ┆ 2.71 ┆ 8.6 ┆ 8.65e-09 ┆ 26.8 ┆ 17.7 ┆ 28.8 │\n│ 12.8 ┆ 2.98 ┆ 4.3 ┆ 0.000249 ┆ 12 ┆ 6.65 ┆ 18.9 │\n│ 27.1 ┆ 2.85 ┆ 9.52 ┆ 1.27e-09 ┆ 29.5 ┆ 21.3 ┆ 33 │\n│ 5.9 ┆ 5.81 ┆ 1.01 ┆ 0.32 ┆ 1.64 ┆ -6.1 ┆ 17.9 │\n└──────────┴───────────┴──────┴──────────┴──────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, am, wt, rownames, mpg, cyl, disp, hp, drat, qsec, vs, gear, carb\n\n\n\n\n\nThe same mechanism is available in comparisons() and slopes(). To estimate the partial derivative of mpg with respect to wt, when am is equal to 0 and 1, while other predictors are held at their means:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n variables = \"wt\",\n newdata = datagrid(am = 0:1))\n\n\n Term am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt 0 -2.68 1.42 -1.89 0.0593 4.1 -5.46 0.105\n wt 1 -5.43 2.15 -2.52 0.0116 6.4 -9.65 -1.213\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, predicted_lo, predicted_hi, predicted, mpg, hp, wt \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n variables = \"wt\",\n newdata = datagrid(mod, am = [0, 1]))\nprint(s)\n\nshape: (2, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬──────┬───────┬────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪══════╪═══════╪════════╡\n│ wt ┆ dY/dX ┆ -2.68 ┆ 1.42 ┆ … ┆ 0.072 ┆ 3.8 ┆ -5.61 ┆ 0.258 │\n│ wt ┆ dY/dX ┆ -5.43 ┆ 2.15 ┆ … ┆ 0.0187 ┆ 5.74 ┆ -9.88 ┆ -0.986 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴──────┴───────┴────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, am, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, gear, carb\n\n\n\n\n\nWe can also plot how predictions, comparisons, or slopes change across different values of the predictors using three powerful plotting functions:\n\n\nplot_predictions: Conditional Adjusted Predictions\n\nplot_comparisons: Conditional Comparisons\n\nplot_slopes: Conditional Marginal Effects\n\nFor example, this plot shows the outcomes predicted by our model for different values of the wt and am variables:\n\nplot_predictions(mod, condition = list(\"hp\", \"wt\" = \"threenum\", \"am\"))\n\n\n\n\nThis plot shows how the derivative of mpg with respect to am varies as a function of wt and hp:\n\nplot_slopes(mod, variables = \"am\", condition = list(\"hp\", \"wt\" = \"minmax\"))\n\n\n\n\nSee this vignette for more information: Plots, interactions, predictions, contrasts, and slopes"
+ "text": "1.3 Grid\nPredictions, comparisons, and slopes are typically “conditional” quantities which depend on the values of all the predictors in the model. By default, marginaleffects functions estimate quantities of interest for the empirical distribution of the data (i.e., for each row of the original dataset). However, users can specify the exact values of the predictors they want to investigate by using the newdata argument.\nnewdata accepts data frames, shortcut strings, or a call to the datagrid() function. For example, to compute the predicted outcome for a hypothetical car with all predictors equal to the sample mean or median, we can do:\n\n\nR\nPython\n\n\n\n\npredictions(mod, newdata = \"mean\")\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp wt am\n 18.4 0.68 27 <0.001 531.7 17 19.7 147 3.22 0.406\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\npredictions(mod, newdata = \"median\")\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp wt am\n 19.4 0.646 30 <0.001 653.2 18.1 20.6 123 3.33 0\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, wt, am \nType: response \n\n\n\n\n\np = predictions(mod, newdata = \"mean\")\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬─────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪═════╪═════════╪═════╪══════╪═══════╡\n│ 18.4 ┆ 0.68 ┆ 27 ┆ 0 ┆ inf ┆ 17 ┆ 19.8 │\n└──────────┴───────────┴─────┴─────────┴─────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\np = predictions(mod, newdata = \"median\")\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬─────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪═════╪═════════╪═════╪══════╪═══════╡\n│ 19.4 ┆ 0.646 ┆ 30 ┆ 0 ┆ inf ┆ 18 ┆ 20.7 │\n└──────────┴───────────┴─────┴─────────┴─────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb\n\n\n\n\n\nThe datagrid function gives us a powerful way to define a grid of predictors. All the variables not mentioned explicitly in datagrid() are fixed to their mean or mode:\n\n\nR\nPython\n\n\n\n\npredictions(\n mod,\n newdata = datagrid(\n am = c(0, 1),\n wt = range))\n\n\n am wt Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 % hp\n 0 1.51 23.3 2.71 8.60 <0.001 56.7 17.96 28.6 147\n 0 5.42 12.8 2.98 4.30 <0.001 15.8 6.96 18.6 147\n 1 1.51 27.1 2.85 9.52 <0.001 69.0 21.56 32.7 147\n 1 5.42 5.9 5.81 1.01 0.31 1.7 -5.50 17.3 147\n\nColumns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, am, wt \nType: response \n\n\n\n\n\np = predictions(\n mod,\n newdata = datagrid(\n mod,\n am = [0, 1],\n wt = [mtcars[\"wt\"].min(), mtcars[\"wt\"].max()]))\nprint(p)\n\nshape: (4, 7)\n┌──────────┬───────────┬──────┬──────────┬──────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪══════════╪══════╪══════╪═══════╡\n│ 23.3 ┆ 2.71 ┆ 8.6 ┆ 8.65e-09 ┆ 26.8 ┆ 17.7 ┆ 28.8 │\n│ 12.8 ┆ 2.98 ┆ 4.3 ┆ 0.000249 ┆ 12 ┆ 6.65 ┆ 18.9 │\n│ 27.1 ┆ 2.85 ┆ 9.52 ┆ 1.27e-09 ┆ 29.5 ┆ 21.3 ┆ 33 │\n│ 5.9 ┆ 5.81 ┆ 1.01 ┆ 0.32 ┆ 1.64 ┆ -6.1 ┆ 17.9 │\n└──────────┴───────────┴──────┴──────────┴──────┴──────┴───────┘\n\nColumns: rowid, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, am, wt, rownames, mpg, cyl, disp, hp, drat, qsec, vs, gear, carb\n\n\n\n\n\nThe same mechanism is available in comparisons() and slopes(). To estimate the partial derivative of mpg with respect to wt, when am is equal to 0 and 1, while other predictors are held at their means:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n variables = \"wt\",\n newdata = datagrid(am = 0:1))\n\n\n Term am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt 0 -2.68 1.42 -1.89 0.0593 4.1 -5.46 0.106\n wt 1 -5.43 2.15 -2.52 0.0116 6.4 -9.65 -1.214\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, predicted_lo, predicted_hi, predicted, mpg, hp, wt \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n variables = \"wt\",\n newdata = datagrid(mod, am = [0, 1]))\nprint(s)\n\nshape: (2, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬──────┬───────┬────────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪══════╪═══════╪════════╡\n│ wt ┆ dY/dX ┆ -2.68 ┆ 1.42 ┆ … ┆ 0.072 ┆ 3.8 ┆ -5.61 ┆ 0.258 │\n│ wt ┆ dY/dX ┆ -5.43 ┆ 2.15 ┆ … ┆ 0.0186 ┆ 5.75 ┆ -9.87 ┆ -0.993 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴──────┴───────┴────────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, am, rownames, mpg, cyl, disp, hp, drat, wt, qsec, vs, gear, carb\n\n\n\n\n\nWe can also plot how predictions, comparisons, or slopes change across different values of the predictors using three powerful plotting functions:\n\n\nplot_predictions: Conditional Adjusted Predictions\n\nplot_comparisons: Conditional Comparisons\n\nplot_slopes: Conditional Marginal Effects\n\nFor example, this plot shows the outcomes predicted by our model for different values of the wt and am variables:\n\nplot_predictions(mod, condition = list(\"hp\", \"wt\" = \"threenum\", \"am\"))\n\n\n\n\nThis plot shows how the derivative of mpg with respect to am varies as a function of wt and hp:\n\nplot_slopes(mod, variables = \"am\", condition = list(\"hp\", \"wt\" = \"minmax\"))\n\n\n\n\nSee this vignette for more information: Plots, interactions, predictions, contrasts, and slopes"
},
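Because newdata accepts any data frame, the datagrid() call above can be reproduced by hand, which makes the fill-in rule explicit. A sketch, assuming the same mod and the mtcars data:

```r
# Manual equivalent of datagrid(am = c(0, 1), wt = range): cross the named
# values, then pin every unmentioned predictor (here hp) at a central value.
grid <- expand.grid(am = c(0, 1), wt = range(mtcars$wt))
grid$hp <- mean(mtcars$hp)  # numeric predictors default to their mean

predictions(mod, newdata = grid)
```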
{
"objectID": "articles/marginaleffects.html#averaging",
"href": "articles/marginaleffects.html#averaging",
"title": "\n1 Get Started\n",
"section": "\n1.4 Averaging",
- "text": "1.4 Averaging\nSince predictions, comparisons, and slopes are conditional quantities, they can be a bit unwieldy. Often, it can be useful to report a one-number summary instead of one estimate per observation. Instead of presenting “conditional” estimates, some methodologists recommend reporting “marginal” estimates, that is, an average of unit-level estimates.\n(This use of the word “marginal” as “averaging” should not be confused with the term “marginal effect” which, in the econometrics tradition, corresponds to a partial derivative, or the effect of a “small/marginal” change.)\nTo marginalize (average over) our unit-level estimates, we can use the by argument or the one of the convenience functions: avg_predictions(), avg_comparisons(), or avg_slopes(). For example, both of these commands give us the same result: the average predicted outcome in the mtcars dataset:\n\n\nR\nPython\n\n\n\n\navg_predictions(mod)\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n 20.1 0.39 51.5 <0.001 Inf 19.3 20.9\n\nColumns: estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\np = avg_predictions(mod)\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡\n│ 20.1 ┆ 0.39 ┆ 51.5 ┆ 0 ┆ inf ┆ 19.3 ┆ 20.9 │\n└──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘\n\nColumns: estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nThis is equivalent to manual computation by:\n\n\nR\nPython\n\n\n\n\nmean(predict(mod))\n\n[1] 20.09062\n\n\n\n\n\nnp.mean(mod.predict())\n\n20.090625000000014\n\n\n\n\n\nThe main marginaleffects functions all include a by argument, which allows us to marginalize within sub-groups of the data. For example,\n\n\nR\nPython\n\n\n\n\navg_comparisons(mod, by = \"am\")\n\n\n Term Contrast am Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n am mean(1) - mean(0) 0 -1.3830 2.5250 -0.548 0.58388 0.8 -6.3319 3.56589\n am mean(1) - mean(0) 1 1.9029 2.3086 0.824 0.40980 1.3 -2.6219 6.42773\n hp mean(+1) 0 -0.0343 0.0159 -2.160 0.03079 5.0 -0.0654 -0.00317\n hp mean(+1) 1 -0.0436 0.0213 -2.050 0.04038 4.6 -0.0854 -0.00191\n wt mean(+1) 0 -2.4799 1.2316 -2.014 0.04406 4.5 -4.8939 -0.06595\n wt mean(+1) 1 -6.0718 1.9762 -3.072 0.00212 8.9 -9.9451 -2.19846\n\nColumns: term, contrast, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n\n\n\n\ncmp = avg_comparisons(mod, by = \"am\")\nprint(cmp)\n\nshape: (6, 10)\n┌─────┬──────┬───────────────────┬──────────┬───┬─────────┬───────┬─────────┬──────────┐\n│ am ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞═════╪══════╪═══════════════════╪══════════╪═══╪═════════╪═══════╪═════════╪══════════╡\n│ 1 ┆ wt ┆ +1 ┆ -6.07 ┆ … ┆ 0.00522 ┆ 7.58 ┆ -10.2 ┆ -1.99 │\n│ 0 ┆ wt ┆ +1 ┆ -2.48 ┆ … ┆ 0.0554 ┆ 4.17 ┆ -5.02 ┆ 0.0621 │\n│ 1 ┆ am ┆ mean(1) - mean(0) ┆ 1.9 ┆ … ┆ 0.418 ┆ 1.26 ┆ -2.86 ┆ 6.67 │\n│ 0 ┆ am ┆ mean(1) - mean(0) ┆ -1.38 ┆ … ┆ 0.589 ┆ 0.764 ┆ -6.59 ┆ 3.83 │\n│ 1 ┆ hp ┆ +1 ┆ -0.0436 ┆ … ┆ 0.0515 ┆ 4.28 ┆ -0.0876 ┆ 0.000301 │\n│ 0 ┆ hp ┆ +1 ┆ -0.0343 ┆ … ┆ 0.041 ┆ 4.61 ┆ -0.067 ┆ -0.00152 │\n└─────┴──────┴───────────────────┴──────────┴───┴─────────┴───────┴─────────┴──────────┘\n\nColumns: am, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nMarginal Means are a special case of predictions, which are marginalized (or averaged) across a balanced grid of categorical predictors. To illustrate, we estimate a new model with categorical predictors:\n\n\nR\nPython\n\n\n\n\ndat <- mtcars\ndat$am <- as.logical(dat$am)\ndat$cyl <- as.factor(dat$cyl)\nmod_cat <- lm(mpg ~ am + cyl + hp, data = dat)\n\n\n\n\ndat = mtcars \\\n .with_columns(pl.col(\"am\").cast(pl.Boolean),\n pl.col(\"cyl\").cast(pl.Utf8))\nmod_cat = smf.ols('mpg ~ am + cyl + hp', data=dat).fit()\n\n\n\n\nWe can compute marginal means manually using the functions already described:\n\n\nR\nPython\n\n\n\n\navg_predictions(\n mod_cat,\n newdata = datagrid(cyl = unique, am = unique),\n by = \"am\")\n\n\n am Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n FALSE 18.3 0.785 23.3 <0.001 397.4 16.8 19.9\n TRUE 22.5 0.834 26.9 <0.001 528.6 20.8 24.1\n\nColumns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\np = avg_predictions(\n mod_cat,\n newdata = datagrid(mod_cat, cyl = dat[\"cyl\"].unique(), am = dat[\"am\"].unique()),\n by = \"am\")\nprint(p)\n\nshape: (2, 8)\n┌───────┬──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐\n│ am ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ bool ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════╪══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡\n│ true ┆ 22.5 ┆ 0.834 ┆ 26.9 ┆ 0 ┆ inf ┆ 20.8 ┆ 24.2 │\n│ false ┆ 18.3 ┆ 0.785 ┆ 23.3 ┆ 0 ┆ inf ┆ 16.7 ┆ 19.9 │\n└───────┴──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘\n\nColumns: am, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nFor convenience, the marginaleffects package for R also includes a marginal_means() function:\n\nmarginal_means(mod_cat, variables = \"am\")\n\n\n Term Value Mean Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am FALSE 18.3 0.785 23.3 <0.001 397.4 16.8 19.9\n am TRUE 22.5 0.834 26.9 <0.001 528.6 20.8 24.1\n\nResults averaged over levels of: cyl, am \nColumns: term, value, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nThe Marginal Means vignette offers more detail."
+ "text": "1.4 Averaging\nSince predictions, comparisons, and slopes are conditional quantities, they can be a bit unwieldy. Often, it can be useful to report a one-number summary instead of one estimate per observation. Instead of presenting “conditional” estimates, some methodologists recommend reporting “marginal” estimates, that is, an average of unit-level estimates.\n(This use of the word “marginal” as “averaging” should not be confused with the term “marginal effect” which, in the econometrics tradition, corresponds to a partial derivative, or the effect of a “small/marginal” change.)\nTo marginalize (average over) our unit-level estimates, we can use the by argument or the one of the convenience functions: avg_predictions(), avg_comparisons(), or avg_slopes(). For example, both of these commands give us the same result: the average predicted outcome in the mtcars dataset:\n\n\nR\nPython\n\n\n\n\navg_predictions(mod)\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n 20.1 0.39 51.5 <0.001 Inf 19.3 20.9\n\nColumns: estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\np = avg_predictions(mod)\nprint(p)\n\nshape: (1, 7)\n┌──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐\n│ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡\n│ 20.1 ┆ 0.39 ┆ 51.5 ┆ 0 ┆ inf ┆ 19.3 ┆ 20.9 │\n└──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘\n\nColumns: estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nThis is equivalent to manual computation by:\n\n\nR\nPython\n\n\n\n\nmean(predict(mod))\n\n[1] 20.09062\n\n\n\n\n\nnp.mean(mod.predict())\n\n20.090624999999992\n\n\n\n\n\nThe main marginaleffects functions all include a by argument, which allows us to marginalize within sub-groups of the data. For example,\n\n\nR\nPython\n\n\n\n\navg_comparisons(mod, by = \"am\")\n\n\n Term Contrast am Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n am mean(1) - mean(0) 0 -1.3830 2.5250 -0.548 0.58388 0.8 -6.3319 3.56589\n am mean(1) - mean(0) 1 1.9029 2.3086 0.824 0.40980 1.3 -2.6219 6.42773\n hp mean(+1) 0 -0.0343 0.0159 -2.160 0.03079 5.0 -0.0654 -0.00317\n hp mean(+1) 1 -0.0436 0.0213 -2.050 0.04039 4.6 -0.0854 -0.00191\n wt mean(+1) 0 -2.4799 1.2316 -2.014 0.04406 4.5 -4.8939 -0.06595\n wt mean(+1) 1 -6.0718 1.9762 -3.072 0.00212 8.9 -9.9451 -2.19846\n\nColumns: term, contrast, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n\n\n\n\ncmp = avg_comparisons(mod, by = \"am\")\nprint(cmp)\n\nshape: (6, 10)\n┌─────┬──────┬───────────────────┬──────────┬───┬─────────┬───────┬─────────┬──────────┐\n│ am ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞═════╪══════╪═══════════════════╪══════════╪═══╪═════════╪═══════╪═════════╪══════════╡\n│ 1 ┆ hp ┆ +1 ┆ -0.0436 ┆ … ┆ 0.0515 ┆ 4.28 ┆ -0.0876 ┆ 0.000301 │\n│ 0 ┆ hp ┆ +1 ┆ -0.0343 ┆ … ┆ 0.041 ┆ 4.61 ┆ -0.067 ┆ -0.00152 │\n│ 1 ┆ wt ┆ +1 ┆ -6.07 ┆ … ┆ 0.00522 ┆ 7.58 ┆ -10.2 ┆ -1.99 │\n│ 0 ┆ wt ┆ +1 ┆ -2.48 ┆ … ┆ 0.0554 ┆ 4.17 ┆ -5.02 ┆ 0.0621 │\n│ 1 ┆ am ┆ mean(1) - mean(0) ┆ 1.9 ┆ … ┆ 0.418 ┆ 1.26 ┆ -2.86 ┆ 6.67 │\n│ 0 ┆ am ┆ mean(1) - mean(0) ┆ -1.38 ┆ … ┆ 0.589 ┆ 0.764 ┆ -6.59 ┆ 3.83 │\n└─────┴──────┴───────────────────┴──────────┴───┴─────────┴───────┴─────────┴──────────┘\n\nColumns: am, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nMarginal Means are a special case of predictions, which are marginalized (or averaged) across a balanced grid of categorical predictors. To illustrate, we estimate a new model with categorical predictors:\n\n\nR\nPython\n\n\n\n\ndat <- mtcars\ndat$am <- as.logical(dat$am)\ndat$cyl <- as.factor(dat$cyl)\nmod_cat <- lm(mpg ~ am + cyl + hp, data = dat)\n\n\n\n\ndat = mtcars \\\n .with_columns(pl.col(\"am\").cast(pl.Boolean),\n pl.col(\"cyl\").cast(pl.Utf8))\nmod_cat = smf.ols('mpg ~ am + cyl + hp', data=dat).fit()\n\n\n\n\nWe can compute marginal means manually using the functions already described:\n\n\nR\nPython\n\n\n\n\navg_predictions(\n mod_cat,\n newdata = datagrid(cyl = unique, am = unique),\n by = \"am\")\n\n\n am Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n FALSE 18.3 0.785 23.3 <0.001 397.4 16.8 19.9\n TRUE 22.5 0.834 26.9 <0.001 528.6 20.8 24.1\n\nColumns: am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\np = avg_predictions(\n mod_cat,\n newdata = datagrid(mod_cat, cyl = dat[\"cyl\"].unique(), am = dat[\"am\"].unique()),\n by = \"am\")\nprint(p)\n\nshape: (2, 8)\n┌───────┬──────────┬───────────┬──────┬─────────┬─────┬──────┬───────┐\n│ am ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ bool ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════╪══════════╪═══════════╪══════╪═════════╪═════╪══════╪═══════╡\n│ true ┆ 22.5 ┆ 0.834 ┆ 26.9 ┆ 0 ┆ inf ┆ 20.8 ┆ 24.2 │\n│ false ┆ 18.3 ┆ 0.785 ┆ 23.3 ┆ 0 ┆ inf ┆ 16.7 ┆ 19.9 │\n└───────┴──────────┴───────────┴──────┴─────────┴─────┴──────┴───────┘\n\nColumns: am, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nFor convenience, the marginaleffects package for R also includes a marginal_means() function:\n\nmarginal_means(mod_cat, variables = \"am\")\n\n\n Term Value Mean Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am FALSE 18.3 0.785 23.3 <0.001 397.4 16.8 19.9\n am TRUE 22.5 0.834 26.9 <0.001 528.6 20.8 24.1\n\nResults averaged over levels of: cyl, am \nColumns: term, value, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nThe Marginal Means vignette offers more detail."
},
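The by argument is, at its core, a group-wise average of unit-level estimates. A sketch of the manual equivalent, assuming the same mod; the point estimates match avg_comparisons(mod, by = "am"), while the standard errors still require the delta method that the avg_*() functions apply automatically:

```r
# Reproduce the grouped point estimates by averaging unit-level comparisons
# within term x contrast x am cells.
cmp <- comparisons(mod)
aggregate(estimate ~ term + contrast + am, data = cmp, FUN = mean)
```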
{
"objectID": "articles/marginaleffects.html#hypothesis-and-equivalence-tests",
"href": "articles/marginaleffects.html#hypothesis-and-equivalence-tests",
"title": "\n1 Get Started\n",
"section": "\n1.5 Hypothesis and equivalence tests",
- "text": "1.5 Hypothesis and equivalence tests\nThe hypotheses() function and the hypothesis argument can be used to conduct linear and non-linear hypothesis tests on model coefficients, or on any of the quantities computed by the functions introduced above.\nConsider this model:\n\n\nR\nPython\n\n\n\n\nmod <- lm(mpg ~ qsec * drat, data = mtcars)\ncoef(mod)\n\n(Intercept) qsec drat qsec:drat \n 12.3371987 -1.0241183 -3.4371461 0.5973153 \n\n\n\n\n\nmod = smf.ols('mpg ~ qsec * drat', data=mtcars).fit()\nprint(mod.params)\n\nIntercept 12.337199\nqsec -1.024118\ndrat -3.437146\nqsec:drat 0.597315\ndtype: float64\n\n\n\n\n\nCan we reject the null hypothesis that the drat coefficient is 2 times the size of the qsec coefficient?\n\nhypotheses(mod, \"drat = 2 * qsec\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n drat = 2 * qsec -1.39 10.8 -0.129 0.897 0.2 -22.5 19.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n\n\nWe can ask the same question but refer to parameters by position, with indices b1, b2, b3, etc.:\n\n\nR\nPython\n\n\n\n\nhypotheses(mod, \"b3 = 2 * b2\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b3 = 2 * b2 -1.39 10.8 -0.129 0.897 0.2 -22.5 19.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n\n\n\n\n\nh = hypotheses(mod, \"b3 = 2 * b2\")\nprint(h)\n\nshape: (1, 8)\n┌─────────┬──────────┬───────────┬────────┬─────────┬───────┬───────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═════════╪══════════╪═══════════╪════════╪═════════╪═══════╪═══════╪═══════╡\n│ b3=2*b2 ┆ -1.39 ┆ 10.8 ┆ -0.129 ┆ 0.898 ┆ 0.155 ┆ -23.5 ┆ 20.7 │\n└─────────┴──────────┴───────────┴────────┴─────────┴───────┴───────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nThe main functions in marginaleffects all have a hypothesis argument, which means that we can do complex model testing. For example, consider two slope estimates:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n variables = \"drat\",\n newdata = datagrid(qsec = range))\n\n\n Term qsec Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n drat 14.5 5.22 3.79 1.38 0.1682 2.6 -2.206 12.7\n drat 22.9 10.24 5.16 1.98 0.0472 4.4 0.127 20.4\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, qsec, predicted_lo, predicted_hi, predicted, mpg, drat \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n variables = \"drat\",\n newdata = datagrid(mod, qsec = [mtcars[\"qsec\"].min(), mtcars[\"qsec\"].max()]))\nprint(s)\n\nshape: (2, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬──────┬────────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪══════╪════════╪═══════╡\n│ drat ┆ dY/dX ┆ 5.22 ┆ 3.81 ┆ … ┆ 0.181 ┆ 2.47 ┆ -2.57 ┆ 13 │\n│ drat ┆ dY/dX ┆ 10.2 ┆ 5.16 ┆ … ┆ 0.057 ┆ 4.13 ┆ -0.328 ┆ 20.8 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴──────┴────────┴───────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, qsec, rownames, mpg, cyl, disp, hp, drat, wt, vs, am, gear, carb\n\n\n\n\n\nAre these two slopes significantly different from one another? To test this, we can use the hypothesis argument:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n hypothesis = \"b1 = b2\",\n variables = \"drat\",\n newdata = datagrid(qsec = range))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b1=b2 -5.02 8.52 -0.589 0.556 0.8 -21.7 11.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n hypothesis = \"b1 = b2\",\n variables = \"drat\",\n newdata = datagrid(mod, qsec = [mtcars[\"qsec\"].min(), mtcars[\"qsec\"].max()]))\nprint(s)\n\nshape: (1, 8)\n┌───────┬──────────┬───────────┬────────┬─────────┬───────┬───────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════╪══════════╪═══════════╪════════╪═════════╪═══════╪═══════╪═══════╡\n│ b1=b2 ┆ -5.02 ┆ 8.53 ┆ -0.588 ┆ 0.561 ┆ 0.833 ┆ -22.5 ┆ 12.5 │\n└───────┴──────────┴───────────┴────────┴─────────┴───────┴───────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nAlternatively, we can also refer to values with term names (when they are unique):\n\n\nR\nPython\n\n\n\n\navg_slopes(mod)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n drat 7.22 1.365 5.29 < 0.001 23.0 4.549 9.90\n qsec 1.12 0.433 2.59 0.00947 6.7 0.275 1.97\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_slopes(mod, hypothesis = \"drat = qsec\")\n\n\n Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n drat=qsec 6.1 1.45 4.2 <0.001 15.2 3.25 8.95\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\ns = avg_slopes(mod)\nprint(s)\n\nshape: (2, 9)\n┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬──────┬───────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪══════╪═══════╪═══════╡\n│ drat ┆ mean(dY/dX) ┆ 7.22 ┆ 1.37 ┆ … ┆ 1.25e-05 ┆ 16.3 ┆ 4.43 ┆ 10 │\n│ qsec ┆ mean(dY/dX) ┆ 1.12 ┆ 0.435 ┆ … ┆ 0.0152 ┆ 6.04 ┆ 0.234 ┆ 2.01 │\n└──────┴─────────────┴──────────┴───────────┴───┴──────────┴──────┴───────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\ns = avg_slopes(mod, hypothesis = \"drat = qsec\")\nprint(s)\n\nshape: (1, 8)\n┌───────────┬──────────┬───────────┬─────┬──────────┬─────┬──────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════════╪══════════╪═══════════╪═════╪══════════╪═════╪══════╪═══════╡\n│ drat=qsec ┆ 6.1 ┆ 1.45 ┆ 4.2 ┆ 0.000245 ┆ 12 ┆ 3.13 ┆ 9.07 │\n└───────────┴──────────┴───────────┴─────┴──────────┴─────┴──────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nNow, imagine that for theoretical (or substantive or clinical) reasons, we only care about slopes larger than 2. We can use the equivalence argument to conduct an equivalence test:\n\n\nR\nPython\n\n\n\n\navg_slopes(mod, equivalence = c(-2, 2))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % p (NonSup) p (NonInf) p (Equiv)\n drat 7.22 1.365 5.29 < 0.001 23.0 4.549 9.90 0.9999 <0.001 0.9999\n qsec 1.12 0.433 2.59 0.00947 6.7 0.275 1.97 0.0216 <0.001 0.0216\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, statistic.noninf, statistic.nonsup, p.value.noninf, p.value.nonsup, p.value.equiv \nType: response \n\n\n\n\n\ns = avg_slopes(mod, equivalence = [-2., 2.])\nprint(s)\n\nshape: (2, 9)\n┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬──────┬───────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪══════╪═══════╪═══════╡\n│ qsec ┆ mean(dY/dX) ┆ 1.12 ┆ 0.435 ┆ … ┆ 0.0152 ┆ 6.04 ┆ 0.234 ┆ 2.01 │\n│ drat ┆ mean(dY/dX) ┆ 7.22 ┆ 1.37 ┆ … ┆ 1.25e-05 ┆ 16.3 ┆ 4.43 ┆ 10 │\n└──────┴─────────────┴──────────┴───────────┴───┴──────────┴──────┴───────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, statistic_noninf, statistic_nonsup, p_value_noninf, p_value_nonsup, p_value_equiv\n\n\n\n\n\nSee the Hypothesis Tests and Custom Contrasts vignette for background, details, and for instructions on how to conduct hypothesis tests in more complex situations."
+ "text": "1.5 Hypothesis and equivalence tests\nThe hypotheses() function and the hypothesis argument can be used to conduct linear and non-linear hypothesis tests on model coefficients, or on any of the quantities computed by the functions introduced above.\nConsider this model:\n\n\nR\nPython\n\n\n\n\nmod <- lm(mpg ~ qsec * drat, data = mtcars)\ncoef(mod)\n\n(Intercept) qsec drat qsec:drat \n 12.3371987 -1.0241183 -3.4371461 0.5973153 \n\n\n\n\n\nmod = smf.ols('mpg ~ qsec * drat', data=mtcars).fit()\nprint(mod.params)\n\nIntercept 12.337199\nqsec -1.024118\ndrat -3.437146\nqsec:drat 0.597315\ndtype: float64\n\n\n\n\n\nCan we reject the null hypothesis that the drat coefficient is 2 times the size of the qsec coefficient?\n\nhypotheses(mod, \"drat = 2 * qsec\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n drat = 2 * qsec -1.39 10.8 -0.129 0.897 0.2 -22.5 19.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n\n\nWe can ask the same question but refer to parameters by position, with indices b1, b2, b3, etc.:\n\n\nR\nPython\n\n\n\n\nhypotheses(mod, \"b3 = 2 * b2\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b3 = 2 * b2 -1.39 10.8 -0.129 0.897 0.2 -22.5 19.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n\n\n\n\n\nh = hypotheses(mod, \"b3 = 2 * b2\")\nprint(h)\n\nshape: (1, 8)\n┌─────────┬──────────┬───────────┬────────┬─────────┬───────┬───────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═════════╪══════════╪═══════════╪════════╪═════════╪═══════╪═══════╪═══════╡\n│ b3=2*b2 ┆ -1.39 ┆ 10.8 ┆ -0.129 ┆ 0.898 ┆ 0.155 ┆ -23.5 ┆ 20.7 │\n└─────────┴──────────┴───────────┴────────┴─────────┴───────┴───────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nThe main functions in marginaleffects all have a hypothesis argument, which means that we can do complex model testing. For example, consider two slope estimates:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n variables = \"drat\",\n newdata = datagrid(qsec = range))\n\n\n Term qsec Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n drat 14.5 5.22 3.80 1.38 0.1690 2.6 -2.221 12.7\n drat 22.9 10.24 5.15 1.99 0.0469 4.4 0.142 20.3\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, qsec, predicted_lo, predicted_hi, predicted, mpg, drat \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n variables = \"drat\",\n newdata = datagrid(mod, qsec = [mtcars[\"qsec\"].min(), mtcars[\"qsec\"].max()]))\nprint(s)\n\nshape: (2, 9)\n┌──────┬──────────┬──────────┬───────────┬───┬─────────┬──────┬────────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪══════════╪══════════╪═══════════╪═══╪═════════╪══════╪════════╪═══════╡\n│ drat ┆ dY/dX ┆ 5.22 ┆ 3.8 ┆ … ┆ 0.18 ┆ 2.47 ┆ -2.56 ┆ 13 │\n│ drat ┆ dY/dX ┆ 10.2 ┆ 5.16 ┆ … ┆ 0.0573 ┆ 4.13 ┆ -0.338 ┆ 20.8 │\n└──────┴──────────┴──────────┴───────────┴───┴─────────┴──────┴────────┴───────┘\n\nColumns: rowid, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, predicted, predicted_lo, predicted_hi, qsec, rownames, mpg, cyl, disp, hp, drat, wt, vs, am, gear, carb\n\n\n\n\n\nAre these two slopes significantly different from one another? To test this, we can use the hypothesis argument:\n\n\nR\nPython\n\n\n\n\nslopes(\n mod,\n hypothesis = \"b1 = b2\",\n variables = \"drat\",\n newdata = datagrid(qsec = range))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b1=b2 -5.02 8.52 -0.589 0.556 0.8 -21.7 11.7\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\ns = slopes(\n mod,\n hypothesis = \"b1 = b2\",\n variables = \"drat\",\n newdata = datagrid(mod, qsec = [mtcars[\"qsec\"].min(), mtcars[\"qsec\"].max()]))\nprint(s)\n\nshape: (1, 8)\n┌───────┬──────────┬───────────┬────────┬─────────┬───────┬───────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════╪══════════╪═══════════╪════════╪═════════╪═══════╪═══════╪═══════╡\n│ b1=b2 ┆ -5.02 ┆ 8.53 ┆ -0.588 ┆ 0.561 ┆ 0.834 ┆ -22.5 ┆ 12.5 │\n└───────┴──────────┴───────────┴────────┴─────────┴───────┴───────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nAlternatively, we can also refer to values with term names (when they are unique):\n\n\nR\nPython\n\n\n\n\navg_slopes(mod)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n drat 7.22 1.365 5.29 < 0.001 23.0 4.549 9.90\n qsec 1.12 0.433 2.60 0.00942 6.7 0.276 1.97\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_slopes(mod, hypothesis = \"drat = qsec\")\n\n\n Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n drat=qsec 6.1 1.45 4.2 <0.001 15.2 3.25 8.95\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\n\n\n\ns = avg_slopes(mod)\nprint(s)\n\nshape: (2, 9)\n┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬──────┬───────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪══════╪═══════╪═══════╡\n│ qsec ┆ mean(dY/dX) ┆ 1.12 ┆ 0.432 ┆ … ┆ 0.0147 ┆ 6.09 ┆ 0.239 ┆ 2.01 │\n│ drat ┆ mean(dY/dX) ┆ 7.22 ┆ 1.37 ┆ … ┆ 1.25e-05 ┆ 16.3 ┆ 4.43 ┆ 10 │\n└──────┴─────────────┴──────────┴───────────┴───┴──────────┴──────┴───────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\ns = avg_slopes(mod, hypothesis = \"drat = qsec\")\nprint(s)\n\nshape: (1, 8)\n┌───────────┬──────────┬───────────┬─────┬──────────┬─────┬──────┬───────┐\n│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n╞═══════════╪══════════╪═══════════╪═════╪══════════╪═════╪══════╪═══════╡\n│ drat=qsec ┆ 6.1 ┆ 1.45 ┆ 4.2 ┆ 0.000245 ┆ 12 ┆ 3.13 ┆ 9.07 │\n└───────────┴──────────┴───────────┴─────┴──────────┴─────┴──────┴───────┘\n\nColumns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high\n\n\n\n\n\nNow, imagine that for theoretical (or substantive or clinical) reasons, we only care about slopes larger than 2. We can use the equivalence argument to conduct an equivalence test:\n\n\nR\nPython\n\n\n\n\navg_slopes(mod, equivalence = c(-2, 2))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % p (NonSup) p (NonInf) p (Equiv)\n drat 7.22 1.365 5.29 < 0.001 23.0 4.549 9.90 0.9999 <0.001 0.9999\n qsec 1.12 0.433 2.60 0.00942 6.7 0.276 1.97 0.0215 <0.001 0.0215\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, statistic.noninf, statistic.nonsup, p.value.noninf, p.value.nonsup, p.value.equiv \nType: response \n\n\n\n\n\ns = avg_slopes(mod, equivalence = [-2., 2.])\nprint(s)\n\nshape: (2, 9)\n┌──────┬─────────────┬──────────┬───────────┬───┬──────────┬──────┬───────┬───────┐\n│ Term ┆ Contrast ┆ Estimate ┆ Std.Error ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n╞══════╪═════════════╪══════════╪═══════════╪═══╪══════════╪══════╪═══════╪═══════╡\n│ drat ┆ mean(dY/dX) ┆ 7.22 ┆ 1.37 ┆ … ┆ 1.25e-05 ┆ 16.3 ┆ 4.43 ┆ 10 │\n│ qsec ┆ mean(dY/dX) ┆ 1.12 ┆ 0.432 ┆ … ┆ 0.0147 ┆ 6.09 ┆ 0.239 ┆ 2.01 │\n└──────┴─────────────┴──────────┴───────────┴───┴──────────┴──────┴───────┴───────┘\n\nColumns: term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high, statistic_noninf, statistic_nonsup, p_value_noninf, p_value_nonsup, p_value_equiv\n\n\n\n\n\nSee the Hypothesis Tests and Custom Contrasts vignette for background, details, and for instructions on how to conduct hypothesis tests in more complex situations."
},
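For intuition, the linear hypothesis "drat = 2 * qsec" can be verified by hand with the delta method, since the estimate is a linear combination of coefficients. A sketch, assuming the qsec * drat model above:

```r
b <- coef(mod)       # order: (Intercept), qsec, drat, qsec:drat
g <- c(0, -2, 1, 0)  # gradient of drat - 2 * qsec with respect to b

est <- sum(g * b)                           # about -1.39, as reported above
se <- drop(sqrt(t(g) %*% vcov(mod) %*% g))  # delta-method standard error
c(estimate = est, std.error = se, z = est / se)
```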
{
"objectID": "articles/marginaleffects.html#more",
@@ -207,7 +207,7 @@
"href": "articles/slopes.html#slopes-function",
"title": "\n4 Slopes\n",
"section": "\n4.2 slopes() function",
- "text": "4.2 slopes() function\nThe marginal effect is a unit-level measure of association between changes in a regressor and changes in the response. Except in the simplest linear models, the value of the marginal effect will be different from individual to individual, because it will depend on the values of the other covariates for each individual.\nThe slopes() function thus produces distinct estimates of the marginal effect for each row of the data used to fit the model. The output of marginaleffects is a simple data.frame, which can be inspected with all the usual R commands.\nTo show this, we load the library, download the Palmer Penguins, and estimate a GLM model:\n\nlibrary(marginaleffects)\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv\")\ndat$large_penguin <- ifelse(dat$body_mass_g > median(dat$body_mass_g, na.rm = TRUE), 1, 0)\n\nmod <- glm(large_penguin ~ bill_length_mm + flipper_length_mm + species,\n data = dat, family = binomial)\n\n\nmfx <- slopes(mod)\nhead(mfx)\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0176 0.00837 2.11 0.03520 4.8 0.00122 0.0340\n#> bill_length_mm dY/dX 0.0359 0.01236 2.90 0.00371 8.1 0.01164 0.0601\n#> bill_length_mm dY/dX 0.0844 0.02110 4.00 < 0.001 14.0 0.04309 0.1258\n#> bill_length_mm dY/dX 0.0347 0.00642 5.41 < 0.001 23.9 0.02214 0.0473\n#> bill_length_mm dY/dX 0.0509 0.01351 3.77 < 0.001 12.6 0.02441 0.0774\n#> bill_length_mm dY/dX 0.0165 0.00778 2.12 0.03367 4.9 0.00128 0.0318\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm, flipper_length_mm, species \n#> Type: response"
+ "text": "4.2 slopes() function\nThe marginal effect is a unit-level measure of association between changes in a regressor and changes in the response. Except in the simplest linear models, the value of the marginal effect will be different from individual to individual, because it will depend on the values of the other covariates for each individual.\nThe slopes() function thus produces distinct estimates of the marginal effect for each row of the data used to fit the model. The output of marginaleffects is a simple data.frame, which can be inspected with all the usual R commands.\nTo show this, we load the library, download the Palmer Penguins, and estimate a GLM model:\n\nlibrary(marginaleffects)\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv\")\ndat$large_penguin <- ifelse(dat$body_mass_g > median(dat$body_mass_g, na.rm = TRUE), 1, 0)\n\nmod <- glm(large_penguin ~ bill_length_mm + flipper_length_mm + species,\n data = dat, family = binomial)\n\n\nmfx <- slopes(mod)\nhead(mfx)\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0176 0.00837 2.11 0.03520 4.8 0.00122 0.0340\n#> bill_length_mm dY/dX 0.0359 0.01236 2.90 0.00371 8.1 0.01164 0.0601\n#> bill_length_mm dY/dX 0.0844 0.02110 4.00 < 0.001 14.0 0.04309 0.1258\n#> bill_length_mm dY/dX 0.0347 0.00642 5.41 < 0.001 23.9 0.02214 0.0473\n#> bill_length_mm dY/dX 0.0509 0.01352 3.77 < 0.001 12.6 0.02440 0.0774\n#> bill_length_mm dY/dX 0.0165 0.00778 2.12 0.03367 4.9 0.00128 0.0318\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm, flipper_length_mm, species \n#> Type: response"
},
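Since the object returned by slopes() is a plain data.frame, the usual R idioms apply directly. A small sketch, reusing the mfx object above:

```r
nrow(mfx)  # one row per observation per term

# Distribution of the unit-level slopes for a single regressor:
bl <- subset(mfx, term == "bill_length_mm")
summary(bl$estimate)
```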
{
"objectID": "articles/slopes.html#the-marginal-effects-zoo",
@@ -221,7 +221,7 @@
"href": "articles/slopes.html#average-marginal-effect-ame",
"title": "\n4 Slopes\n",
"section": "\n4.4 Average Marginal Effect (AME)",
- "text": "4.4 Average Marginal Effect (AME)\nA dataset with one marginal effect estimate per unit of observation is a bit unwieldy and difficult to interpret. Many analysts like to report the “Average Marginal Effect”, that is, the average of all the observation-specific marginal effects. These are easy to compute based on the full data.frame shown above, but the avg_slopes() function is convenient:\n\navg_slopes(mod)\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0276 0.00578 4.773 <0.001 19.1 0.01625 0.0389\n#> flipper_length_mm dY/dX 0.0106 0.00235 4.512 <0.001 17.2 0.00599 0.0152\n#> species Chinstrap - Adelie -0.4148 0.05654 -7.336 <0.001 42.0 -0.52561 -0.3040\n#> species Gentoo - Adelie 0.0617 0.10688 0.577 0.564 0.8 -0.14779 0.2712\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNote that since marginal effects are derivatives, they are only properly defined for continuous numeric variables. When the model also includes categorical regressors, the summary function will try to display relevant (regression-adjusted) contrasts between different categories, as shown above.\nYou can also extract average marginal effects using tidy and glance methods which conform to the broom package specification:\n\ntidy(mfx)\n#> # A tibble: 4 × 8\n#> term contrast estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 bill_length_mm mean(dY/dX) 0.0276 0.00578 4.77 1.81e- 6 0.0163 0.0389\n#> 2 flipper_length_mm mean(dY/dX) 0.0106 0.00235 4.51 6.42e- 6 0.00599 0.0152\n#> 3 species mean(Chinstrap) - mean(Adelie) -0.415 0.0565 -7.34 2.20e-13 -0.526 -0.304 \n#> 4 species mean(Gentoo) - mean(Adelie) 0.0617 0.107 0.577 5.64e- 1 -0.148 0.271\n\nglance(mfx)\n#> # A tibble: 1 × 7\n#> aic bic r2.tjur rmse nobs F logLik \n#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <logLik> \n#> 1 180. 199. 0.695 0.276 342 15.7 -84.92257"
+ "text": "4.4 Average Marginal Effect (AME)\nA dataset with one marginal effect estimate per unit of observation is a bit unwieldy and difficult to interpret. Many analysts like to report the “Average Marginal Effect”, that is, the average of all the observation-specific marginal effects. These are easy to compute based on the full data.frame shown above, but the avg_slopes() function is convenient:\n\navg_slopes(mod)\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0276 0.00578 4.773 <0.001 19.1 0.01625 0.0389\n#> flipper_length_mm dY/dX 0.0106 0.00235 4.512 <0.001 17.3 0.00599 0.0152\n#> species Chinstrap - Adelie -0.4148 0.05654 -7.336 <0.001 42.0 -0.52561 -0.3040\n#> species Gentoo - Adelie 0.0617 0.10688 0.577 0.564 0.8 -0.14779 0.2712\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNote that since marginal effects are derivatives, they are only properly defined for continuous numeric variables. When the model also includes categorical regressors, the summary function will try to display relevant (regression-adjusted) contrasts between different categories, as shown above.\nYou can also extract average marginal effects using tidy and glance methods which conform to the broom package specification:\n\ntidy(mfx)\n#> # A tibble: 4 × 8\n#> term contrast estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 bill_length_mm mean(dY/dX) 0.0276 0.00578 4.77 1.82e- 6 0.0163 0.0389\n#> 2 flipper_length_mm mean(dY/dX) 0.0106 0.00235 4.51 6.42e- 6 0.00599 0.0152\n#> 3 species mean(Chinstrap) - mean(Adelie) -0.415 0.0565 -7.34 2.20e-13 -0.526 -0.304 \n#> 4 species mean(Gentoo) - mean(Adelie) 0.0617 0.107 0.577 5.64e- 1 -0.148 0.271\n\nglance(mfx)\n#> # A tibble: 1 × 7\n#> aic bic r2.tjur rmse nobs F logLik \n#> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <logLik> \n#> 1 180. 199. 0.695 0.276 342 15.7 -84.92257"
},
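The "average" in AME is meant literally: for a continuous regressor it is the mean of the unit-level slopes. A sketch, reusing mfx from the previous section; the result should match the avg_slopes() estimate of about 0.0276:

```r
# Manual AME for one term: average the observation-specific derivatives.
mean(mfx$estimate[mfx$term == "bill_length_mm"])
```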
{
"objectID": "articles/slopes.html#group-average-marginal-effect-g-ame",
@@ -235,14 +235,14 @@
"href": "articles/slopes.html#marginal-effect-at-user-specified-values",
"title": "\n4 Slopes\n",
"section": "\n4.6 Marginal Effect at User-Specified Values",
- "text": "4.6 Marginal Effect at User-Specified Values\nSometimes, we are not interested in all the unit-specific marginal effects, but would rather look at the estimated marginal effects for certain “typical” individuals, or for user-specified values of the regressors. The datagrid() function helps us build a data grid full of “typical” rows. For example, to generate artificial Adelies and Gentoos with 180mm flippers:\n\ndatagrid(flipper_length_mm = 180,\n species = c(\"Adelie\", \"Gentoo\"),\n model = mod)\n#> large_penguin bill_length_mm flipper_length_mm species\n#> 1 0.4853801 43.92193 180 Adelie\n#> 2 0.4853801 43.92193 180 Gentoo\n\nThe same command can be used (omitting the model argument) to marginaleffects’s newdata argument to compute marginal effects for those (fictional) individuals:\n\nslopes(\n mod,\n newdata = datagrid(\n flipper_length_mm = 180,\n species = c(\"Adelie\", \"Gentoo\")))\n#> \n#> Term Contrast flipper_length_mm species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 180 Adelie 0.0607 0.03322 1.827 0.0677 3.9 -0.00441 0.12580\n#> bill_length_mm dY/dX 180 Gentoo 0.0847 0.03922 2.159 0.0309 5.0 0.00779 0.16155\n#> flipper_length_mm dY/dX 180 Adelie 0.0233 0.00551 4.231 <0.001 15.4 0.01250 0.03408\n#> flipper_length_mm dY/dX 180 Gentoo 0.0325 0.00850 3.823 <0.001 12.9 0.01584 0.04916\n#> species Chinstrap - Adelie 180 Adelie -0.2111 0.10668 -1.978 0.0479 4.4 -0.42013 -0.00197\n#> species Chinstrap - Adelie 180 Gentoo -0.2111 0.10668 -1.978 0.0479 4.4 -0.42013 -0.00197\n#> species Gentoo - Adelie 180 Adelie 0.1591 0.30225 0.526 0.5986 0.7 -0.43328 0.75152\n#> species Gentoo - Adelie 180 Gentoo 0.1591 0.30225 0.526 0.5986 0.7 -0.43328 0.75152\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, flipper_length_mm, species, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm \n#> Type: response\n\nWhen variables are omitted from the datagrid() call, they will automatically be set at their mean or mode (depending on variable type)."
+ "text": "4.6 Marginal Effect at User-Specified Values\nSometimes, we are not interested in all the unit-specific marginal effects, but would rather look at the estimated marginal effects for certain “typical” individuals, or for user-specified values of the regressors. The datagrid() function helps us build a data grid full of “typical” rows. For example, to generate artificial Adelies and Gentoos with 180mm flippers:\n\ndatagrid(flipper_length_mm = 180,\n species = c(\"Adelie\", \"Gentoo\"),\n model = mod)\n#> large_penguin bill_length_mm flipper_length_mm species\n#> 1 0.4853801 43.92193 180 Adelie\n#> 2 0.4853801 43.92193 180 Gentoo\n\nThe same command can be used (omitting the model argument) to marginaleffects’s newdata argument to compute marginal effects for those (fictional) individuals:\n\nslopes(\n mod,\n newdata = datagrid(\n flipper_length_mm = 180,\n species = c(\"Adelie\", \"Gentoo\")))\n#> \n#> Term Contrast flipper_length_mm species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 180 Adelie 0.0607 0.03322 1.827 0.0677 3.9 -0.00442 0.12580\n#> bill_length_mm dY/dX 180 Gentoo 0.0847 0.03923 2.158 0.0309 5.0 0.00778 0.16156\n#> flipper_length_mm dY/dX 180 Adelie 0.0233 0.00551 4.231 <0.001 15.4 0.01250 0.03408\n#> flipper_length_mm dY/dX 180 Gentoo 0.0325 0.00850 3.822 <0.001 12.9 0.01583 0.04917\n#> species Chinstrap - Adelie 180 Adelie -0.2111 0.10668 -1.978 0.0479 4.4 -0.42013 -0.00197\n#> species Chinstrap - Adelie 180 Gentoo -0.2111 0.10668 -1.978 0.0479 4.4 -0.42013 -0.00197\n#> species Gentoo - Adelie 180 Adelie 0.1591 0.30225 0.526 0.5986 0.7 -0.43328 0.75152\n#> species Gentoo - Adelie 180 Gentoo 0.1591 0.30225 0.526 0.5986 0.7 -0.43328 0.75152\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, flipper_length_mm, species, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm \n#> Type: response\n\nWhen variables are omitted from the datagrid() call, they will automatically be set at their mean or mode (depending on variable type)."
},
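The datagrid() defaults described above are easy to verify by hand. A minimal sketch in R, assuming the same mod (the binomial GLM fitted on the penguins data in this chapter); model.frame() is base R:

library(marginaleffects)

# datagrid() fills every unspecified variable with its mean (numeric)
# or mode (factor/character)
grid <- datagrid(
    flipper_length_mm = 180,
    species = c("Adelie", "Gentoo"),
    model = mod)

# manual check: the filled-in bill_length_mm should equal the sample
# mean of that variable in the model frame
mf <- model.frame(mod)
all.equal(unique(grid$bill_length_mm), mean(mf$bill_length_mm))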
{
"objectID": "articles/slopes.html#marginal-effect-at-the-mean-mem",
"href": "articles/slopes.html#marginal-effect-at-the-mean-mem",
"title": "\n4 Slopes\n",
"section": "\n4.7 Marginal Effect at the Mean (MEM)",
- "text": "4.7 Marginal Effect at the Mean (MEM)\nThe “Marginal Effect at the Mean” is a marginal effect calculated for a hypothetical observation where each regressor is set at its mean or mode. By default, the datagrid() function that we used in the previous section sets all regressors to their means or modes. To calculate the MEM, we can set the newdata argument, which determines the values of predictors at which we want to compute marginal effects:\n\nslopes(mod, newdata = \"mean\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0502 0.01244 4.038 <0.001 14.2 0.02586 0.0746\n#> flipper_length_mm dY/dX 0.0193 0.00553 3.489 <0.001 11.0 0.00845 0.0301\n#> species Chinstrap - Adelie -0.8070 0.07690 -10.494 <0.001 83.2 -0.95776 -0.6563\n#> species Gentoo - Adelie 0.0829 0.11469 0.722 0.47 1.1 -0.14193 0.3076\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm, flipper_length_mm, species \n#> Type: response"
+ "text": "4.7 Marginal Effect at the Mean (MEM)\nThe “Marginal Effect at the Mean” is a marginal effect calculated for a hypothetical observation where each regressor is set at its mean or mode. By default, the datagrid() function that we used in the previous section sets all regressors to their means or modes. To calculate the MEM, we can set the newdata argument, which determines the values of predictors at which we want to compute marginal effects:\n\nslopes(mod, newdata = \"mean\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm dY/dX 0.0502 0.01244 4.038 <0.001 14.2 0.02585 0.0746\n#> flipper_length_mm dY/dX 0.0193 0.00553 3.489 <0.001 11.0 0.00845 0.0301\n#> species Chinstrap - Adelie -0.8070 0.07690 -10.494 <0.001 83.2 -0.95776 -0.6563\n#> species Gentoo - Adelie 0.0829 0.11469 0.722 0.47 1.1 -0.14193 0.3076\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, large_penguin, bill_length_mm, flipper_length_mm, species \n#> Type: response"
},
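A note on the newdata = "mean" shortcut used above: it is equivalent to passing an empty datagrid(), which sets every regressor to its mean or mode. A quick sketch, reusing the mod from this section:

# both calls evaluate the slopes on the same one-row grid of means/modes
slopes(mod, newdata = "mean")
slopes(mod, newdata = datagrid(model = mod))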
{
"objectID": "articles/slopes.html#counterfactual-marginal-effects",
@@ -263,14 +263,14 @@
"href": "articles/slopes.html#example-quadratic",
"title": "\n4 Slopes\n",
"section": "\n4.10 Example: Quadratic",
- "text": "4.10 Example: Quadratic\nIn the “Definition” section of this vignette, we considered how marginal effects can be computed analytically in a simple quadratic equation context. We can now use the slopes() function to replicate our analysis of the quadratic function in a regression application.\nSay you estimate a linear regression model with a quadratic term:\n\\[Y = \\beta_0 + \\beta_1 X^2 + \\varepsilon\\]\nand obtain estimates of \\(\\beta_0=1\\) and \\(\\beta_1=2\\). Taking the partial derivative with respect to \\(X\\) and plugging in our estimates gives us the marginal effect of \\(X\\) on \\(Y\\):\n\\[\\partial Y / \\partial X = \\beta_0 + 2 \\cdot \\beta_1 X\\] \\[\\partial Y / \\partial X = 1 + 4X\\]\nThis result suggests that the effect of a change in \\(X\\) on \\(Y\\) depends on the level of \\(X\\). When \\(X\\) is large and positive, an increase in \\(X\\) is associated to a large increase in \\(Y\\). When \\(X\\) is small and positive, an increase in \\(X\\) is associated to a small increase in \\(Y\\). When \\(X\\) is a large negative value, an increase in \\(X\\) is associated with a decrease in \\(Y\\).\nmarginaleffects arrives at the same conclusion in simulated data:\n\nlibrary(tidyverse)\nN <- 1e5\nquad <- data.frame(x = rnorm(N))\nquad$y <- 1 + 1 * quad$x + 2 * quad$x^2 + rnorm(N)\nmod <- lm(y ~ x + I(x^2), quad)\n\nslopes(mod, newdata = datagrid(x = -2:2)) |>\n mutate(truth = 1 + 4 * x) |>\n select(estimate, truth)\n#> \n#> Estimate\n#> -7.012\n#> -3.008\n#> 0.996\n#> 4.999\n#> 9.003\n#> \n#> Columns: estimate, truth\n\nWe can plot conditional adjusted predictions with plot_predictions() function:\n\nplot_predictions(mod, condition = \"x\")\n\n\n\n\nWe can plot conditional marginal effects with the plot_slopes() function (see section below):\n\nplot_slopes(mod, variables = \"x\", condition = \"x\")\n\n\n\n\nAgain, the conclusion is the same. When \\(x<0\\), an increase in \\(x\\) is associated with an decrease in \\(y\\). When \\(x>1/4\\), the marginal effect is positive, which suggests that an increase in \\(x\\) is associated with an increase in \\(y\\)."
+ "text": "4.10 Example: Quadratic\nIn the “Definition” section of this vignette, we considered how marginal effects can be computed analytically in a simple quadratic equation context. We can now use the slopes() function to replicate our analysis of the quadratic function in a regression application.\nSay you estimate a linear regression model with a quadratic term:\n\\[Y = \\beta_0 + \\beta_1 X^2 + \\varepsilon\\]\nand obtain estimates of \\(\\beta_0=1\\) and \\(\\beta_1=2\\). Taking the partial derivative with respect to \\(X\\) and plugging in our estimates gives us the marginal effect of \\(X\\) on \\(Y\\):\n\\[\\partial Y / \\partial X = \\beta_0 + 2 \\cdot \\beta_1 X\\] \\[\\partial Y / \\partial X = 1 + 4X\\]\nThis result suggests that the effect of a change in \\(X\\) on \\(Y\\) depends on the level of \\(X\\). When \\(X\\) is large and positive, an increase in \\(X\\) is associated to a large increase in \\(Y\\). When \\(X\\) is small and positive, an increase in \\(X\\) is associated to a small increase in \\(Y\\). When \\(X\\) is a large negative value, an increase in \\(X\\) is associated with a decrease in \\(Y\\).\nmarginaleffects arrives at the same conclusion in simulated data:\n\nlibrary(tidyverse)\nN <- 1e5\nquad <- data.frame(x = rnorm(N))\nquad$y <- 1 + 1 * quad$x + 2 * quad$x^2 + rnorm(N)\nmod <- lm(y ~ x + I(x^2), quad)\n\nslopes(mod, newdata = datagrid(x = -2:2)) |>\n mutate(truth = 1 + 4 * x) |>\n select(estimate, truth)\n#> \n#> Estimate\n#> -7.014\n#> -3.009\n#> 0.996\n#> 5.000\n#> 9.005\n#> \n#> Columns: estimate, truth\n\nWe can plot conditional adjusted predictions with plot_predictions() function:\n\nplot_predictions(mod, condition = \"x\")\n\n\n\n\nWe can plot conditional marginal effects with the plot_slopes() function (see section below):\n\nplot_slopes(mod, variables = \"x\", condition = \"x\")\n\n\n\n\nAgain, the conclusion is the same. When \\(x<0\\), an increase in \\(x\\) is associated with an decrease in \\(y\\). When \\(x>1/4\\), the marginal effect is positive, which suggests that an increase in \\(x\\) is associated with an increase in \\(y\\)."
},
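The analytic slope 1 + 4x can also be checked with a centered finite difference in base R, which is essentially what slopes() computes internally. A sketch, assuming the quadratic mod fitted on the simulated quad data above:

# centered finite difference around each x of interest
eps <- 1e-6
x0 <- -2:2
lo <- predict(mod, newdata = data.frame(x = x0 - eps / 2))
hi <- predict(mod, newdata = data.frame(x = x0 + eps / 2))
cbind(x = x0, numerical = (hi - lo) / eps, analytic = 1 + 4 * x0)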
{
"objectID": "articles/slopes.html#slopes-vs-predictions-a-visual-interpretation",
"href": "articles/slopes.html#slopes-vs-predictions-a-visual-interpretation",
"title": "\n4 Slopes\n",
"section": "\n4.11 Slopes vs Predictions: A Visual Interpretation",
- "text": "4.11 Slopes vs Predictions: A Visual Interpretation\nOften, analysts will plot predicted values of the outcome with a best fit line:\n\nlibrary(ggplot2)\n\nmod <- lm(mpg ~ hp * qsec, data = mtcars)\n\nplot_predictions(mod, condition = \"hp\", vcov = TRUE) +\n geom_point(data = mtcars, aes(hp, mpg)) \n\n\n\n\nThe slope of this line is calculated using the same technique we all learned in grade school: dividing rise over run.\n\np <- plot_predictions(mod, condition = \"hp\", vcov = TRUE, draw = FALSE)\nplot_predictions(mod, condition = \"hp\", vcov = TRUE) +\n geom_segment(aes(x = p$hp[10], xend = p$hp[10], y = p$estimate[10], yend = p$estimate[20])) +\n geom_segment(aes(x = p$hp[10], xend = p$hp[20], y = p$estimate[20], yend = p$estimate[20])) +\n annotate(\"text\", label = \"Rise\", y = 10, x = 140) +\n annotate(\"text\", label = \"Run\", y = 2, x = 200)\n\n\n\n\nInstead of computing this slope manually, we can just call:\n\navg_slopes(mod, variables = \"hp\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.112 0.0126 -8.92 <0.001 61.0 -0.137 -0.0874\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNow, consider the fact that our model includes an interaction between hp and qsec. This means that the slope will actually differ based on the value of the moderator variable qsec:\n\nplot_predictions(mod, condition = list(\"hp\", \"qsec\" = \"quartile\"))\n\n\n\n\nWe can estimate the slopes of these three fit lines easily:\n\nslopes(\n mod,\n variables = \"hp\",\n newdata = datagrid(qsec = quantile(mtcars$qsec, probs = c(.25, .5, .75))))\n#> \n#> Term qsec Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp 16.9 -0.0934 0.0111 -8.43 <0.001 54.7 -0.115 -0.0717\n#> hp 17.7 -0.1093 0.0123 -8.92 <0.001 60.9 -0.133 -0.0853\n#> hp 18.9 -0.1325 0.0154 -8.60 <0.001 56.8 -0.163 -0.1023\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, qsec, predicted_lo, predicted_hi, predicted, mpg, hp \n#> Type: response\n\nAs we see in the graph, all three slopes are negative, but the Q3 slope is steepest.\nWe could then push this one step further, and measure the slope of mpg with respect to hp, for all observed values of qsec. This is achieved with the plot_slopes() function:\n\nplot_slopes(mod, variables = \"hp\", condition = \"qsec\") +\n geom_hline(yintercept = 0, linetype = 3)\n\n\n\n\nThis plot shows that the marginal effect of hp on mpg is always negative (the slope is always below zero), and that this effect becomes even more negative as qsec increases."
+ "text": "4.11 Slopes vs Predictions: A Visual Interpretation\nOften, analysts will plot predicted values of the outcome with a best fit line:\n\nlibrary(ggplot2)\n\nmod <- lm(mpg ~ hp * qsec, data = mtcars)\n\nplot_predictions(mod, condition = \"hp\", vcov = TRUE) +\n geom_point(data = mtcars, aes(hp, mpg)) \n\n\n\n\nThe slope of this line is calculated using the same technique we all learned in grade school: dividing rise over run.\n\np <- plot_predictions(mod, condition = \"hp\", vcov = TRUE, draw = FALSE)\nplot_predictions(mod, condition = \"hp\", vcov = TRUE) +\n geom_segment(aes(x = p$hp[10], xend = p$hp[10], y = p$estimate[10], yend = p$estimate[20])) +\n geom_segment(aes(x = p$hp[10], xend = p$hp[20], y = p$estimate[20], yend = p$estimate[20])) +\n annotate(\"text\", label = \"Rise\", y = 10, x = 140) +\n annotate(\"text\", label = \"Run\", y = 2, x = 200)\n\n\n\n\nInstead of computing this slope manually, we can just call:\n\navg_slopes(mod, variables = \"hp\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.112 0.0126 -8.92 <0.001 61.0 -0.137 -0.0874\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNow, consider the fact that our model includes an interaction between hp and qsec. This means that the slope will actually differ based on the value of the moderator variable qsec:\n\nplot_predictions(mod, condition = list(\"hp\", \"qsec\" = \"quartile\"))\n\n\n\n\nWe can estimate the slopes of these three fit lines easily:\n\nslopes(\n mod,\n variables = \"hp\",\n newdata = datagrid(qsec = quantile(mtcars$qsec, probs = c(.25, .5, .75))))\n#> \n#> Term qsec Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp 16.9 -0.0934 0.0111 -8.43 <0.001 54.7 -0.115 -0.0717\n#> hp 17.7 -0.1093 0.0123 -8.92 <0.001 60.8 -0.133 -0.0853\n#> hp 18.9 -0.1325 0.0154 -8.60 <0.001 56.8 -0.163 -0.1023\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, qsec, predicted_lo, predicted_hi, predicted, mpg, hp \n#> Type: response\n\nAs we see in the graph, all three slopes are negative, but the Q3 slope is steepest.\nWe could then push this one step further, and measure the slope of mpg with respect to hp, for all observed values of qsec. This is achieved with the plot_slopes() function:\n\nplot_slopes(mod, variables = \"hp\", condition = \"qsec\") +\n geom_hline(yintercept = 0, linetype = 3)\n\n\n\n\nThis plot shows that the marginal effect of hp on mpg is always negative (the slope is always below zero), and that this effect becomes even more negative as qsec increases."
},
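The rise-over-run annotation above can be reproduced numerically from the draw = FALSE data frame. A sketch, reusing p and mod from this section (rows 10 and 20 are the same two grid points used to draw the segments):

rise <- p$estimate[20] - p$estimate[10]
run <- p$hp[20] - p$hp[10]
rise / run

# should be close to the delta-method estimate reported by:
avg_slopes(mod, variables = "hp")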
{
"objectID": "articles/slopes.html#prediction-types",
@@ -284,7 +284,7 @@
"href": "articles/slopes.html#manual-computation",
"title": "\n4 Slopes\n",
"section": "\n4.13 Manual computation",
- "text": "4.13 Manual computation\nNow we illustrate how to reproduce the output of slopes() manually:\n\nlibrary(marginaleffects)\n\nmod <- glm(am ~ hp, family = binomial, data = mtcars)\n\neps <- 1e-4\nd1 <- transform(mtcars, hp = hp - eps / 2)\nd2 <- transform(mtcars, hp = hp + eps / 2)\np1 <- predict(mod, type = \"response\", newdata = d1)\np2 <- predict(mod, type = \"response\", newdata = d2)\ns <- (p2 - p1) / eps\ntail(s)\n#> Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E \n#> -0.0020285496 -0.0020192814 -0.0013143243 -0.0018326764 -0.0008900012 -0.0020233577\n\nWhich is equivalent to:\n\nslopes(mod, eps = eps) |> tail()\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.00203 0.001481 -1.37 0.17086 2.5 -0.00493 0.000875\n#> hp -0.00202 0.001475 -1.37 0.17088 2.5 -0.00491 0.000871\n#> hp -0.00131 0.000482 -2.72 0.00645 7.3 -0.00226 -0.000369\n#> hp -0.00183 0.001210 -1.51 0.12993 2.9 -0.00420 0.000539\n#> hp -0.00089 0.000279 -3.19 0.00141 9.5 -0.00144 -0.000344\n#> hp -0.00202 0.001490 -1.36 0.17444 2.5 -0.00494 0.000897\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp \n#> Type: response\n\nAnd we can get average marginal effects by subgroup as follows:\n\ntapply(s, mtcars$cyl, mean)\n#> 4 6 8 \n#> -0.002010526 -0.001990774 -0.001632681\n\nslopes(mod, eps = eps, by = \"cyl\")\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp mean(dY/dX) 4 -0.00201 0.001441 -1.40 0.163 2.6 -0.00484 0.000814\n#> hp mean(dY/dX) 6 -0.00199 0.001504 -1.32 0.186 2.4 -0.00494 0.000957\n#> hp mean(dY/dX) 8 -0.00163 0.000954 -1.71 0.087 3.5 -0.00350 0.000237\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response"
+ "text": "4.13 Manual computation\nNow we illustrate how to reproduce the output of slopes() manually:\n\nlibrary(marginaleffects)\n\nmod <- glm(am ~ hp, family = binomial, data = mtcars)\n\neps <- 1e-4\nd1 <- transform(mtcars, hp = hp - eps / 2)\nd2 <- transform(mtcars, hp = hp + eps / 2)\np1 <- predict(mod, type = \"response\", newdata = d1)\np2 <- predict(mod, type = \"response\", newdata = d2)\ns <- (p2 - p1) / eps\ntail(s)\n#> Porsche 914-2 Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E \n#> -0.0020285496 -0.0020192814 -0.0013143243 -0.0018326764 -0.0008900012 -0.0020233577\n\nWhich is equivalent to:\n\nslopes(mod, eps = eps) |> tail()\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.00203 0.001612 -1.26 0.20818 2.3 -0.00519 0.001130\n#> hp -0.00202 0.001584 -1.27 0.20236 2.3 -0.00512 0.001085\n#> hp -0.00131 0.000430 -3.06 0.00225 8.8 -0.00216 -0.000471\n#> hp -0.00183 0.001311 -1.40 0.16204 2.6 -0.00440 0.000736\n#> hp -0.00089 0.000266 -3.34 < 0.001 10.2 -0.00141 -0.000368\n#> hp -0.00202 0.001571 -1.29 0.19776 2.3 -0.00510 0.001056\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp \n#> Type: response\n\nAnd we can get average marginal effects by subgroup as follows:\n\ntapply(s, mtcars$cyl, mean)\n#> 4 6 8 \n#> -0.002010526 -0.001990774 -0.001632681\n\nslopes(mod, eps = eps, by = \"cyl\")\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp mean(dY/dX) 4 -0.00201 0.001482 -1.36 0.1748 2.5 -0.00491 0.000893\n#> hp mean(dY/dX) 6 -0.00199 0.001459 -1.36 0.1723 2.5 -0.00485 0.000868\n#> hp mean(dY/dX) 8 -0.00163 0.000967 -1.69 0.0914 3.5 -0.00353 0.000263\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response"
},
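One equivalence worth spelling out: the overall average marginal effect is simply the mean of the unit-level slopes. A sketch, reusing s, eps, and mod from the manual computation above:

mean(s)

# same point estimate, with a delta-method standard error on top
avg_slopes(mod, eps = eps)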
{
"objectID": "articles/marginalmeans.html#marginal-means-vs.-average-predictions",
@@ -347,7 +347,7 @@
"href": "articles/plot.html#comparisons",
"title": "\n6 Plots\n",
"section": "\n6.2 Comparisons",
- "text": "6.2 Comparisons\n\n6.2.1 Conditional comparisons\nThe syntax for conditional comparisons is the same as the syntax for conditional predictions, except that we now need to specify the variable(s) of interest using an additional argument:\n\ncomparisons(mod,\n variables = \"flipper_length_mm\",\n newdata = datagrid(flipper_length_mm = c(172, 231), species = unique))\n#> \n#> Term Contrast flipper_length_mm species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % bill_length_mm island\n#> flipper_length_mm +1 172 Adelie 15.3 9.25 1.66 0.0976 3.4 -2.81 33.5 43.9 Biscoe\n#> flipper_length_mm +1 172 Chinstrap 15.9 11.37 1.40 0.1609 2.6 -6.34 38.2 43.9 Biscoe\n#> flipper_length_mm +1 172 Gentoo 51.7 8.70 5.95 <0.001 28.5 34.68 68.8 43.9 Biscoe\n#> flipper_length_mm +1 231 Adelie 15.3 9.25 1.66 0.0976 3.4 -2.81 33.5 43.9 Biscoe\n#> flipper_length_mm +1 231 Chinstrap 15.9 11.37 1.40 0.1609 2.6 -6.34 38.2 43.9 Biscoe\n#> flipper_length_mm +1 231 Gentoo 51.7 8.70 5.95 <0.001 28.5 34.68 68.8 43.9 Biscoe\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, flipper_length_mm, species, predicted_lo, predicted_hi, predicted, body_mass_g, bill_length_mm, island \n#> Type: response\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n condition = c(\"bill_length_mm\", \"species\"))\n\n\n\n\nWe can specify custom comparisons, as we would using the variables argument of the comparisons() function. For example, see what happens to the predicted outcome when flipper_length_mm increases by 1 standard deviation or by 10mm:\n\nplot_comparisons(mod,\n variables = list(\"flipper_length_mm\" = \"sd\"),\n condition = c(\"bill_length_mm\", \"species\")) +\n\nplot_comparisons(mod,\n variables = list(\"flipper_length_mm\" = 10),\n condition = c(\"bill_length_mm\", \"species\"))\n\n\n\n\nNotice that the vertical scale is different in the plots above, reflecting the fact that we are plotting the effect of a change of 1 standard deviation on the left vs 10 units on the right.\nLike the comparisons() function, plot_comparisons() is a very powerful tool because it allows us to compute and display custom comparisons such as differences, ratios, odds, lift, and arbitrary functions of predicted outcomes. For example, if we want to plot the ratio of predicted body mass for different species of penguins, we could do:\n\nplot_comparisons(mod,\n variables = \"species\",\n condition = \"bill_length_mm\",\n comparison = \"ratio\")\n\n\n\n\nThe left panel shows that the ratio of Chinstrap body mass to Adelie body mass is approximately constant, at slightly above 0.8. The right panel shows that the ratio of Gentoo to Adelie body mass is depends on their bill length. For birds with short bills, Gentoos seem to have smaller body mass than Adelies. For birds with long bills, Gentoos seem heavier than Adelies, although the null ratio (1) is not outside the confidence interval.\n\n6.2.2 Marginal comparisons\nAs above, we can also display marginal comparisons, by subgroups:\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n by = \"species\") +\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n by = c(\"species\", \"island\"))\n\n\n\n\nMultiple contrasts at once:\n\nplot_comparisons(mod,\n variables = c(\"flipper_length_mm\", \"bill_length_mm\"),\n by = c(\"species\", \"island\"))"
+ "text": "6.2 Comparisons\n\n6.2.1 Conditional comparisons\nThe syntax for conditional comparisons is the same as the syntax for conditional predictions, except that we now need to specify the variable(s) of interest using an additional argument:\n\ncomparisons(mod,\n variables = \"flipper_length_mm\",\n newdata = datagrid(flipper_length_mm = c(172, 231), species = unique))\n#> \n#> Term Contrast flipper_length_mm species Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % bill_length_mm island\n#> flipper_length_mm +1 172 Adelie 15.3 9.25 1.66 0.0976 3.4 -2.81 33.5 43.9 Biscoe\n#> flipper_length_mm +1 172 Chinstrap 15.9 11.37 1.40 0.1610 2.6 -6.34 38.2 43.9 Biscoe\n#> flipper_length_mm +1 172 Gentoo 51.7 8.70 5.95 <0.001 28.5 34.68 68.8 43.9 Biscoe\n#> flipper_length_mm +1 231 Adelie 15.3 9.25 1.66 0.0976 3.4 -2.81 33.5 43.9 Biscoe\n#> flipper_length_mm +1 231 Chinstrap 15.9 11.37 1.40 0.1610 2.6 -6.34 38.2 43.9 Biscoe\n#> flipper_length_mm +1 231 Gentoo 51.7 8.70 5.95 <0.001 28.5 34.68 68.8 43.9 Biscoe\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, flipper_length_mm, species, predicted_lo, predicted_hi, predicted, body_mass_g, bill_length_mm, island \n#> Type: response\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n condition = c(\"bill_length_mm\", \"species\"))\n\n\n\n\nWe can specify custom comparisons, as we would using the variables argument of the comparisons() function. For example, see what happens to the predicted outcome when flipper_length_mm increases by 1 standard deviation or by 10mm:\n\nplot_comparisons(mod,\n variables = list(\"flipper_length_mm\" = \"sd\"),\n condition = c(\"bill_length_mm\", \"species\")) +\n\nplot_comparisons(mod,\n variables = list(\"flipper_length_mm\" = 10),\n condition = c(\"bill_length_mm\", \"species\"))\n\n\n\n\nNotice that the vertical scale is different in the plots above, reflecting the fact that we are plotting the effect of a change of 1 standard deviation on the left vs 10 units on the right.\nLike the comparisons() function, plot_comparisons() is a very powerful tool because it allows us to compute and display custom comparisons such as differences, ratios, odds, lift, and arbitrary functions of predicted outcomes. For example, if we want to plot the ratio of predicted body mass for different species of penguins, we could do:\n\nplot_comparisons(mod,\n variables = \"species\",\n condition = \"bill_length_mm\",\n comparison = \"ratio\")\n\n\n\n\nThe left panel shows that the ratio of Chinstrap body mass to Adelie body mass is approximately constant, at slightly above 0.8. The right panel shows that the ratio of Gentoo to Adelie body mass is depends on their bill length. For birds with short bills, Gentoos seem to have smaller body mass than Adelies. For birds with long bills, Gentoos seem heavier than Adelies, although the null ratio (1) is not outside the confidence interval.\n\n6.2.2 Marginal comparisons\nAs above, we can also display marginal comparisons, by subgroups:\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n by = \"species\") +\n\nplot_comparisons(mod,\n variables = \"flipper_length_mm\",\n by = c(\"species\", \"island\"))\n\n\n\n\nMultiple contrasts at once:\n\nplot_comparisons(mod,\n variables = c(\"flipper_length_mm\", \"bill_length_mm\"),\n by = c(\"species\", \"island\"))"
},
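The species ratios drawn above can also be reported as numbers rather than a plot. A sketch, assuming the same penguin mod used throughout this chapter:

# average ratio of predicted body mass across species contrasts
avg_comparisons(mod,
    variables = "species",
    comparison = "ratio")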
{
"objectID": "articles/plot.html#slopes",
@@ -368,7 +368,7 @@
"href": "articles/plot.html#customization",
"title": "\n6 Plots\n",
"section": "\n6.5 Customization",
- "text": "6.5 Customization\nA very useful feature of the plotting functions in this package is that they produce normal ggplot2 objects. So we can customize them to our heart’s content, using ggplot2 itself, or one of the many packages designed to augment its functionalities:\n\nlibrary(ggrepel)\n\nmt <- mtcars\nmt$label <- row.names(mt)\n\nmod <- lm(mpg ~ hp * factor(cyl), data = mt)\n\nplot_predictions(mod, condition = c(\"hp\", \"cyl\"), points = .5, rug = TRUE, vcov = FALSE) +\n geom_text_repel(aes(x = hp, y = mpg, label = label),\n data = subset(mt, hp > 250),\n nudge_y = 2) +\n theme_classic()\n\n\n\n\nAll the plotting functions work with all the model supported by the marginaleffects package, so we can plot the output of a logistic regression model. This plot shows the probability of survival aboard the Titanic, for different ages and different ticket classes:\n\nlibrary(ggdist)\nlibrary(ggplot2)\n\ndat <- \"https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv\"\ndat <- read.csv(dat)\n\nmod <- glm(Survived ~ Age * SexCode * PClass, data = dat, family = binomial)\n\nplot_predictions(mod, condition = c(\"Age\", \"PClass\")) +\n geom_dots(\n alpha = .8,\n scale = .3,\n pch = 18,\n data = dat, aes(\n x = Age,\n y = Survived,\n side = ifelse(Survived == 1, \"bottom\", \"top\")))\n\n\n\n\nThanks to Andrew Heiss who inspired this plot.\nDesigning effective data visualizations requires a lot of customization to the specific context and data. The plotting functions in marginaleffects offer a powerful way to iterate quickly between plots and models, but they obviously cannot support all the features that users may want. Thankfully, it is very easy to use the slopes() functions to generate datasets that can then be used in ggplot2 or any other data visualization tool. Just use the draw argument:\n\np <- plot_predictions(mod, condition = c(\"Age\", \"PClass\"), draw = FALSE)\nhead(p)\n#> rowid estimate p.value s.value conf.low conf.high Survived SexCode Age PClass\n#> 1 1 0.8679723 0.0013307149 9.553583 0.6754794 0.9540527 0.4140212 0.3809524 0.17000 1st\n#> 2 2 0.8956789 0.0001333343 12.872665 0.7401973 0.9627887 0.4140212 0.3809524 0.17000 2nd\n#> 3 3 0.4044513 0.2667759176 1.906300 0.2554245 0.5734603 0.4140212 0.3809524 0.17000 3rd\n#> 4 4 0.8631027 0.0011563594 9.756194 0.6749549 0.9503544 0.4140212 0.3809524 1.61551 1st\n#> 5 5 0.8813224 0.0001728862 12.497890 0.7228529 0.9548415 0.4140212 0.3809524 1.61551 2nd\n#> 6 6 0.3934924 0.1899483149 2.396321 0.2535791 0.5533716 0.4140212 0.3809524 1.61551 3rd\n\nThis allows us to feed the data easily to other functions, such as those in the useful ggdist and distributional packages:\n\nlibrary(ggdist)\nlibrary(distributional)\nplot_slopes(mod, variables = \"SexCode\", condition = c(\"Age\", \"PClass\"), type = \"link\", draw = FALSE) |>\n ggplot() +\n stat_lineribbon(aes(\n x = Age,\n ydist = dist_normal(mu = estimate, sigma = std.error),\n fill = PClass),\n alpha = 1 / 4)"
+ "text": "6.5 Customization\nA very useful feature of the plotting functions in this package is that they produce normal ggplot2 objects. So we can customize them to our heart’s content, using ggplot2 itself, or one of the many packages designed to augment its functionalities:\n\nlibrary(ggrepel)\n\nmt <- mtcars\nmt$label <- row.names(mt)\n\nmod <- lm(mpg ~ hp * factor(cyl), data = mt)\n\nplot_predictions(mod, condition = c(\"hp\", \"cyl\"), points = .5, rug = TRUE, vcov = FALSE) +\n geom_text_repel(aes(x = hp, y = mpg, label = label),\n data = subset(mt, hp > 250),\n nudge_y = 2) +\n theme_classic()\n\n\n\n\nAll the plotting functions work with all the model supported by the marginaleffects package, so we can plot the output of a logistic regression model. This plot shows the probability of survival aboard the Titanic, for different ages and different ticket classes:\n\nlibrary(ggdist)\nlibrary(ggplot2)\n\ndat <- \"https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv\"\ndat <- read.csv(dat)\n\nmod <- glm(Survived ~ Age * SexCode * PClass, data = dat, family = binomial)\n\nplot_predictions(mod, condition = c(\"Age\", \"PClass\")) +\n geom_dots(\n alpha = .8,\n scale = .3,\n pch = 18,\n data = dat, aes(\n x = Age,\n y = Survived,\n side = ifelse(Survived == 1, \"bottom\", \"top\")))\n\n\n\n\nThanks to Andrew Heiss who inspired this plot.\nDesigning effective data visualizations requires a lot of customization to the specific context and data. The plotting functions in marginaleffects offer a powerful way to iterate quickly between plots and models, but they obviously cannot support all the features that users may want. Thankfully, it is very easy to use the slopes() functions to generate datasets that can then be used in ggplot2 or any other data visualization tool. Just use the draw argument:\n\np <- plot_predictions(mod, condition = c(\"Age\", \"PClass\"), draw = FALSE)\nhead(p)\n#> rowid estimate p.value s.value conf.low conf.high Survived SexCode Age PClass\n#> 1 1 0.8679723 0.0013307148 9.553583 0.6754794 0.9540527 0.4140212 0.3809524 0.17000 1st\n#> 2 2 0.8956789 0.0001333343 12.872665 0.7401973 0.9627887 0.4140212 0.3809524 0.17000 2nd\n#> 3 3 0.4044513 0.2667759176 1.906300 0.2554245 0.5734603 0.4140212 0.3809524 0.17000 3rd\n#> 4 4 0.8631027 0.0011563592 9.756195 0.6749549 0.9503543 0.4140212 0.3809524 1.61551 1st\n#> 5 5 0.8813224 0.0001728858 12.497893 0.7228530 0.9548415 0.4140212 0.3809524 1.61551 2nd\n#> 6 6 0.3934924 0.1899483119 2.396321 0.2535791 0.5533716 0.4140212 0.3809524 1.61551 3rd\n\nThis allows us to feed the data easily to other functions, such as those in the useful ggdist and distributional packages:\n\nlibrary(ggdist)\nlibrary(distributional)\nplot_slopes(mod, variables = \"SexCode\", condition = c(\"Age\", \"PClass\"), type = \"link\", draw = FALSE) |>\n ggplot() +\n stat_lineribbon(aes(\n x = Age,\n ydist = dist_normal(mu = estimate, sigma = std.error),\n fill = PClass),\n alpha = 1 / 4)"
},
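Because draw = FALSE returns a plain data frame, the Titanic predictions above can also be re-plotted entirely from scratch. A sketch, reusing p, which holds the Age, PClass, estimate, conf.low, and conf.high columns shown in head(p):

library(ggplot2)

ggplot(p, aes(x = Age, y = estimate,
              ymin = conf.low, ymax = conf.high,
              color = PClass, fill = PClass)) +
    geom_ribbon(alpha = .2, color = NA) +  # confidence band
    geom_line() +                          # predicted probability
    labs(y = "Pr(Survived)")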
{
"objectID": "articles/plot.html#fits-and-smooths",
@@ -389,7 +389,7 @@
"href": "articles/plot.html#plot-and-marginaleffects-objects",
"title": "\n6 Plots\n",
"section": "\n6.8 plot() and marginaleffects objects",
- "text": "6.8 plot() and marginaleffects objects\nSome users may feel inclined to call plot() on a object produced by marginaleffects object. Doing so will generate an informative error like this one:\n\nmod <- lm(mpg ~ hp * wt * factor(cyl), data = mtcars)\np <- predictions(mod)\nplot(p)\n#> Error: Please use the `plot_predictions()` function.\n\nThe reason for this error is that the user query is underspecified. marginaleffects allows users to compute so many quantities of interest that it is not clear what the user wants when they simply call plot(). Adding several new arguments would compete with the main plotting functions, and risk sowing confusion. The marginaleffects developers thus decided to support one main path to plotting: plot_predictions(), plot_comparisons(), and plot_slopes().\nThat said, it may be useful to remind users that all marginaleffects output are standard “tidy” data frames. Although they get pretty-printed to the console, all the listed columns are accessible via standard R operators. For example:\n\np <- avg_predictions(mod, by = \"cyl\")\np\n#> \n#> cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 4 26.7 0.695 38.4 <0.001 Inf 25.3 28.0\n#> 6 19.7 0.871 22.7 <0.001 375.1 18.0 21.5\n#> 8 15.1 0.616 24.5 <0.001 438.2 13.9 16.3\n#> \n#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\np$estimate\n#> [1] 26.66364 19.74286 15.10000\n\np$std.error\n#> [1] 0.6951236 0.8713836 0.6161612\n\np$conf.low\n#> [1] 25.30122 18.03498 13.89235\n\nThis allows us to plot all results very easily with standard plotting functions:\n\nplot_predictions(mod, by = \"cyl\")\n\n\n\n\nplot(p$cyl, p$estimate)\n\n\n\n\nggplot(p, aes(x = cyl, y = estimate, ymin = conf.low, ymax = conf.high)) +\n geom_pointrange()"
+ "text": "6.8 plot() and marginaleffects objects\nSome users may feel inclined to call plot() on a object produced by marginaleffects object. Doing so will generate an informative error like this one:\n\nmod <- lm(mpg ~ hp * wt * factor(cyl), data = mtcars)\np <- predictions(mod)\nplot(p)\n#> Error: Please use the `plot_predictions()` function.\n\nThe reason for this error is that the user query is underspecified. marginaleffects allows users to compute so many quantities of interest that it is not clear what the user wants when they simply call plot(). Adding several new arguments would compete with the main plotting functions, and risk sowing confusion. The marginaleffects developers thus decided to support one main path to plotting: plot_predictions(), plot_comparisons(), and plot_slopes().\nThat said, it may be useful to remind users that all marginaleffects output are standard “tidy” data frames. Although they get pretty-printed to the console, all the listed columns are accessible via standard R operators. For example:\n\np <- avg_predictions(mod, by = \"cyl\")\np\n#> \n#> cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 4 26.7 0.695 38.4 <0.001 Inf 25.3 28.0\n#> 6 19.7 0.871 22.7 <0.001 375.1 18.0 21.5\n#> 8 15.1 0.616 24.5 <0.001 438.2 13.9 16.3\n#> \n#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\np$estimate\n#> [1] 26.66364 19.74286 15.10000\n\np$std.error\n#> [1] 0.6951236 0.8713835 0.6161612\n\np$conf.low\n#> [1] 25.30122 18.03498 13.89235\n\nThis allows us to plot all results very easily with standard plotting functions:\n\nplot_predictions(mod, by = \"cyl\")\n\n\n\n\nplot(p$cyl, p$estimate)\n\n\n\n\nggplot(p, aes(x = cyl, y = estimate, ymin = conf.low, ymax = conf.high)) +\n geom_pointrange()"
},
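To add confidence intervals to the base R plot above, arrows() can draw the error bars. A sketch, reusing the p object returned by avg_predictions():

plot(p$cyl, p$estimate,
    ylim = range(p$conf.low, p$conf.high),
    xlab = "cyl", ylab = "Average prediction")
# 90-degree arrowheads at both ends mimic error bars
arrows(p$cyl, p$conf.low, p$cyl, p$conf.high,
    angle = 90, code = 3, length = 0.05)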
{
"objectID": "articles/hypothesis.html#null-hypothesis",
@@ -403,7 +403,7 @@
"href": "articles/hypothesis.html#hypothesis-tests-with-the-delta-method",
"title": "\n7 Hypothesis Tests\n",
"section": "\n7.2 Hypothesis tests with the delta method",
- "text": "7.2 Hypothesis tests with the delta method\nThe marginaleffects package includes a powerful function called hypotheses(). This function emulates the behavior of the well-established car::deltaMethod and car::linearHypothesis functions, but it supports more models, requires fewer dependencies, and offers some convenience features like shortcuts for robust standard errors.\nhypotheses() can be used to compute estimates and standard errors of arbitrary functions of model parameters. For example, it can be used to conduct tests of equality between coefficients, or to test the value of some linear or non-linear combination of quantities of interest. hypotheses() can also be used to conduct hypothesis tests on other functions of a model’s parameter, such as adjusted predictions or marginal effects.\nLet’s start by estimating a simple model:\n\nlibrary(marginaleffects)\nmod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)\n\nWhen the FUN and hypothesis arguments of hypotheses() equal NULL (the default), the function returns a data.frame of raw estimates:\n\nhypotheses(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> (Intercept) 35.8460 2.041 17.56 <0.001 227.0 31.8457 39.846319\n#> hp -0.0231 0.012 -1.93 0.0531 4.2 -0.0465 0.000306\n#> wt -3.1814 0.720 -4.42 <0.001 16.6 -4.5918 -1.771012\n#> factor(cyl)6 -3.3590 1.402 -2.40 0.0166 5.9 -6.1062 -0.611803\n#> factor(cyl)8 -3.1859 2.170 -1.47 0.1422 2.8 -7.4399 1.068169\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTest of equality between coefficients:\n\nhypotheses(mod, \"hp = wt\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp = wt 3.16 0.72 4.39 <0.001 16.4 1.75 4.57\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nNon-linear function of coefficients\n\nhypotheses(mod, \"exp(hp + wt) = 0.1\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> exp(hp + wt) = 0.1 -0.0594 0.0292 -2.04 0.0418 4.6 -0.117 -0.0022\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nThe vcov argument behaves in the same was as in the slopes() function. It allows us to easily compute robust standard errors:\n\nhypotheses(mod, \"hp = wt\", vcov = \"HC3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp = wt 3.16 0.805 3.92 <0.001 13.5 1.58 4.74\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nWe can use shortcuts like b1, b2, ... to identify the position of each parameter in the output of FUN. For example, b2=b3 is equivalent to hp=wt because those term names appear in the 2nd and 3rd row when we call hypotheses(mod).\n\nhypotheses(mod, \"b2 = b3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b2 = b3 3.16 0.72 4.39 <0.001 16.4 1.75 4.57\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\n\nhypotheses(mod, hypothesis = \"b* / b3 = 1\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b1 / b3 = 1 -12.26735 2.07340 -5.9165 <0.001 28.2 -16.33 -8.204\n#> b2 / b3 = 1 -0.99273 0.00413 -240.5539 <0.001 Inf -1.00 -0.985\n#> b3 / b3 = 1 0.00000 NA NA NA NA NA NA\n#> b4 / b3 = 1 0.05583 0.58287 0.0958 0.924 0.1 -1.09 1.198\n#> b5 / b3 = 1 0.00141 0.82981 0.0017 0.999 0.0 -1.62 1.628\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTerm names with special characters must be enclosed in backticks:\n\nhypotheses(mod, \"`factor(cyl)6` = `factor(cyl)8`\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> `factor(cyl)6` = `factor(cyl)8` -0.173 1.65 -0.105 0.917 0.1 -3.41 3.07\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\n\n7.2.1 Arbitrary functions: FUN\n\nThe FUN argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a numeric vector or a data.frame with two columns named term and estimate.\n\nmod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)\n\nf <- function(x) {\n out <- x$coefficients[\"hp\"] + x$coefficients[\"mpg\"]\n return(out)\n}\nhypotheses(mod, FUN = f)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 1 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nWith labels:\n\nf <- function(x) {\n out <- data.frame(\n term = \"Horsepower + Miles per Gallon\",\n estimate = x$coefficients[\"hp\"] + x$coefficients[\"mpg\"]\n )\n return(out)\n}\nhypotheses(mod, FUN = f)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> Horsepower + Miles per Gallon 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTest of equality between two predictions (row 2 vs row 3):\n\nf <- function(x) predict(x, newdata = mtcars)\nhypotheses(mod, FUN = f, hypothesis = \"b2 = b3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b2 = b3 -1.33 0.616 -2.16 0.0305 5.0 -2.54 -0.125\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nNote that we specified the newdata argument in the f function. This is because the predict() method associated with lm objects will automatically the original fitted values when newdata is NULL, instead of returning the slightly altered fitted values which we need to compute numerical derivatives in the delta method.\nWe can also use numeric vectors to specify linear combinations of parameters. For example, there are 3 coefficients in the last model we estimated. To test the null hypothesis that the sum of the 2nd and 3rd coefficients is equal to 0, we can do:\n\nhypotheses(mod, hypothesis = c(0, 1, 1))\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> custom 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nSee below for more example of how to use string formulas, numeric vectors, or matrices to calculate custom contrasts, linear combinations, and linear or non-linear hypothesis tests.\n\n7.2.2 Arbitrary quantities with data frames\nmarginaleffects can also compute uncertainty estimates for arbitrary quantities hosted in a data frame, as long as the user can supply a variance-covariance matrix. 
(Thanks to Kyle F Butts for this cool feature and example!)\nSay you run a monte-carlo simulation and you want to perform hypothesis of various quantities returned from each simulation. The quantities are correlated within each draw:\n\n# simulated means and medians\ndraw <- function(i) { \n x <- rnorm(n = 10000, mean = 0, sd = 1)\n out <- data.frame(median = median(x), mean = mean(x))\n return(out)\n}\nsims <- do.call(\"rbind\", lapply(1:25, draw))\n\n# average mean and average median \ncoeftable <- data.frame(\n term = c(\"median\", \"mean\"),\n estimate = c(mean(sims$median), mean(sims$mean))\n)\n\n# variance-covariance\nvcov <- cov(sims)\n\n# is the median equal to the mean?\nhypotheses(\n coeftable,\n vcov = vcov,\n hypothesis = \"median = mean\"\n)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> median = mean -0.000969 0.00702 -0.138 0.89 0.2 -0.0147 0.0128\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high"
+ "text": "7.2 Hypothesis tests with the delta method\nThe marginaleffects package includes a powerful function called hypotheses(). This function emulates the behavior of the well-established car::deltaMethod and car::linearHypothesis functions, but it supports more models, requires fewer dependencies, and offers some convenience features like shortcuts for robust standard errors.\nhypotheses() can be used to compute estimates and standard errors of arbitrary functions of model parameters. For example, it can be used to conduct tests of equality between coefficients, or to test the value of some linear or non-linear combination of quantities of interest. hypotheses() can also be used to conduct hypothesis tests on other functions of a model’s parameter, such as adjusted predictions or marginal effects.\nLet’s start by estimating a simple model:\n\nlibrary(marginaleffects)\nmod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)\n\nWhen the FUN and hypothesis arguments of hypotheses() equal NULL (the default), the function returns a data.frame of raw estimates:\n\nhypotheses(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> (Intercept) 35.8460 2.041 17.56 <0.001 227.0 31.8457 39.846319\n#> hp -0.0231 0.012 -1.93 0.0531 4.2 -0.0465 0.000306\n#> wt -3.1814 0.720 -4.42 <0.001 16.6 -4.5918 -1.771012\n#> factor(cyl)6 -3.3590 1.402 -2.40 0.0166 5.9 -6.1062 -0.611803\n#> factor(cyl)8 -3.1859 2.170 -1.47 0.1422 2.8 -7.4399 1.068169\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTest of equality between coefficients:\n\nhypotheses(mod, \"hp = wt\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp = wt 3.16 0.72 4.39 <0.001 16.4 1.75 4.57\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nNon-linear function of coefficients\n\nhypotheses(mod, \"exp(hp + wt) = 0.1\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> exp(hp + wt) = 0.1 -0.0594 0.0292 -2.04 0.0418 4.6 -0.117 -0.0022\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nThe vcov argument behaves in the same was as in the slopes() function. It allows us to easily compute robust standard errors:\n\nhypotheses(mod, \"hp = wt\", vcov = \"HC3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp = wt 3.16 0.805 3.92 <0.001 13.5 1.58 4.74\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nWe can use shortcuts like b1, b2, ... to identify the position of each parameter in the output of FUN. For example, b2=b3 is equivalent to hp=wt because those term names appear in the 2nd and 3rd row when we call hypotheses(mod).\n\nhypotheses(mod, \"b2 = b3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b2 = b3 3.16 0.72 4.39 <0.001 16.4 1.75 4.57\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\n\nhypotheses(mod, hypothesis = \"b* / b3 = 1\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b1 / b3 = 1 -12.26735 2.07340 -5.9165 <0.001 28.2 -16.33 -8.204\n#> b2 / b3 = 1 -0.99273 0.00413 -240.5539 <0.001 Inf -1.00 -0.985\n#> b3 / b3 = 1 0.00000 NA NA NA NA NA NA\n#> b4 / b3 = 1 0.05583 0.58287 0.0958 0.924 0.1 -1.09 1.198\n#> b5 / b3 = 1 0.00141 0.82981 0.0017 0.999 0.0 -1.62 1.628\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTerm names with special characters must be enclosed in backticks:\n\nhypotheses(mod, \"`factor(cyl)6` = `factor(cyl)8`\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> `factor(cyl)6` = `factor(cyl)8` -0.173 1.65 -0.105 0.917 0.1 -3.41 3.07\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\n\n7.2.1 Arbitrary functions: FUN\n\nThe FUN argument can be used to compute standard errors for arbitrary functions of model parameters. This user-supplied function must accept a single model object, and return a numeric vector or a data.frame with two columns named term and estimate.\n\nmod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)\n\nf <- function(x) {\n out <- x$coefficients[\"hp\"] + x$coefficients[\"mpg\"]\n return(out)\n}\nhypotheses(mod, FUN = f)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 1 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nWith labels:\n\nf <- function(x) {\n out <- data.frame(\n term = \"Horsepower + Miles per Gallon\",\n estimate = x$coefficients[\"hp\"] + x$coefficients[\"mpg\"]\n )\n return(out)\n}\nhypotheses(mod, FUN = f)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> Horsepower + Miles per Gallon 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nTest of equality between two predictions (row 2 vs row 3):\n\nf <- function(x) predict(x, newdata = mtcars)\nhypotheses(mod, FUN = f, hypothesis = \"b2 = b3\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> b2 = b3 -1.33 0.616 -2.16 0.0305 5.0 -2.54 -0.125\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nNote that we specified the newdata argument in the f function. This is because the predict() method associated with lm objects will automatically the original fitted values when newdata is NULL, instead of returning the slightly altered fitted values which we need to compute numerical derivatives in the delta method.\nWe can also use numeric vectors to specify linear combinations of parameters. For example, there are 3 coefficients in the last model we estimated. To test the null hypothesis that the sum of the 2nd and 3rd coefficients is equal to 0, we can do:\n\nhypotheses(mod, hypothesis = c(0, 1, 1))\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> custom 1.31 0.593 2.22 0.0266 5.2 0.153 2.48\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high\n\nSee below for more example of how to use string formulas, numeric vectors, or matrices to calculate custom contrasts, linear combinations, and linear or non-linear hypothesis tests.\n\n7.2.2 Arbitrary quantities with data frames\nmarginaleffects can also compute uncertainty estimates for arbitrary quantities hosted in a data frame, as long as the user can supply a variance-covariance matrix. 
(Thanks to Kyle F Butts for this cool feature and example!)\nSay you run a monte-carlo simulation and you want to perform hypothesis of various quantities returned from each simulation. The quantities are correlated within each draw:\n\n# simulated means and medians\ndraw <- function(i) { \n x <- rnorm(n = 10000, mean = 0, sd = 1)\n out <- data.frame(median = median(x), mean = mean(x))\n return(out)\n}\nsims <- do.call(\"rbind\", lapply(1:25, draw))\n\n# average mean and average median \ncoeftable <- data.frame(\n term = c(\"median\", \"mean\"),\n estimate = c(mean(sims$median), mean(sims$mean))\n)\n\n# variance-covariance\nvcov <- cov(sims)\n\n# is the median equal to the mean?\nhypotheses(\n coeftable,\n vcov = vcov,\n hypothesis = \"median = mean\"\n)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> median = mean -0.0012 0.00594 -0.201 0.841 0.3 -0.0128 0.0104\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high"
},
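The section above mentions matrices as a third way to specify hypotheses but does not show one. A sketch, reusing the three-coefficient logistic mod: each column of the matrix is one linear combination of (Intercept), hp, and mpg, and (assuming the documented matrix interface) column names become the term labels:

# one column per linear combination of the coefficients
H <- matrix(c(
    0, 1, 1,    # hp + mpg
    0, 1, -1),  # hp - mpg
    ncol = 2)
colnames(H) <- c("hp + mpg", "hp - mpg")

hypotheses(mod, hypothesis = H)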
{
"objectID": "articles/hypothesis.html#hypotheses-formulas",
@@ -452,7 +452,7 @@
"href": "articles/brms.html#random-effects-model",
"title": "\n8 Bayes\n",
"section": "\n8.2 Random effects model",
- "text": "8.2 Random effects model\nThis section replicates some of the analyses of a random effects model published in Andrew Heiss’ blog post: “A guide to correctly calculating posterior predictions and average marginal effects with multilevel Bayesian models.” The objective is mainly to illustrate the use of marginaleffects. Please refer to the original post for a detailed discussion of the quantities computed below.\nLoad libraries and download data:\n\nlibrary(brms)\nlibrary(ggdist)\nlibrary(patchwork)\nlibrary(marginaleffects)\n\nvdem_2015 <- read.csv(\"https://github.com/vincentarelbundock/marginaleffects/raw/main/data-raw/vdem_2015.csv\")\n\nhead(vdem_2015)\n#> country_name country_text_id year region media_index party_autonomy_ord polyarchy civil_liberties party_autonomy\n#> 1 Mexico MEX 2015 Latin America and the Caribbean 0.837 3 0.631 0.704 TRUE\n#> 2 Suriname SUR 2015 Latin America and the Caribbean 0.883 4 0.777 0.887 TRUE\n#> 3 Sweden SWE 2015 Western Europe and North America 0.956 4 0.915 0.968 TRUE\n#> 4 Switzerland CHE 2015 Western Europe and North America 0.939 4 0.901 0.960 TRUE\n#> 5 Ghana GHA 2015 Sub-Saharan Africa 0.858 4 0.724 0.921 TRUE\n#> 6 South Africa ZAF 2015 Sub-Saharan Africa 0.898 4 0.752 0.869 TRUE\n\nFit a basic model:\n\nmod <- brm(\n bf(media_index ~ party_autonomy + civil_liberties + (1 | region),\n phi ~ (1 | region)),\n data = vdem_2015,\n family = Beta(),\n control = list(adapt_delta = 0.9))\n\n\n8.2.1 Posterior predictions\nTo compute posterior predictions for specific values of the regressors, we use the newdata argument and the datagrid() function. We also use the type argument to compute two types of predictions: accounting for residual (observation-level) residual variance (prediction) or ignoring it (response).\n\nnd = datagrid(model = mod,\n party_autonomy = c(TRUE, FALSE),\n civil_liberties = .5,\n region = \"Middle East and North Africa\")\np1 <- predictions(mod, type = \"response\", newdata = nd) |>\n posterior_draws() |>\n transform(type = \"Response\")\np2 <- predictions(mod, type = \"prediction\", newdata = nd) |>\n posterior_draws() |>\n transform(type = \"Prediction\")\npred <- rbind(p1, p2)\n\nExtract posterior draws and plot them:\n\nggplot(pred, aes(x = draw, fill = party_autonomy)) +\n stat_halfeye(alpha = .5) +\n facet_wrap(~ type) +\n labs(x = \"Media index (predicted)\", \n y = \"Posterior density\",\n fill = \"Party autonomy\")\n\n\n\n\n\n8.2.2 Marginal effects and contrasts\nAs noted in the Marginal Effects vignette, there should be one distinct marginal effect for each combination of regressor values. Here, we consider only one combination of regressor values, where region is “Middle East and North Africa”, and civil_liberties is 0.5. 
Then, we calculate the mean of the posterior distribution of marginal effects:\n\nmfx <- slopes(mod,\n newdata = datagrid(civil_liberties = .5,\n region = \"Middle East and North Africa\"))\nmfx\n#> \n#> Term Contrast civil_liberties region Estimate 2.5 % 97.5 %\n#> civil_liberties dY/dX 0.5 Middle East and North Africa 0.816 0.621 1.007\n#> party_autonomy TRUE - FALSE 0.5 Middle East and North Africa 0.252 0.166 0.336\n#> \n#> Columns: rowid, term, contrast, estimate, conf.low, conf.high, civil_liberties, region, predicted_lo, predicted_hi, predicted, tmp_idx, media_index, party_autonomy \n#> Type: response\n\nUse the posterior_draws() to extract draws from the posterior distribution of marginal effects, and plot them:\n\nmfx <- posterior_draws(mfx)\n\nggplot(mfx, aes(x = draw, y = term)) +\n stat_halfeye() +\n labs(x = \"Marginal effect\", y = \"\")\n\n\n\n\nPlot marginal effects, conditional on a regressor:\n\nplot_slopes(mod,\n variables = \"civil_liberties\",\n condition = \"party_autonomy\")\n\n\n\n\n\n8.2.3 Continuous predictors\n\npred <- predictions(mod,\n newdata = datagrid(party_autonomy = FALSE,\n region = \"Middle East and North Africa\",\n civil_liberties = seq(0, 1, by = 0.05))) |>\n posterior_draws()\n\nggplot(pred, aes(x = civil_liberties, y = draw)) +\n stat_lineribbon() +\n scale_fill_brewer(palette = \"Reds\") +\n labs(x = \"Civil liberties\",\n y = \"Media index (predicted)\",\n fill = \"\")\n\n\n\n\nThe slope of this line for different values of civil liberties can be obtained with:\n\nmfx <- slopes(mod,\n newdata = datagrid(\n civil_liberties = c(.2, .5, .8),\n party_autonomy = FALSE,\n region = \"Middle East and North Africa\"),\n variables = \"civil_liberties\")\nmfx\n#> \n#> Term civil_liberties party_autonomy region Estimate 2.5 % 97.5 %\n#> civil_liberties 0.2 FALSE Middle East and North Africa 0.490 0.361 0.639\n#> civil_liberties 0.5 FALSE Middle East and North Africa 0.807 0.612 0.993\n#> civil_liberties 0.8 FALSE Middle East and North Africa 0.807 0.674 0.934\n#> \n#> Columns: rowid, term, estimate, conf.low, conf.high, civil_liberties, party_autonomy, region, predicted_lo, predicted_hi, predicted, tmp_idx, media_index \n#> Type: response\n\nAnd plotted:\n\nmfx <- posterior_draws(mfx)\n\nggplot(mfx, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Marginal effect of Civil Liberties on Media Index\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\nThe slopes() function can use the ellipsis (...) to push any argument forward to the posterior_predict function. This can alter the types of predictions returned. 
For example, the re_formula=NA argument of the posterior_predict.brmsfit method will compute marginaleffects without including any group-level effects:\n\nmfx <- slopes(\n mod,\n newdata = datagrid(\n civil_liberties = c(.2, .5, .8),\n party_autonomy = FALSE,\n region = \"Middle East and North Africa\"),\n variables = \"civil_liberties\",\n re_formula = NA) |>\n posterior_draws()\n\nggplot(mfx, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Marginal effect of Civil Liberties on Media Index\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\n\n8.2.4 Global grand mean\n\npred <- predictions(\n mod,\n re_formula = NA,\n newdata = datagrid(party_autonomy = c(TRUE, FALSE))) |>\n posterior_draws()\n\nmfx <- slopes(\n mod,\n re_formula = NA,\n variables = \"party_autonomy\") |>\n posterior_draws()\n\nplot1 <- ggplot(pred, aes(x = draw, fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (Predicted)\",\n y = \"Posterior density\",\n fill = \"Party autonomy\")\n\nplot2 <- ggplot(mfx, aes(x = draw)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Contrast: Party autonomy TRUE - FALSE\",\n y = \"\",\n fill = \"Party autonomy\")\n\n## combine plots using the `patchwork` package\nplot1 + plot2\n\n\n\n\n\n8.2.5 Region-specific predictions and contrasts\nPredicted media index by region and level of civil liberties:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n party_autonomy = FALSE, \n civil_liberties = seq(0, 1, length.out = 100))) |> \n posterior_draws()\n\nggplot(pred, aes(x = civil_liberties, y = draw)) +\n stat_lineribbon() +\n scale_fill_brewer(palette = \"Reds\") +\n facet_wrap(~ region) +\n labs(x = \"Civil liberties\",\n y = \"Media index (predicted)\",\n fill = \"\")\n\n\n\n\nPredicted media index by region and level of civil liberties:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n civil_liberties = c(.2, .8),\n party_autonomy = FALSE)) |>\n posterior_draws()\n\nggplot(pred, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n facet_wrap(~ region) +\n labs(x = \"Media index (predicted)\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\nPredicted media index by region and party autonomy:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n party_autonomy = c(TRUE, FALSE),\n civil_liberties = .5)) |>\n posterior_draws()\n\nggplot(pred, aes(x = draw, y = region , fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (predicted)\",\n y = \"\",\n fill = \"Party autonomy\")\n\n\n\n\nTRUE/FALSE contrasts (marginal effects) of party autonomy by region:\n\nmfx <- slopes(\n mod,\n variables = \"party_autonomy\",\n newdata = datagrid(\n region = vdem_2015$region,\n civil_liberties = .5)) |>\n posterior_draws()\n\nggplot(mfx, aes(x = draw, y = region , fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (predicted)\",\n y = \"\",\n fill = \"Party autonomy\")\n\n\n\n\n\n8.2.6 Hypothetical groups\nWe can also obtain predictions or marginal effects for a hypothetical group instead of one of the observed regions. To achieve this, we create a dataset with NA in the region column. Then we call the marginaleffects or predictions() functions with the allow_new_levels argument. This argument is pushed through via the ellipsis (...) 
to the posterior_epred function of the brms package:\n\ndat <- data.frame(civil_liberties = .5,\n party_autonomy = FALSE,\n region = \"New Region\")\n\nmfx <- slopes(\n mod,\n variables = \"party_autonomy\",\n allow_new_levels = TRUE,\n newdata = dat)\n\ndraws <- posterior_draws(mfx)\n\nggplot(draws, aes(x = draw)) +\n stat_halfeye() +\n labs(x = \"Marginal effect of party autonomy in a generic world region\", y = \"\")\n\n\n\n\n\n8.2.7 Averaging, marginalizing, integrating random effects\nConsider a logistic regression model with random effects:\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv\")\ndat$x <- as.numeric(dat$output > median(dat$output))\ndat$y <- as.numeric(dat$emp > median(dat$emp))\nmod <- brm(y ~ x + (1 | firm), data = dat, backend = \"cmdstanr\", family = \"bernoulli\")\n\nWe can compute adjusted predictions for a given value of x and for each firm (random effects) as follows:\n\np <- predictions(mod, newdata = datagrid(x = 0, firm = unique))\nhead(p)\n#> \n#> x firm Estimate 2.5 % 97.5 %\n#> 0 1 1.0e+00 9.01e-01 1.0000\n#> 0 2 1.0e+00 8.95e-01 1.0000\n#> 0 3 1.0e+00 9.12e-01 1.0000\n#> 0 4 1.0e+00 7.97e-01 1.0000\n#> 0 5 1.0e+00 9.09e-01 1.0000\n#> 0 6 4.9e-08 8.42e-21 0.0019\n#> \n#> Columns: rowid, estimate, conf.low, conf.high, y, x, firm \n#> Type: response\n\nWe can average/marginalize/integrate across random effects with the avg_predictions() function or the by argument:\n\navg_predictions(mod, newdata = datagrid(x = 0, firm = unique))\n#> \n#> Estimate 2.5 % 97.5 %\n#> 0.454 0.44 0.468\n#> \n#> Columns: estimate, conf.low, conf.high \n#> Type: response\n\npredictions(mod, newdata = datagrid(x = 0:1, firm = unique), by = \"x\")\n#> \n#> x Estimate 2.5 % 97.5 %\n#> 0 0.454 0.440 0.468\n#> 1 0.557 0.546 0.570\n#> \n#> Columns: x, estimate, conf.low, conf.high \n#> Type: response\n\nWe can also draw from the (assumed gaussian) population distribution of random effects, by asking predictions() to make predictions for new “levels” of the random effects. If we then take an average of predictions using avg_predictions() or the by argument, we will have “integrated out the random effects”, as described in the brmsmargins package vignette. In the code below, we make predictions for 100 firm identifiers which were not in the original dataset. We also ask predictions() to push forward the allow_new_levels and sample_new_levels arguments to the brms::posterior_epred function:\n\npredictions(\n mod,\n newdata = datagrid(x = 0:1, firm = -1:-100),\n allow_new_levels = TRUE,\n sample_new_levels = \"gaussian\",\n by = \"x\")\n#> \n#> x Estimate 2.5 % 97.5 %\n#> 0 0.450 0.343 0.565\n#> 1 0.549 0.441 0.664\n#> \n#> Columns: x, estimate, conf.low, conf.high \n#> Type: response\n\nWe can “integrate out” random effects in the other slopes() functions too. 
For instance,\n\navg_comparisons(\n mod,\n newdata = datagrid(firm = -1:-100),\n allow_new_levels = TRUE,\n sample_new_levels = \"gaussian\")\n#> \n#> Term Contrast Estimate 2.5 % 97.5 %\n#> x 1 - 0 0.0967 0.0465 0.162\n#> \n#> Columns: term, contrast, estimate, conf.low, conf.high \n#> Type: response\n\nThis is nearly equivalent the brmsmargins command output (with slight variations due to different random seeds):\n\nlibrary(brmsmargins)\nbm <- brmsmargins(\n k = 100,\n object = mod,\n at = data.frame(x = c(0, 1)),\n CI = .95,\n CIType = \"ETI\",\n contrasts = cbind(\"AME x\" = c(-1, 1)),\n effects = \"integrateoutRE\")\nbm$ContrastSummary |> data.frame()\n#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label\n#> 1 0.09898374 0.09694253 0.04870074 0.1611877 NA NA 0.95 ETI <NA> <NA> AME x\n\nSee the alternative software vignette for more information on brmsmargins."
+ "text": "8.2 Random effects model\nThis section replicates some of the analyses of a random effects model published in Andrew Heiss’ blog post: “A guide to correctly calculating posterior predictions and average marginal effects with multilevel Bayesian models.” The objective is mainly to illustrate the use of marginaleffects. Please refer to the original post for a detailed discussion of the quantities computed below.\nLoad libraries and download data:\n\nlibrary(brms)\nlibrary(ggdist)\nlibrary(patchwork)\nlibrary(marginaleffects)\n\nvdem_2015 <- read.csv(\"https://github.com/vincentarelbundock/marginaleffects/raw/main/data-raw/vdem_2015.csv\")\n\nhead(vdem_2015)\n#> country_name country_text_id year region media_index party_autonomy_ord polyarchy civil_liberties party_autonomy\n#> 1 Mexico MEX 2015 Latin America and the Caribbean 0.837 3 0.631 0.704 TRUE\n#> 2 Suriname SUR 2015 Latin America and the Caribbean 0.883 4 0.777 0.887 TRUE\n#> 3 Sweden SWE 2015 Western Europe and North America 0.956 4 0.915 0.968 TRUE\n#> 4 Switzerland CHE 2015 Western Europe and North America 0.939 4 0.901 0.960 TRUE\n#> 5 Ghana GHA 2015 Sub-Saharan Africa 0.858 4 0.724 0.921 TRUE\n#> 6 South Africa ZAF 2015 Sub-Saharan Africa 0.898 4 0.752 0.869 TRUE\n\nFit a basic model:\n\nmod <- brm(\n bf(media_index ~ party_autonomy + civil_liberties + (1 | region),\n phi ~ (1 | region)),\n data = vdem_2015,\n family = Beta(),\n control = list(adapt_delta = 0.9))\n\n\n8.2.1 Posterior predictions\nTo compute posterior predictions for specific values of the regressors, we use the newdata argument and the datagrid() function. We also use the type argument to compute two types of predictions: accounting for residual (observation-level) residual variance (prediction) or ignoring it (response).\n\nnd = datagrid(model = mod,\n party_autonomy = c(TRUE, FALSE),\n civil_liberties = .5,\n region = \"Middle East and North Africa\")\np1 <- predictions(mod, type = \"response\", newdata = nd) |>\n posterior_draws() |>\n transform(type = \"Response\")\np2 <- predictions(mod, type = \"prediction\", newdata = nd) |>\n posterior_draws() |>\n transform(type = \"Prediction\")\npred <- rbind(p1, p2)\n\nExtract posterior draws and plot them:\n\nggplot(pred, aes(x = draw, fill = party_autonomy)) +\n stat_halfeye(alpha = .5) +\n facet_wrap(~ type) +\n labs(x = \"Media index (predicted)\", \n y = \"Posterior density\",\n fill = \"Party autonomy\")\n\n\n\n\n\n8.2.2 Marginal effects and contrasts\nAs noted in the Marginal Effects vignette, there should be one distinct marginal effect for each combination of regressor values. Here, we consider only one combination of regressor values, where region is “Middle East and North Africa”, and civil_liberties is 0.5. 
Then, we calculate the mean of the posterior distribution of marginal effects:\n\nmfx <- slopes(mod,\n newdata = datagrid(civil_liberties = .5,\n region = \"Middle East and North Africa\"))\nmfx\n#> \n#> Term Contrast civil_liberties region Estimate 2.5 % 97.5 %\n#> civil_liberties dY/dX 0.5 Middle East and North Africa 0.816 0.621 1.007\n#> party_autonomy TRUE - FALSE 0.5 Middle East and North Africa 0.252 0.166 0.336\n#> \n#> Columns: rowid, term, contrast, estimate, conf.low, conf.high, civil_liberties, region, predicted_lo, predicted_hi, predicted, tmp_idx, media_index, party_autonomy \n#> Type: response\n\nUse the posterior_draws() function to extract draws from the posterior distribution of marginal effects, and plot them:\n\nmfx <- posterior_draws(mfx)\n\nggplot(mfx, aes(x = draw, y = term)) +\n stat_halfeye() +\n labs(x = \"Marginal effect\", y = \"\")\n\n\n\n\nPlot marginal effects, conditional on a regressor:\n\nplot_slopes(mod,\n variables = \"civil_liberties\",\n condition = \"party_autonomy\")\n\n\n\n\n\n8.2.3 Continuous predictors\n\npred <- predictions(mod,\n newdata = datagrid(party_autonomy = FALSE,\n region = \"Middle East and North Africa\",\n civil_liberties = seq(0, 1, by = 0.05))) |>\n posterior_draws()\n\nggplot(pred, aes(x = civil_liberties, y = draw)) +\n stat_lineribbon() +\n scale_fill_brewer(palette = \"Reds\") +\n labs(x = \"Civil liberties\",\n y = \"Media index (predicted)\",\n fill = \"\")\n\n\n\n\nThe slope of this line for different values of civil liberties can be obtained with:\n\nmfx <- slopes(mod,\n newdata = datagrid(\n civil_liberties = c(.2, .5, .8),\n party_autonomy = FALSE,\n region = \"Middle East and North Africa\"),\n variables = \"civil_liberties\")\nmfx\n#> \n#> Term civil_liberties party_autonomy region Estimate 2.5 % 97.5 %\n#> civil_liberties 0.2 FALSE Middle East and North Africa 0.490 0.361 0.639\n#> civil_liberties 0.5 FALSE Middle East and North Africa 0.807 0.612 0.993\n#> civil_liberties 0.8 FALSE Middle East and North Africa 0.807 0.674 0.934\n#> \n#> Columns: rowid, term, estimate, conf.low, conf.high, civil_liberties, party_autonomy, region, predicted_lo, predicted_hi, predicted, tmp_idx, media_index \n#> Type: response\n\nAnd plotted:\n\nmfx <- posterior_draws(mfx)\n\nggplot(mfx, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Marginal effect of Civil Liberties on Media Index\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\nThe slopes() function can use the ellipsis (...) to push any argument forward to the posterior_predict function. This can alter the types of predictions returned. 
For example, the re_formula=NA argument of the posterior_predict.brmsfit method will compute marginal effects without including any group-level effects:\n\nmfx <- slopes(\n mod,\n newdata = datagrid(\n civil_liberties = c(.2, .5, .8),\n party_autonomy = FALSE,\n region = \"Middle East and North Africa\"),\n variables = \"civil_liberties\",\n re_formula = NA) |>\n posterior_draws()\n\nggplot(mfx, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Marginal effect of Civil Liberties on Media Index\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\n\n8.2.4 Global grand mean\n\npred <- predictions(\n mod,\n re_formula = NA,\n newdata = datagrid(party_autonomy = c(TRUE, FALSE))) |>\n posterior_draws()\n\nmfx <- slopes(\n mod,\n re_formula = NA,\n variables = \"party_autonomy\") |>\n posterior_draws()\n\nplot1 <- ggplot(pred, aes(x = draw, fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (Predicted)\",\n y = \"Posterior density\",\n fill = \"Party autonomy\")\n\nplot2 <- ggplot(mfx, aes(x = draw)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Contrast: Party autonomy TRUE - FALSE\",\n y = \"\",\n fill = \"Party autonomy\")\n\n## combine plots using the `patchwork` package\nplot1 + plot2\n\n\n\n\n\n8.2.5 Region-specific predictions and contrasts\nPredicted media index by region and level of civil liberties:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n party_autonomy = FALSE, \n civil_liberties = seq(0, 1, length.out = 100))) |> \n posterior_draws()\n\nggplot(pred, aes(x = civil_liberties, y = draw)) +\n stat_lineribbon() +\n scale_fill_brewer(palette = \"Reds\") +\n facet_wrap(~ region) +\n labs(x = \"Civil liberties\",\n y = \"Media index (predicted)\",\n fill = \"\")\n\n\n\n\nPredicted media index by region, at low and high levels of civil liberties:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n civil_liberties = c(.2, .8),\n party_autonomy = FALSE)) |>\n posterior_draws()\n\nggplot(pred, aes(x = draw, fill = factor(civil_liberties))) +\n stat_halfeye(slab_alpha = .5) +\n facet_wrap(~ region) +\n labs(x = \"Media index (predicted)\",\n y = \"Posterior density\",\n fill = \"Civil liberties\")\n\n\n\n\nPredicted media index by region and party autonomy:\n\npred <- predictions(mod,\n newdata = datagrid(region = vdem_2015$region,\n party_autonomy = c(TRUE, FALSE),\n civil_liberties = .5)) |>\n posterior_draws()\n\nggplot(pred, aes(x = draw, y = region, fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (predicted)\",\n y = \"\",\n fill = \"Party autonomy\")\n\n\n\n\nTRUE/FALSE contrasts (marginal effects) of party autonomy by region:\n\nmfx <- slopes(\n mod,\n variables = \"party_autonomy\",\n newdata = datagrid(\n region = vdem_2015$region,\n civil_liberties = .5)) |>\n posterior_draws()\n\nggplot(mfx, aes(x = draw, y = region, fill = party_autonomy)) +\n stat_halfeye(slab_alpha = .5) +\n labs(x = \"Media index (predicted)\",\n y = \"\",\n fill = \"Party autonomy\")\n\n\n\n\n\n8.2.6 Hypothetical groups\nWe can also obtain predictions or marginal effects for a hypothetical group instead of one of the observed regions. To achieve this, we create a dataset whose region column contains a value that does not appear in the original data. Then we call the slopes() or predictions() functions with the allow_new_levels argument. This argument is pushed through via the ellipsis (...) 
to the posterior_epred function of the brms package:\n\ndat <- data.frame(civil_liberties = .5,\n party_autonomy = FALSE,\n region = \"New Region\")\n\nmfx <- slopes(\n mod,\n variables = \"party_autonomy\",\n allow_new_levels = TRUE,\n newdata = dat)\n\ndraws <- posterior_draws(mfx)\n\nggplot(draws, aes(x = draw)) +\n stat_halfeye() +\n labs(x = \"Marginal effect of party autonomy in a generic world region\", y = \"\")\n\n\n\n\n\n8.2.7 Averaging, marginalizing, integrating random effects\nConsider a logistic regression model with random effects:\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/plm/EmplUK.csv\")\ndat$x <- as.numeric(dat$output > median(dat$output))\ndat$y <- as.numeric(dat$emp > median(dat$emp))\nmod <- brm(y ~ x + (1 | firm), data = dat, backend = \"cmdstanr\", family = \"bernoulli\")\n\nWe can compute adjusted predictions for a given value of x and for each firm (random effects) as follows:\n\np <- predictions(mod, newdata = datagrid(x = 0, firm = unique))\nhead(p)\n#> \n#> x firm Estimate 2.5 % 97.5 %\n#> 0 1 1.0e+00 9.01e-01 1.0000\n#> 0 2 1.0e+00 8.95e-01 1.0000\n#> 0 3 1.0e+00 9.12e-01 1.0000\n#> 0 4 1.0e+00 7.97e-01 1.0000\n#> 0 5 1.0e+00 9.09e-01 1.0000\n#> 0 6 4.9e-08 8.42e-21 0.0019\n#> \n#> Columns: rowid, estimate, conf.low, conf.high, y, x, firm \n#> Type: response\n\nWe can average/marginalize/integrate across random effects with the avg_predictions() function or the by argument:\n\navg_predictions(mod, newdata = datagrid(x = 0, firm = unique))\n#> \n#> Estimate 2.5 % 97.5 %\n#> 0.454 0.44 0.468\n#> \n#> Columns: estimate, conf.low, conf.high \n#> Type: response\n\npredictions(mod, newdata = datagrid(x = 0:1, firm = unique), by = \"x\")\n#> \n#> x Estimate 2.5 % 97.5 %\n#> 0 0.454 0.440 0.468\n#> 1 0.557 0.546 0.570\n#> \n#> Columns: x, estimate, conf.low, conf.high \n#> Type: response\n\nWe can also draw from the (assumed Gaussian) population distribution of random effects, by asking predictions() to make predictions for new “levels” of the random effects. If we then take an average of predictions using avg_predictions() or the by argument, we will have “integrated out the random effects”, as described in the brmsmargins package vignette. In the code below, we make predictions for 100 firm identifiers that were not in the original dataset. We also ask predictions() to push forward the allow_new_levels and sample_new_levels arguments to the brms::posterior_epred function:\n\npredictions(\n mod,\n newdata = datagrid(x = 0:1, firm = -1:-100),\n allow_new_levels = TRUE,\n sample_new_levels = \"gaussian\",\n by = \"x\")\n#> \n#> x Estimate 2.5 % 97.5 %\n#> 0 0.454 0.338 0.565\n#> 1 0.552 0.436 0.664\n#> \n#> Columns: x, estimate, conf.low, conf.high \n#> Type: response\n\nWe can “integrate out” random effects in the other slopes() functions too. 
For instance,\n\navg_comparisons(\n mod,\n newdata = datagrid(firm = -1:-100),\n allow_new_levels = TRUE,\n sample_new_levels = \"gaussian\")\n#> \n#> Term Contrast Estimate 2.5 % 97.5 %\n#> x 1 - 0 0.0965 0.0494 0.162\n#> \n#> Columns: term, contrast, estimate, conf.low, conf.high \n#> Type: response\n\nThis is nearly equivalent to the brmsmargins output (with slight variations due to different random seeds):\n\nlibrary(brmsmargins)\nbm <- brmsmargins(\n k = 100,\n object = mod,\n at = data.frame(x = c(0, 1)),\n CI = .95,\n CIType = \"ETI\",\n contrasts = cbind(\"AME x\" = c(-1, 1)),\n effects = \"integrateoutRE\")\nbm$ContrastSummary |> data.frame()\n#> M Mdn LL UL PercentROPE PercentMID CI CIType ROPE MID Label\n#> 1 0.09864399 0.09651684 0.04835076 0.1610664 NA NA 0.95 ETI <NA> <NA> AME x\n\nSee the alternative software vignette for more information on brmsmargins."
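A quick way to see what this integration does under the hood is to call brms::posterior_epred() directly. The following is only a sketch of the point estimates: it assumes the mod object fit above, and the grid, argument values, and averaging code are our own additions rather than the package's internal implementation.

# build a grid with 100 unseen firm identifiers, crossed with x = 0 and x = 1
nd <- expand.grid(x = 0:1, firm = -1:-100)
# expected predictions, sampling new group effects from the implied Gaussian
ep <- brms::posterior_epred(
  mod,
  newdata = nd,
  allow_new_levels = TRUE,
  sample_new_levels = "gaussian")
# ep has one column per row of nd: average the posterior means over the 100
# hypothetical firms, separately for x = 0 and x = 1
tapply(colMeans(ep), nd$x, mean)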
},
{
"objectID": "articles/brms.html#multinomial-logit",
@@ -501,14 +501,14 @@
"href": "articles/bootstrap.html#bootstrap",
"title": "\n9 Bootstrap & Simulation\n",
"section": "\n9.2 Bootstrap",
- "text": "9.2 Bootstrap\nmarginaleffects supports three bootstrap frameworks in R: the well-established boot package, the newer rsample package, and the so-called “bayesian bootstrap” in fwb.\n\n9.2.1 boot\n\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"boot\")\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.257 -0.581 0.408\n#> Petal.Width mean(+1) versicolor -0.0201 0.156 -0.299 0.302\n#> Petal.Width mean(+1) virginica 0.0216 0.183 -0.339 0.373\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, std.error, conf.low, conf.high \n#> Type: response\n\nAll unknown arguments that we feed to inferences() are pushed forward to boot::boot():\n\nest <- avg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"boot\", sim = \"balanced\", R = 500, conf_type = \"bca\")\nest\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.267 -0.643 0.384\n#> Petal.Width mean(+1) versicolor -0.0201 0.164 -0.325 0.352\n#> Petal.Width mean(+1) virginica 0.0216 0.193 -0.349 0.388\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, std.error, conf.low, conf.high \n#> Type: response\n\nWe can extract the original boot object from an attribute:\n\nattr(est, \"inferences\")\n#> \n#> BALANCED BOOTSTRAP\n#> \n#> \n#> Call:\n#> bootstrap_boot(model = model, INF_FUN = INF_FUN, newdata = ..1, \n#> vcov = ..2, variables = ..3, type = ..4, by = ..5, conf_level = ..6, \n#> cross = ..7, comparison = ..8, transform = ..9, wts = ..10, \n#> hypothesis = ..11, eps = ..12)\n#> \n#> \n#> Bootstrap Statistics :\n#> original bias std. error\n#> t1* -0.11025325 0.005684363 0.2671518\n#> t2* -0.02006005 0.002562855 0.1637331\n#> t3* 0.02158742 0.001121472 0.1926193\n\nOr we can extract the individual draws with the posterior_draws() function:\n\nposterior_draws(est) |> head()\n#> drawid draw term contrast Species estimate predicted_lo predicted_hi predicted std.error conf.low conf.high\n#> 1 1 0.05380033 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 0.2671518 -0.6426428 0.3836699\n#> 2 1 0.10305571 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 0.1637331 -0.3247546 0.3521155\n#> 3 1 0.12579979 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 0.1926193 -0.3493749 0.3881292\n#> 4 2 -0.27902802 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 0.2671518 -0.6426428 0.3836699\n#> 5 2 -0.32378525 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 0.1637331 -0.3247546 0.3521155\n#> 6 2 -0.34445228 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 0.1926193 -0.3493749 0.3881292\n\nposterior_draws(est, shape = \"DxP\") |> dim()\n#> [1] 500 3\n\n\n9.2.2 rsample\n\nAs before, we can pass arguments to rsample::bootstraps() through inferences(). 
For example, for stratified resampling:\n\nest <- avg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"rsample\", R = 100, strata = \"Species\")\nest\n#> \n#> Term Contrast Species Estimate 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 -0.559 0.470\n#> Petal.Width mean(+1) versicolor -0.0201 -0.357 0.309\n#> Petal.Width mean(+1) virginica 0.0216 -0.411 0.370\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, conf.low, conf.high \n#> Type: response\n\nattr(est, \"inferences\")\n#> # Bootstrap sampling using stratification with apparent sample \n#> # A tibble: 101 × 3\n#> splits id estimates \n#> <list> <chr> <list> \n#> 1 <split [150/56]> Bootstrap001 <tibble [3 × 7]>\n#> 2 <split [150/56]> Bootstrap002 <tibble [3 × 7]>\n#> 3 <split [150/54]> Bootstrap003 <tibble [3 × 7]>\n#> 4 <split [150/63]> Bootstrap004 <tibble [3 × 7]>\n#> 5 <split [150/54]> Bootstrap005 <tibble [3 × 7]>\n#> 6 <split [150/56]> Bootstrap006 <tibble [3 × 7]>\n#> 7 <split [150/58]> Bootstrap007 <tibble [3 × 7]>\n#> 8 <split [150/53]> Bootstrap008 <tibble [3 × 7]>\n#> 9 <split [150/54]> Bootstrap009 <tibble [3 × 7]>\n#> 10 <split [150/54]> Bootstrap010 <tibble [3 × 7]>\n#> # ℹ 91 more rows\n\nOr we can extract the individual draws with the posterior_draws() function:\n\nposterior_draws(est) |> head()\n#> drawid draw term contrast Species estimate predicted_lo predicted_hi predicted conf.low conf.high\n#> 1 1 -0.02715873 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 -0.5590238 0.4703151\n#> 2 1 -0.20167877 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 -0.3570730 0.3088383\n#> 3 1 -0.28226487 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 -0.4111337 0.3704871\n#> 4 2 -0.17211812 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 -0.5590238 0.4703151\n#> 5 2 -0.34497272 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 -0.3570730 0.3088383\n#> 6 2 -0.42478979 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 -0.4111337 0.3704871\n\nposterior_draws(est, shape = \"PxD\") |> dim()\n#> [1] 3 100\n\n\n9.2.3 Fractional Weighted Bootstrap (aka Bayesian Bootstrap)\nThe fwb package implements fractional weighted bootstrap (aka Bayesian bootstrap):\n\n“fwb implements the fractional weighted bootstrap (FWB), also known as the Bayesian bootstrap, following the treatment by Xu et al. (2020). The FWB involves generating sets of weights from a uniform Dirichlet distribution to be used in estimating statistics of interest, which yields a posterior distribution that can be interpreted in the same way the traditional (resampling-based) bootstrap distribution can be.” -Noah Greifer\n\nThe inferences() function makes it easy to apply this inference strategy to marginaleffects objects:\n\navg_comparisons(mod) |> inferences(method = \"fwb\")\n#> \n#> Term Contrast Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Length +1 0.8929 0.0789 0.738 1.047\n#> Petal.Width +1 -0.0362 0.1550 -0.326 0.266\n#> Species versicolor - setosa -1.4629 0.3313 -2.158 -0.779\n#> Species virginica - setosa -1.9842 0.3952 -2.783 -1.190\n#> \n#> Columns: term, contrast, estimate, std.error, conf.low, conf.high \n#> Type: response"
+ "text": "9.2 Bootstrap\nmarginaleffects supports three bootstrap frameworks in R: the well-established boot package, the newer rsample package, and the so-called “bayesian bootstrap” in fwb.\n\n9.2.1 boot\n\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"boot\")\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.267 -0.625 0.443\n#> Petal.Width mean(+1) versicolor -0.0201 0.162 -0.340 0.327\n#> Petal.Width mean(+1) virginica 0.0216 0.182 -0.335 0.368\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, std.error, conf.low, conf.high \n#> Type: response\n\nAll unknown arguments that we feed to inferences() are pushed forward to boot::boot():\n\nest <- avg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"boot\", sim = \"balanced\", R = 500, conf_type = \"bca\")\nest\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.266 -0.662 0.404\n#> Petal.Width mean(+1) versicolor -0.0201 0.162 -0.335 0.298\n#> Petal.Width mean(+1) virginica 0.0216 0.184 -0.344 0.377\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, std.error, conf.low, conf.high \n#> Type: response\n\nWe can extract the original boot object from an attribute:\n\nattr(est, \"inferences\")\n#> \n#> BALANCED BOOTSTRAP\n#> \n#> \n#> Call:\n#> bootstrap_boot(model = model, INF_FUN = INF_FUN, newdata = ..1, \n#> vcov = ..2, variables = ..3, type = ..4, by = ..5, conf_level = ..6, \n#> cross = ..7, comparison = ..8, transform = ..9, wts = ..10, \n#> hypothesis = ..11, eps = ..12)\n#> \n#> \n#> Bootstrap Statistics :\n#> original bias std. error\n#> t1* -0.11025325 0.003230574 0.2663606\n#> t2* -0.02006005 0.003873671 0.1615387\n#> t3* 0.02158742 0.004170627 0.1837267\n\nOr we can extract the individual draws with the posterior_draws() function:\n\nposterior_draws(est) |> head()\n#> drawid draw term contrast Species estimate predicted_lo predicted_hi predicted std.error conf.low conf.high\n#> 1 1 -0.031097605 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 0.2663606 -0.6618164 0.4041951\n#> 2 1 -0.010926106 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 0.1615387 -0.3352636 0.2983942\n#> 3 1 -0.001611747 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 0.1837267 -0.3438075 0.3769131\n#> 4 2 -0.403310043 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 0.2663606 -0.6618164 0.4041951\n#> 5 2 -0.057683806 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 0.1615387 -0.3352636 0.2983942\n#> 6 2 0.101912012 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 0.1837267 -0.3438075 0.3769131\n\nposterior_draws(est, shape = \"DxP\") |> dim()\n#> [1] 500 3\n\n\n9.2.2 rsample\n\nAs before, we can pass arguments to rsample::bootstraps() through inferences(). 
For example, for stratified resampling:\n\nest <- avg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"rsample\", R = 100, strata = \"Species\")\nest\n#> \n#> Term Contrast Species Estimate 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 -0.650 0.510\n#> Petal.Width mean(+1) versicolor -0.0201 -0.371 0.264\n#> Petal.Width mean(+1) virginica 0.0216 -0.262 0.322\n#> \n#> Columns: term, contrast, Species, estimate, predicted_lo, predicted_hi, predicted, conf.low, conf.high \n#> Type: response\n\nattr(est, \"inferences\")\n#> # Bootstrap sampling using stratification with apparent sample \n#> # A tibble: 101 × 3\n#> splits id estimates \n#> <list> <chr> <list> \n#> 1 <split [150/47]> Bootstrap001 <tibble [3 × 7]>\n#> 2 <split [150/48]> Bootstrap002 <tibble [3 × 7]>\n#> 3 <split [150/57]> Bootstrap003 <tibble [3 × 7]>\n#> 4 <split [150/58]> Bootstrap004 <tibble [3 × 7]>\n#> 5 <split [150/55]> Bootstrap005 <tibble [3 × 7]>\n#> 6 <split [150/53]> Bootstrap006 <tibble [3 × 7]>\n#> 7 <split [150/55]> Bootstrap007 <tibble [3 × 7]>\n#> 8 <split [150/63]> Bootstrap008 <tibble [3 × 7]>\n#> 9 <split [150/54]> Bootstrap009 <tibble [3 × 7]>\n#> 10 <split [150/55]> Bootstrap010 <tibble [3 × 7]>\n#> # ℹ 91 more rows\n\nOr we can extract the individual draws with the posterior_draws() function:\n\nposterior_draws(est) |> head()\n#> drawid draw term contrast Species estimate predicted_lo predicted_hi predicted conf.low conf.high\n#> 1 1 0.2677437 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 -0.6497296 0.5100013\n#> 2 1 0.1692484 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 -0.3712161 0.2635059\n#> 3 1 0.1237673 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 -0.2617070 0.3222299\n#> 4 2 -0.9913947 Petal.Width mean(+1) setosa -0.11025325 5.013640 4.901389 4.957514 -0.6497296 0.5100013\n#> 5 2 -0.4053751 Petal.Width mean(+1) versicolor -0.02006005 6.330887 6.325011 6.327949 -0.3712161 0.2635059\n#> 6 2 -0.1347756 Petal.Width mean(+1) virginica 0.02158742 6.997499 7.033528 7.015513 -0.2617070 0.3222299\n\nposterior_draws(est, shape = \"PxD\") |> dim()\n#> [1] 3 100\n\n\n9.2.3 Fractional Weighted Bootstrap (aka Bayesian Bootstrap)\nThe fwb package implements fractional weighted bootstrap (aka Bayesian bootstrap):\n\n“fwb implements the fractional weighted bootstrap (FWB), also known as the Bayesian bootstrap, following the treatment by Xu et al. (2020). The FWB involves generating sets of weights from a uniform Dirichlet distribution to be used in estimating statistics of interest, which yields a posterior distribution that can be interpreted in the same way the traditional (resampling-based) bootstrap distribution can be.” -Noah Greifer\n\nThe inferences() function makes it easy to apply this inference strategy to marginaleffects objects:\n\navg_comparisons(mod) |> inferences(method = \"fwb\")\n#> \n#> Term Contrast Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Length +1 0.8929 0.0798 0.723 1.040\n#> Petal.Width +1 -0.0362 0.1620 -0.342 0.286\n#> Species versicolor - setosa -1.4629 0.3393 -2.131 -0.831\n#> Species virginica - setosa -1.9842 0.3973 -2.732 -1.231\n#> \n#> Columns: term, contrast, estimate, std.error, conf.low, conf.high \n#> Type: response"
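To build intuition for what inferences(method = "fwb") does, here is a rough sketch of a single fractional-weighted-bootstrap replicate, following the description quoted above. The model formula below is a stand-in that we assume resembles the mod object used in this chapter; the real fwb implementation handles seeding, looping over replicates, and interval construction.

# weights proportional to a uniform Dirichlet draw (normalized exponentials),
# rescaled to have mean 1
n <- nrow(iris)
w <- rexp(n)
w <- w / mean(w)
# refit the model with fractional weights and recompute the statistic,
# weighting the averaging step by the same draws
fit_w <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Species,
            data = iris, weights = w)
avg_comparisons(fit_w, wts = w)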
},
{
"objectID": "articles/bootstrap.html#simulation-based-inference",
"href": "articles/bootstrap.html#simulation-based-inference",
"title": "\n9 Bootstrap & Simulation\n",
"section": "\n9.3 Simulation-based inference",
- "text": "9.3 Simulation-based inference\nThis simulation-based strategy to compute confidence intervals was described in Krinsky & Robb (1986) and popularized by King, Tomz, Wittenberg (2000). We proceed in 3 steps:\n\nDraw R sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model’s estimated coefficients and variance equal to the model’s variance-covariance matrix (classical, “HC3”, or other).\nUse the R sets of coefficients to compute R sets of estimands: predictions, comparisons, or slopes.\nTake quantiles of the resulting distribution of estimands to obtain a confidence interval and the standard deviation of simulated estimates to estimate the standard error.\n\nHere are a few examples:\n\nlibrary(ggplot2)\nlibrary(ggdist)\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"simulation\")\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.291 -0.668 0.505\n#> Petal.Width mean(+1) versicolor -0.0201 0.162 -0.327 0.302\n#> Petal.Width mean(+1) virginica 0.0216 0.174 -0.324 0.327\n#> \n#> Columns: term, contrast, Species, estimate, std.error, conf.low, conf.high, predicted_lo, predicted_hi, predicted, tmp_idx \n#> Type: response\n\nSince simulation based inference generates R estimates of the quantities of interest, we can treat them similarly to draws from the posterior distribution in bayesian models. For example, we can extract draws using the posterior_draws() function, and plot their distributions using packages likeggplot2 and ggdist:\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"simulation\") |>\n posterior_draws(\"rvar\") |>\n ggplot(aes(y = Species, xdist = rvar)) +\n stat_slabinterval()"
+ "text": "9.3 Simulation-based inference\nThis simulation-based strategy to compute confidence intervals was described in Krinsky & Robb (1986) and popularized by King, Tomz, Wittenberg (2000). We proceed in 3 steps:\n\nDraw R sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model’s estimated coefficients and variance equal to the model’s variance-covariance matrix (classical, “HC3”, or other).\nUse the R sets of coefficients to compute R sets of estimands: predictions, comparisons, or slopes.\nTake quantiles of the resulting distribution of estimands to obtain a confidence interval and the standard deviation of simulated estimates to estimate the standard error.\n\nHere are a few examples:\n\nlibrary(ggplot2)\nlibrary(ggdist)\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"simulation\")\n#> \n#> Term Contrast Species Estimate Std. Error 2.5 % 97.5 %\n#> Petal.Width mean(+1) setosa -0.1103 0.272 -0.636 0.435\n#> Petal.Width mean(+1) versicolor -0.0201 0.160 -0.338 0.285\n#> Petal.Width mean(+1) virginica 0.0216 0.172 -0.333 0.350\n#> \n#> Columns: term, contrast, Species, estimate, std.error, conf.low, conf.high, predicted_lo, predicted_hi, predicted, tmp_idx \n#> Type: response\n\nSince simulation based inference generates R estimates of the quantities of interest, we can treat them similarly to draws from the posterior distribution in bayesian models. For example, we can extract draws using the posterior_draws() function, and plot their distributions using packages likeggplot2 and ggdist:\n\navg_comparisons(mod, by = \"Species\", variables = \"Petal.Width\") |>\n inferences(method = \"simulation\") |>\n posterior_draws(\"rvar\") |>\n ggplot(aes(y = Species, xdist = rvar)) +\n stat_slabinterval()"
},
{
"objectID": "articles/bootstrap.html#multiple-imputation-and-missing-data",
@@ -522,7 +522,7 @@
"href": "articles/categorical.html#masspolr-function",
"title": "\n10 Categorical outcomes\n",
"section": "\n10.1 MASS::polr function",
- "text": "10.1 MASS::polr function\nConsider a simple ordered logit model in which we predict the number of gears of a car based its miles per gallon and horsepower:\n\nlibrary(MASS)\nmod <- polr(factor(gear) ~ mpg + hp, data = mtcars, Hess = TRUE)\n\nNow, consider a car with 25 miles per gallon and 110 horsepower. The expected predicted probability for each outcome level (gear) for this car is:\n\npredictions(mod, newdata = datagrid(mpg = 25, hp = 110))\n#> \n#> Group mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 25 110 0.203 0.0959 2.12 0.0339 4.9 0.0155 0.391\n#> 4 25 110 0.578 0.1229 4.70 <0.001 18.6 0.3373 0.819\n#> 5 25 110 0.218 0.1007 2.17 0.0302 5.1 0.0209 0.416\n#> \n#> Columns: rowid, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, gear, mpg, hp \n#> Type: probs\n\nSince the gear is categorical, we make one prediction for each level of the outcome.\nNow consider the marginal effects (aka slopes or partial derivatives) for the same car:\n\nslopes(mod, variables = \"mpg\", newdata = datagrid(mpg = 25, hp = 110))\n#> \n#> Group Term mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 mpg 25 110 -0.06041 0.0169 -3.5802 <0.001 11.5 -0.09348 -0.0273\n#> 4 mpg 25 110 -0.00321 0.0335 -0.0958 0.9237 0.1 -0.06896 0.0625\n#> 5 mpg 25 110 0.06362 0.0301 2.1129 0.0346 4.9 0.00461 0.1226\n#> \n#> Columns: rowid, term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, predicted_lo, predicted_hi, predicted, gear \n#> Type: probs\n\nAgain, marginaleffects produces one estimate of the slope for each outcome level. For a small step size \\(\\varepsilon\\), the printed quantities are estimated as:\n\\[\\frac{P(gear=3|mpg=25+\\varepsilon, hp=110)-P(gear=3|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\] \\[\\frac{P(gear=4|mpg=25+\\varepsilon, hp=110)-P(gear=4|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\] \\[\\frac{P(gear=5|mpg=25+\\varepsilon, hp=110)-P(gear=5|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\]\nWhen we call avg_slopes(), marginaleffects will repeat the same computation for every row of the original dataset, and then report the average slope for each level of the outcome:\n\navg_slopes(mod)\n#> \n#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 hp -0.00377 0.001514 -2.49 0.01285 6.3 -0.006735 -0.00080\n#> 3 mpg -0.07014 0.015484 -4.53 < 0.001 17.4 -0.100488 -0.03979\n#> 4 hp 0.00201 0.000958 2.10 0.03555 4.8 0.000136 0.00389\n#> 4 mpg 0.03747 0.013861 2.70 0.00687 7.2 0.010303 0.06464\n#> 5 hp 0.00175 0.000833 2.11 0.03519 4.8 0.000122 0.00339\n#> 5 mpg 0.03267 0.009571 3.41 < 0.001 10.6 0.013909 0.05143\n#> \n#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: probs"
+ "text": "10.1 MASS::polr function\nConsider a simple ordered logit model in which we predict the number of gears of a car based its miles per gallon and horsepower:\n\nlibrary(MASS)\nmod <- polr(factor(gear) ~ mpg + hp, data = mtcars, Hess = TRUE)\n\nNow, consider a car with 25 miles per gallon and 110 horsepower. The expected predicted probability for each outcome level (gear) for this car is:\n\npredictions(mod, newdata = datagrid(mpg = 25, hp = 110))\n#> \n#> Group mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 25 110 0.203 0.0959 2.12 0.0339 4.9 0.0155 0.391\n#> 4 25 110 0.578 0.1229 4.70 <0.001 18.6 0.3373 0.819\n#> 5 25 110 0.218 0.1007 2.17 0.0302 5.1 0.0209 0.416\n#> \n#> Columns: rowid, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, gear, mpg, hp \n#> Type: probs\n\nSince the gear is categorical, we make one prediction for each level of the outcome.\nNow consider the marginal effects (aka slopes or partial derivatives) for the same car:\n\nslopes(mod, variables = \"mpg\", newdata = datagrid(mpg = 25, hp = 110))\n#> \n#> Group Term mpg hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 mpg 25 110 -0.06041 0.0169 -3.5813 <0.001 11.5 -0.09347 -0.0273\n#> 4 mpg 25 110 -0.00321 0.0335 -0.0958 0.9237 0.1 -0.06896 0.0625\n#> 5 mpg 25 110 0.06362 0.0301 2.1132 0.0346 4.9 0.00461 0.1226\n#> \n#> Columns: rowid, term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, hp, predicted_lo, predicted_hi, predicted, gear \n#> Type: probs\n\nAgain, marginaleffects produces one estimate of the slope for each outcome level. For a small step size \\(\\varepsilon\\), the printed quantities are estimated as:\n\\[\\frac{P(gear=3|mpg=25+\\varepsilon, hp=110)-P(gear=3|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\] \\[\\frac{P(gear=4|mpg=25+\\varepsilon, hp=110)-P(gear=4|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\] \\[\\frac{P(gear=5|mpg=25+\\varepsilon, hp=110)-P(gear=5|mpg=25-\\varepsilon, hp=110)}{2 \\cdot \\varepsilon}\\]\nWhen we call avg_slopes(), marginaleffects will repeat the same computation for every row of the original dataset, and then report the average slope for each level of the outcome:\n\navg_slopes(mod)\n#> \n#> Group Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 3 hp -0.00377 0.001514 -2.49 0.01284 6.3 -0.006735 -0.00080\n#> 3 mpg -0.07014 0.015485 -4.53 < 0.001 17.4 -0.100490 -0.03979\n#> 4 hp 0.00201 0.000957 2.10 0.03553 4.8 0.000136 0.00389\n#> 4 mpg 0.03747 0.013861 2.70 0.00687 7.2 0.010303 0.06464\n#> 5 hp 0.00175 0.000833 2.11 0.03519 4.8 0.000122 0.00339\n#> 5 mpg 0.03267 0.009572 3.41 < 0.001 10.6 0.013907 0.05143\n#> \n#> Columns: term, group, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: probs"
},
{
"objectID": "articles/categorical.html#nnet-package",
@@ -557,7 +557,7 @@
"href": "articles/gformula.html#example-with-real-world-data",
"title": "\n11 Causal Inference (G-Computation)\n",
"section": "\n11.3 Example with real-world data",
- "text": "11.3 Example with real-world data\nLet’s illustrate this method by replicating an example from Chapter 13 of Hernán and Robins. The data come from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The outcome is wt82_71, a measure of weight gain. The treatment is qsmk, a binary measure of smoking cessation. There are many confounders.\nStep 1 is to fit a regression model of the outcome on the treatment and control variables:\n\nlibrary(boot)\nlibrary(marginaleffects)\n\nf <- wt82_71 ~ qsmk + sex + race + age + I(age * age) + factor(education) +\n smokeintensity + I(smokeintensity * smokeintensity) + smokeyrs +\n I(smokeyrs * smokeyrs) + factor(exercise) + factor(active) + wt71 +\n I(wt71 * wt71) + I(qsmk * smokeintensity)\n\nurl <- \"https://raw.githubusercontent.com/vincentarelbundock/modelarchive/main/data-raw/nhefs.csv\"\nnhefs <- read.csv(url)\nnhefs <- na.omit(nhefs[, all.vars(f)])\n\nfit <- glm(f, data = nhefs)\n\nSteps 2 and 3 require us to replicate the full dataset by setting the qsmk treatment to counterfactual values. We can do this automatically by calling comparisons().\n\n11.3.1 TLDR\nThese simple commands do everything we need to apply the parametric g-formula:\n\navg_comparisons(fit, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nThe rest of the vignette walks through the process in a bit more detail and compares to replication code from Hernán and Robins.\n\n11.3.2 Adjusted Predictions\nWe can compute average predictions in the original data, and average predictions in the two counterfactual datasets like this:\n\n## average predicted outcome in the original data\np <- predictions(fit)\nmean(p$estimate)\n\n[1] 2.6383\n\n## average predicted outcome in the two counterfactual datasets\np <- predictions(fit, newdata = datagrid(qsmk = 0:1, grid_type = \"counterfactual\"))\naggregate(estimate ~ qsmk, data = p, FUN = mean)\n\n qsmk estimate\n1 0 1.756213\n2 1 5.273587\n\n\nIn the R code that accompanies their book, Hernán and Robins compute the same quantities manually, as follows:\n\n## create a dataset with 3 copies of each subject\nnhefs$interv <- -1 # 1st copy: equal to original one\n\ninterv0 <- nhefs # 2nd copy: treatment set to 0, outcome to missing\ninterv0$interv <- 0\ninterv0$qsmk <- 0\ninterv0$wt82_71 <- NA\n\ninterv1 <- nhefs # 3rd copy: treatment set to 1, outcome to missing\ninterv1$interv <- 1\ninterv1$qsmk <- 1\ninterv1$wt82_71 <- NA\n\nonesample <- rbind(nhefs, interv0, interv1) # combining datasets\n\n## linear model to estimate mean outcome conditional on treatment and confounders\n## parameters are estimated using original observations only (nhefs)\n## parameter estimates are used to predict mean outcome for observations with \n## treatment set to 0 (interv=0) and to 1 (interv=1)\n\nstd <- glm(f, data = onesample)\nonesample$predicted_meanY <- predict(std, onesample)\n\n## estimate mean outcome in each of the groups interv=0, and interv=1\n## this mean outcome is a weighted average of the mean outcomes in each combination \n## of values of treatment and confounders, that is, the standardized outcome\nmean(onesample[which(onesample$interv == -1), ]$predicted_meanY)\n\n[1] 2.6383\n\nmean(onesample[which(onesample$interv == 0), ]$predicted_meanY)\n\n[1] 
1.756213\n\nmean(onesample[which(onesample$interv == 1), ]$predicted_meanY)\n\n[1] 5.273587\n\n\nIt may be useful to note that the datagrid() function provided by marginaleffects can create counterfactual datasets automatically. This is equivalent to the onesample dataset:\n\nnd <- datagrid(\n model = fit,\n qsmk = c(0, 1),\n grid_type = \"counterfactual\")\n\n\n11.3.3 Contrast\nNow we want to compute the treatment effect with the parametric g-formula, which is the difference in average predicted outcomes in the two counterfactual datasets. This is equivalent to taking the average contrast with the comparisons() function. There are three important things to note in the command that follows:\n\nThe variables argument is used to indicate that we want to estimate a “contrast” between adjusted predictions when qsmk is equal to 1 or 0.\n\ncomparisons() automatically produces estimates of uncertainty.\n\n\navg_comparisons(std, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nUnder the hood, comparisons() did exactly what we described in the g-formula steps above:\nWe can obtain the same result by manually computing the quantities, using the replication code from Hernán and Robins:\n\nmean(onesample[which(onesample$interv == 1), ]$predicted_meanY) -\nmean(onesample[which(onesample$interv == 0), ]$predicted_meanY)\n\n[1] 3.517374\n\n\nAlthough manual computation is simple, it does not provide uncertainty estimates. In contrast, comparisons() has already computed the standard error and confidence interval using the delta method.\nInstead of the delta method, most analysts will rely on bootstrapping. 
For example, the replication code from Hernán and Robins does this:\n\n## function to calculate difference in means\nstandardization <- function(data, indices) {\n # create a dataset with 3 copies of each subject\n d <- data[indices, ] # 1st copy: equal to original one`\n d$interv <- -1\n d0 <- d # 2nd copy: treatment set to 0, outcome to missing\n d0$interv <- 0\n d0$qsmk <- 0\n d0$wt82_71 <- NA\n d1 <- d # 3rd copy: treatment set to 1, outcome to missing\n d1$interv <- 1\n d1$qsmk <- 1\n d1$wt82_71 <- NA\n d.onesample <- rbind(d, d0, d1) # combining datasets\n\n # linear model to estimate mean outcome conditional on treatment and confounders\n # parameters are estimated using original observations only (interv= -1)\n # parameter estimates are used to predict mean outcome for observations with set\n # treatment (interv=0 and interv=1)\n fit <- glm(f, data = d.onesample)\n\n d.onesample$predicted_meanY <- predict(fit, d.onesample)\n\n # estimate mean outcome in each of the groups interv=-1, interv=0, and interv=1\n return(mean(d.onesample$predicted_meanY[d.onesample$interv == 1]) -\n mean(d.onesample$predicted_meanY[d.onesample$interv == 0]))\n}\n\n## bootstrap\nresults <- boot(data = nhefs, statistic = standardization, R = 1000)\n\n## generating confidence intervals\nse <- sd(results$t[, 1])\nmeant0 <- results$t0\nll <- meant0 - qnorm(0.975) * se\nul <- meant0 + qnorm(0.975) * se\n\nbootstrap <- data.frame(\n \" \" = \"Treatment - No Treatment\",\n estimate = meant0,\n std.error = se,\n conf.low = ll,\n conf.high = ul,\n check.names = FALSE)\nbootstrap\n\n estimate std.error conf.low conf.high\n1 Treatment - No Treatment 3.517374 0.493746 2.54965 4.485099\n\n\nThe results are close to those that we obtained with comparisons(), but the confidence interval differs slightly because of the difference between bootstrapping and the delta method.\n\navg_comparisons(fit, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
+ "text": "11.3 Example with real-world data\nLet’s illustrate this method by replicating an example from Chapter 13 of Hernán and Robins. The data come from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study (NHEFS). The outcome is wt82_71, a measure of weight gain. The treatment is qsmk, a binary measure of smoking cessation. There are many confounders.\nStep 1 is to fit a regression model of the outcome on the treatment and control variables:\n\nlibrary(boot)\nlibrary(marginaleffects)\n\nf <- wt82_71 ~ qsmk + sex + race + age + I(age * age) + factor(education) +\n smokeintensity + I(smokeintensity * smokeintensity) + smokeyrs +\n I(smokeyrs * smokeyrs) + factor(exercise) + factor(active) + wt71 +\n I(wt71 * wt71) + I(qsmk * smokeintensity)\n\nurl <- \"https://raw.githubusercontent.com/vincentarelbundock/modelarchive/main/data-raw/nhefs.csv\"\nnhefs <- read.csv(url)\nnhefs <- na.omit(nhefs[, all.vars(f)])\n\nfit <- glm(f, data = nhefs)\n\nSteps 2 and 3 require us to replicate the full dataset by setting the qsmk treatment to counterfactual values. We can do this automatically by calling comparisons().\n\n11.3.1 TLDR\nThese simple commands do everything we need to apply the parametric g-formula:\n\navg_comparisons(fit, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nThe rest of the vignette walks through the process in a bit more detail and compares to replication code from Hernán and Robins.\n\n11.3.2 Adjusted Predictions\nWe can compute average predictions in the original data, and average predictions in the two counterfactual datasets like this:\n\n## average predicted outcome in the original data\np <- predictions(fit)\nmean(p$estimate)\n\n[1] 2.6383\n\n## average predicted outcome in the two counterfactual datasets\np <- predictions(fit, newdata = datagrid(qsmk = 0:1, grid_type = \"counterfactual\"))\naggregate(estimate ~ qsmk, data = p, FUN = mean)\n\n qsmk estimate\n1 0 1.756213\n2 1 5.273587\n\n\nIn the R code that accompanies their book, Hernán and Robins compute the same quantities manually, as follows:\n\n## create a dataset with 3 copies of each subject\nnhefs$interv <- -1 # 1st copy: equal to original one\n\ninterv0 <- nhefs # 2nd copy: treatment set to 0, outcome to missing\ninterv0$interv <- 0\ninterv0$qsmk <- 0\ninterv0$wt82_71 <- NA\n\ninterv1 <- nhefs # 3rd copy: treatment set to 1, outcome to missing\ninterv1$interv <- 1\ninterv1$qsmk <- 1\ninterv1$wt82_71 <- NA\n\nonesample <- rbind(nhefs, interv0, interv1) # combining datasets\n\n## linear model to estimate mean outcome conditional on treatment and confounders\n## parameters are estimated using original observations only (nhefs)\n## parameter estimates are used to predict mean outcome for observations with \n## treatment set to 0 (interv=0) and to 1 (interv=1)\n\nstd <- glm(f, data = onesample)\nonesample$predicted_meanY <- predict(std, onesample)\n\n## estimate mean outcome in each of the groups interv=0, and interv=1\n## this mean outcome is a weighted average of the mean outcomes in each combination \n## of values of treatment and confounders, that is, the standardized outcome\nmean(onesample[which(onesample$interv == -1), ]$predicted_meanY)\n\n[1] 2.6383\n\nmean(onesample[which(onesample$interv == 0), ]$predicted_meanY)\n\n[1] 
1.756213\n\nmean(onesample[which(onesample$interv == 1), ]$predicted_meanY)\n\n[1] 5.273587\n\n\nIt may be useful to note that the datagrid() function provided by marginaleffects can create counterfactual datasets automatically. This is equivalent to the onesample dataset:\n\nnd <- datagrid(\n model = fit,\n qsmk = c(0, 1),\n grid_type = \"counterfactual\")\n\n\n11.3.3 Contrast\nNow we want to compute the treatment effect with the parametric g-formula, which is the difference in average predicted outcomes in the two counterfactual datasets. This is equivalent to taking the average contrast with the comparisons() function. There are two important things to note in the command that follows:\n\nThe variables argument is used to indicate that we want to estimate a “contrast” between adjusted predictions when qsmk is equal to 1 or 0.\n\ncomparisons() automatically produces estimates of uncertainty.\n\n\navg_comparisons(std, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n\nUnder the hood, comparisons() did exactly what we described in the g-formula steps above. We can obtain the same result by computing the quantities manually, using the replication code from Hernán and Robins:\n\nmean(onesample[which(onesample$interv == 1), ]$predicted_meanY) -\nmean(onesample[which(onesample$interv == 0), ]$predicted_meanY)\n\n[1] 3.517374\n\n\nAlthough manual computation is simple, it does not provide uncertainty estimates. In contrast, comparisons() has already computed the standard error and confidence interval using the delta method.\nMany analysts rely on bootstrapping instead of the delta method. 
For example, the replication code from Hernán and Robins does this:\n\n## function to calculate difference in means\nstandardization <- function(data, indices) {\n # create a dataset with 3 copies of each subject\n d <- data[indices, ] # 1st copy: equal to original one\n d$interv <- -1\n d0 <- d # 2nd copy: treatment set to 0, outcome to missing\n d0$interv <- 0\n d0$qsmk <- 0\n d0$wt82_71 <- NA\n d1 <- d # 3rd copy: treatment set to 1, outcome to missing\n d1$interv <- 1\n d1$qsmk <- 1\n d1$wt82_71 <- NA\n d.onesample <- rbind(d, d0, d1) # combining datasets\n\n # linear model to estimate mean outcome conditional on treatment and confounders\n # parameters are estimated using original observations only (interv= -1)\n # parameter estimates are used to predict mean outcome for observations with set\n # treatment (interv=0 and interv=1)\n fit <- glm(f, data = d.onesample)\n\n d.onesample$predicted_meanY <- predict(fit, d.onesample)\n\n # estimate mean outcome in each of the groups interv=-1, interv=0, and interv=1\n return(mean(d.onesample$predicted_meanY[d.onesample$interv == 1]) -\n mean(d.onesample$predicted_meanY[d.onesample$interv == 0]))\n}\n\n## bootstrap\nresults <- boot(data = nhefs, statistic = standardization, R = 1000)\n\n## generating confidence intervals\nse <- sd(results$t[, 1])\nmeant0 <- results$t0\nll <- meant0 - qnorm(0.975) * se\nul <- meant0 + qnorm(0.975) * se\n\nbootstrap <- data.frame(\n \" \" = \"Treatment - No Treatment\",\n estimate = meant0,\n std.error = se,\n conf.low = ll,\n conf.high = ul,\n check.names = FALSE)\nbootstrap\n\n estimate std.error conf.low conf.high\n1 Treatment - No Treatment 3.517374 0.4765446 2.583364 4.451385\n\n\nThe results are close to those that we obtained with comparisons(), but the confidence interval differs slightly because of the difference between bootstrapping and the delta method.\n\navg_comparisons(fit, variables = list(qsmk = 0:1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n qsmk 1 - 0 3.52 0.44 7.99 <0.001 49.4 2.65 4.38\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
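As an aside, the manual boot() code above can be condensed into a single marginaleffects pipeline with the inferences() function described in the Bootstrap & Simulation chapter. This is a sketch: R is set to match the replication code, and results will again vary slightly across random seeds.

avg_comparisons(fit, variables = list(qsmk = 0:1)) |>
  inferences(method = "boot", R = 1000)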
},
{
"objectID": "articles/conformal.html#confidence-vs.-prediction-intervals",
@@ -599,7 +599,7 @@
"href": "articles/elasticity.html",
"title": "\n13 Elasticity\n",
"section": "",
- "text": "In some contexts, it is useful to interpret the results of a regression model in terms of elasticity or semi-elasticity. One strategy to achieve that is to estimate a log-log or a semilog model, where the left and/or right-hand side variables are logged. Another approach is to note that \\(\\frac{\\partial ln(x)}{\\partial x}=\\frac{1}{x}\\), and to post-process the marginal effects to transform them into elasticities or semi-elasticities.\nFor example, say we estimate a linear model of this form:\n\\[y = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\varepsilon\\]\nLet \\(\\hat{y}\\) be the adjusted prediction made by the model for some combination of covariates \\(x_1\\) and \\(x_2\\). The slope with respect to \\(x_1\\) (or “marginal effect”) is:\n\\[\\frac{\\partial \\hat{y}}{\\partial x_1}\\]\nWe can estimate the “eyex”, “eydx”, and “dyex” (semi-)elasticities with respect to \\(x_1\\) as follows:\n\\[\n\\eta_1=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot \\frac{x_1}{\\hat{y}}\\\\\n\\eta_2=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot \\frac{1}{\\hat{y}} \\\\\n\\eta_3=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot x_1,\n\\]\nwith interpretations roughly as follows:\n\nA percentage point increase in \\(x_1\\) is associated to a \\(\\eta_1\\) percentage points increase in \\(y\\).\nA unit increase in \\(x_1\\) is associated to a \\(\\eta_2\\) percentage points increase in \\(y\\).\nA percentage point increase in \\(x_1\\) is associated to a \\(\\eta_3\\) units increase in \\(y\\).\n\nFor further intuition, consider the ratio of change in \\(y\\) to change in \\(x\\): \\(\\frac{\\Delta y}{\\Delta x}\\). We can turn this ratio into a ratio between relative changes by dividing both the numerator and the denominator: \\(\\frac{\\frac{\\Delta y}{y}}{\\frac{\\Delta x}{x}}\\). This is of course linked to the expression for the \\(\\eta_1\\) elasticity above.\nWith the marginaleffects package, these quantities are easy to compute:\n\nlibrary(marginaleffects)\nmod <- lm(mpg ~ hp + wt, data = mtcars)\n\navg_slopes(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.0318 0.00903 -3.52 <0.001 11.2 -0.0495 -0.0141\n#> wt -3.8778 0.63273 -6.13 <0.001 30.1 -5.1180 -2.6377\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"eyex\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp eY/eX -0.285 0.0855 -3.34 <0.001 10.2 -0.453 -0.118\n#> wt eY/eX -0.746 0.1418 -5.26 <0.001 22.7 -1.024 -0.468\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"eydx\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp eY/dX -0.00173 0.000502 -3.46 <0.001 10.8 -0.00272 -0.000751\n#> wt eY/dX -0.21165 0.037849 -5.59 <0.001 25.4 -0.28583 -0.137464\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"dyex\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp dY/eX -4.66 1.32 -3.52 <0.001 11.2 -7.26 -2.06\n#> wt dY/eX -12.48 2.04 -6.13 <0.001 30.1 -16.47 -8.49\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response"
+ "text": "In some contexts, it is useful to interpret the results of a regression model in terms of elasticity or semi-elasticity. One strategy to achieve that is to estimate a log-log or a semilog model, where the left and/or right-hand side variables are logged. Another approach is to note that \\(\\frac{\\partial ln(x)}{\\partial x}=\\frac{1}{x}\\), and to post-process the marginal effects to transform them into elasticities or semi-elasticities.\nFor example, say we estimate a linear model of this form:\n\\[y = \\beta_0 + \\beta_1 x_1 + \\beta_2 x_2 + \\varepsilon\\]\nLet \\(\\hat{y}\\) be the adjusted prediction made by the model for some combination of covariates \\(x_1\\) and \\(x_2\\). The slope with respect to \\(x_1\\) (or “marginal effect”) is:\n\\[\\frac{\\partial \\hat{y}}{\\partial x_1}\\]\nWe can estimate the “eyex”, “eydx”, and “dyex” (semi-)elasticities with respect to \\(x_1\\) as follows:\n\\[\n\\eta_1=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot \\frac{x_1}{\\hat{y}}\\\\\n\\eta_2=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot \\frac{1}{\\hat{y}} \\\\\n\\eta_3=\\frac{\\partial \\hat{y}}{\\partial x_1}\\cdot x_1,\n\\]\nwith interpretations roughly as follows:\n\nA percentage point increase in \\(x_1\\) is associated to a \\(\\eta_1\\) percentage points increase in \\(y\\).\nA unit increase in \\(x_1\\) is associated to a \\(\\eta_2\\) percentage points increase in \\(y\\).\nA percentage point increase in \\(x_1\\) is associated to a \\(\\eta_3\\) units increase in \\(y\\).\n\nFor further intuition, consider the ratio of change in \\(y\\) to change in \\(x\\): \\(\\frac{\\Delta y}{\\Delta x}\\). We can turn this ratio into a ratio between relative changes by dividing both the numerator and the denominator: \\(\\frac{\\frac{\\Delta y}{y}}{\\frac{\\Delta x}{x}}\\). This is of course linked to the expression for the \\(\\eta_1\\) elasticity above.\nWith the marginaleffects package, these quantities are easy to compute:\n\nlibrary(marginaleffects)\nmod <- lm(mpg ~ hp + wt, data = mtcars)\n\navg_slopes(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp -0.0318 0.00903 -3.52 <0.001 11.2 -0.0495 -0.0141\n#> wt -3.8778 0.63276 -6.13 <0.001 30.1 -5.1180 -2.6376\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"eyex\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp eY/eX -0.285 0.0855 -3.34 <0.001 10.2 -0.453 -0.118\n#> wt eY/eX -0.746 0.1418 -5.26 <0.001 22.7 -1.024 -0.468\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"eydx\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp eY/dX -0.00173 0.000502 -3.46 <0.001 10.8 -0.00272 -0.000751\n#> wt eY/dX -0.21165 0.037851 -5.59 <0.001 25.4 -0.28583 -0.137461\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\navg_slopes(mod, slope = \"dyex\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp dY/eX -4.66 1.32 -3.52 <0.001 11.2 -7.26 -2.06\n#> wt dY/eX -12.48 2.04 -6.13 <0.001 30.1 -16.47 -8.49\n#> \n#> Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response"
},
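To see the "eyex" formula at work, here is a minimal sketch (not the package's internal code) that recovers the average elasticity by rescaling the unit-level slopes manually; the result should be close to the -0.285 reported by avg_slopes(mod, slope = "eyex") above, up to rounding.

library(marginaleffects)
mod <- lm(mpg ~ hp + wt, data = mtcars)

# unit-level slopes dY/dX for hp, one estimate per row of mtcars
s <- slopes(mod, variables = "hp")

# eyex: rescale each slope by x / yhat, then average over observations
yhat <- predict(mod)
mean(s$estimate * mtcars$hp / yhat)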
{
"objectID": "articles/equivalence.html#predictions",
@@ -676,7 +676,7 @@
"href": "articles/gam.html#marginal-effects-slopes-and-plot_slopes",
"title": "\n16 GAM\n",
"section": "\n16.3 Marginal Effects: slopes() and plot_slopes()\n",
- "text": "16.3 Marginal Effects: slopes() and plot_slopes()\n\nMarginal effects are slopes of the prediction equation. They are an observation-level quantity. The slopes() function produces a dataset with the same number of rows as the original data, but with new columns for the slop and uncertainty estimates:\n\nmfx <- slopes(model, variables = \"Time\")\nhead(mfx)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> Time 0.0261 0.00137 19.1 <0.001 267.8 0.0234 0.0288\n#> Time 0.0261 0.00136 19.2 <0.001 270.3 0.0234 0.0288\n#> Time 0.0261 0.00133 19.5 <0.001 280.1 0.0235 0.0287\n#> Time 0.0260 0.00128 20.3 <0.001 301.1 0.0235 0.0285\n#> Time 0.0259 0.00120 21.6 <0.001 339.8 0.0235 0.0282\n#> Time 0.0257 0.00109 23.5 <0.001 404.6 0.0236 0.0279\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, Y, Group, Time, Subject \n#> Type: response\n\nWe can plot marginal effects for different values of a regressor using the plot_slopes() function. This next plot shows the slope of the prediction equation, that is, the slope of the previous plot, at every value of the Time variable.\n\nplot_slopes(model, variables = \"Time\", condition = \"Time\")\n\n\n\n\nThe marginal effects in this plot can be interpreted as measuring the change in Y that is associated with a small increase in Time, for different baseline values of Time."
+ "text": "16.3 Marginal Effects: slopes() and plot_slopes()\n\nMarginal effects are slopes of the prediction equation. They are an observation-level quantity. The slopes() function produces a dataset with the same number of rows as the original data, but with new columns for the slop and uncertainty estimates:\n\nmfx <- slopes(model, variables = \"Time\")\nhead(mfx)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> Time 0.0261 0.00137 19.1 <0.001 267.8 0.0234 0.0288\n#> Time 0.0261 0.00136 19.2 <0.001 270.4 0.0234 0.0288\n#> Time 0.0261 0.00133 19.5 <0.001 280.0 0.0235 0.0287\n#> Time 0.0260 0.00128 20.3 <0.001 301.4 0.0235 0.0285\n#> Time 0.0259 0.00120 21.6 <0.001 340.0 0.0235 0.0282\n#> Time 0.0257 0.00109 23.5 <0.001 404.4 0.0236 0.0279\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, Y, Group, Time, Subject \n#> Type: response\n\nWe can plot marginal effects for different values of a regressor using the plot_slopes() function. This next plot shows the slope of the prediction equation, that is, the slope of the previous plot, at every value of the Time variable.\n\nplot_slopes(model, variables = \"Time\", condition = \"Time\")\n\n\n\n\nThe marginal effects in this plot can be interpreted as measuring the change in Y that is associated with a small increase in Time, for different baseline values of Time."
},
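To build intuition for what slopes() reports for a smooth term, here is a minimal sketch that approximates the observation-level slope with a centered finite difference; the vignette's own model object is not shown in this excerpt, so a small illustrative mgcv model is fit here instead.

library(mgcv)
model <- gam(mpg ~ s(hp), data = mtcars)

# centered finite difference: approximate dY/dX at every row of the data
h <- 1e-4
lo <- transform(mtcars, hp = hp - h / 2)
hi <- transform(mtcars, hp = hp + h / 2)
(predict(model, newdata = hi) - predict(model, newdata = lo)) / h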
{
"objectID": "articles/gam.html#excluding-terms",
@@ -700,18 +700,25 @@
"text": "This vignette replicates some of the analyses in this excellent blog post by Solomon Kurz: Use emmeans() to include 95% CIs around your lme4-based fitted lines\nLoad libraries and fit two models of chick weights:\n\nlibrary(lme4)\nlibrary(tidyverse)\nlibrary(patchwork)\nlibrary(marginaleffects)\n\n## unconditional linear growth model\nfit1 <- lmer(\n weight ~ 1 + Time + (1 + Time | Chick),\n data = ChickWeight)\n\n## conditional quadratic growth model\nfit2 <- lmer(\n weight ~ 1 + Time + I(Time^2) + Diet + Time:Diet + I(Time^2):Diet + (1 + Time + I(Time^2) | Chick),\n data = ChickWeight)\n\n\n18.0.1 Unit-level predictions\nPredict weight of each chick over time:\n\npred1 <- predictions(fit1,\n newdata = datagrid(Chick = ChickWeight$Chick,\n Time = 0:21))\n\np1 <- ggplot(pred1, aes(Time, estimate, level = Chick)) +\n geom_line() +\n labs(y = \"Predicted weight\", x = \"Time\", title = \"Linear growth model\")\n\npred2 <- predictions(fit2,\n newdata = datagrid(Chick = ChickWeight$Chick,\n Time = 0:21))\n\np2 <- ggplot(pred2, aes(Time, estimate, level = Chick)) +\n geom_line() +\n labs(y = \"Predicted weight\", x = \"Time\", title = \"Quadratic growth model\")\n\np1 + p2\n\n\n\n\nPredictions for each chick, in the 4 counterfactual worlds with different values for the Diet variable:\n\npred <- predictions(fit2)\n\nggplot(pred, aes(Time, estimate, level = Chick)) +\n geom_line() +\n ylab(\"Predicted Weight\") +\n facet_wrap(~ Diet, labeller = label_both)\n\n\n\n\n\n18.0.2 Population-level predictions\nTo make population-level predictions, we set the Chick variable to NA, and set re.form=NA. This last argument is offered by the lme4::predict function which is used behind the scenes to compute predictions:\n\npred <- predictions(\n fit2,\n newdata = datagrid(Chick = NA,\n Diet = 1:4,\n Time = 0:21),\n re.form = NA)\n\nggplot(pred, aes(x = Time, y = estimate, ymin = conf.low, ymax = conf.high)) +\n geom_ribbon(alpha = .1, fill = \"red\") +\n geom_line() +\n facet_wrap(~ Diet, labeller = label_both) +\n labs(title = \"Population-level trajectories\")"
},
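To see what re.form = NA does, here is a minimal sketch that calls lme4's predict method directly on the fit2 model from the code above; it returns the same population-level point predictions as predictions(), but without the delta-method confidence intervals.

# no Chick column is needed because re.form = NA drops all random effects
nd <- expand.grid(Diet = factor(1:4), Time = 0:21)
head(predict(fit2, newdata = nd, re.form = NA))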
{
- "objectID": "articles/machine_learning.html",
- "href": "articles/machine_learning.html",
+ "objectID": "articles/machine_learning.html#tidymodels",
+ "href": "articles/machine_learning.html#tidymodels",
"title": "\n19 Machine Learning\n",
- "section": "",
- "text": "20 tidymodels\nmarginaleffects also supports the tidymodels machine learning framework. When the underlying engine used by tidymodels to train the model is itself supported as a standalone package by marginaleffects, we can obtain estimates of uncertainty estimates:\nsuppressPackageStartupMessages(library(tidymodels))\nmod <- linear_reg(mode = \"regression\") |>\n set_engine(\"lm\") |>\n fit(count ~ ., data = bikes)\navg_comparisons(mod, newdata = bikes, type = \"response\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n holiday False - True 202.251 16.322 12.391 < 0.001 114.7 170.26 234.242\n humidity +1 -13.488 19.974 -0.675 0.49948 1.0 -52.64 25.659\n month +1 -1.363 1.398 -0.974 0.32987 1.6 -4.10 1.378\n season spring - fall -67.924 14.119 -4.811 < 0.001 19.3 -95.60 -40.251\n season summer - fall -9.242 9.583 -0.964 0.33485 1.6 -28.02 9.540\n season winter - fall 31.704 11.471 2.764 0.00571 7.5 9.22 54.187\n temp +1 5.010 0.630 7.954 < 0.001 49.0 3.78 6.244\n weather misty - clear -20.147 6.215 -3.241 0.00119 9.7 -32.33 -7.965\n weather rain - clear -112.211 9.881 -11.356 < 0.001 96.9 -131.58 -92.844\n weekday Fri - Sun 224.628 9.744 23.054 < 0.001 388.2 205.53 243.725\n weekday Mon - Sun 244.680 10.012 24.438 < 0.001 435.7 225.06 264.304\n weekday Sat - Sun 15.072 9.714 1.552 0.12075 3.0 -3.97 34.111\n weekday Thu - Sun 273.735 9.789 27.963 < 0.001 569.2 254.55 292.921\n weekday Tue - Sun 267.743 9.822 27.258 < 0.001 541.1 248.49 286.995\n weekday Wed - Sun 275.465 9.755 28.237 < 0.001 580.3 256.35 294.585\n windspeed +1 -0.542 0.398 -1.362 0.17310 2.5 -1.32 0.238\n workingday False - True 0.000 NA NA NA NA NA NA\n year 1 - 0 108.398 5.226 20.741 < 0.001 315.0 98.16 118.641\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response\nWhen the underlying engine that tidymodels uses to fit the model is not supported by marginaleffects as a standalone model, we can also obtain correct results, but no uncertainy estimates. Here is a random forest model:\nforest_tidy <- rand_forest(mode = \"regression\") |>\n set_engine(\"ranger\") |>\n fit(count ~ ., data = bikes)\navg_comparisons(forest_tidy, newdata = bikes, type = \"numeric\")\n\n\n Term Contrast Estimate\n count +1 0.000\n holiday False - True 13.487\n humidity +1 -24.291\n month +1 4.076\n season spring - fall -29.015\n season summer - fall -6.781\n season winter - fall 4.958\n temp +1 3.399\n weather misty - clear -7.555\n weather rain - clear -59.817\n weekday Fri - Sun 70.596\n weekday Mon - Sun 78.772\n weekday Sat - Sun 22.198\n weekday Thu - Sun 86.375\n weekday Tue - Sun 84.493\n weekday Wed - Sun 86.895\n windspeed +1 0.141\n workingday False - True -192.057\n year 1 - 0 99.677\n\nColumns: term, contrast, estimate \nType: numeric\nWe can plot the results using the standard marginaleffects helpers. For example, to plot predictions, we can do:\nplot_predictions(forest, condition = \"temp\", newdata = bikes)\nAs documented in ?plot_predictions, using condition=\"temp\" is equivalent to creating an equally-spaced grid of temp values, and holding all other predictors at their means or modes. 
In other words, it is equivalent to:\nd <- datagrid(temp = seq(min(bikes$temp), max(bikes$temp), length.out = 100), newdata = bikes)\np <- predict(forest, newdata = d)\nplot(d$temp, p, type = \"l\")\nAlternatively, we could plot “marginal” predictions, where replicate the full dataset once for every value of temp, and then average the predicted values over each value of the x-axis:\nd <- datagridcf(newdata = bikes, temp = unique)\nplot_predictions(forest, by = \"temp\", newdata = d)\nOf course, we can customize the plot using all the standard ggplot2 functions:\nplot_predictions(forest, by = \"temp\", newdata = d) +\n geom_point(data = bikes, aes(x = temp, y = count), alpha = 0.1) +\n geom_smooth(data = bikes, aes(x = temp, y = count), se = FALSE, color = \"orange\") +\n labs(x = \"Temperature (Celcius)\", y = \"Predicted number of bikes rented per hour\",\n title = \"Black: random forest predictions. Green: LOESS smoother.\") +\n theme_bw()\n\n`geom_smooth()` using method = 'loess' and formula = 'y ~ x'"
+ "section": "\n19.1 tidymodels\n",
+ "text": "19.1 tidymodels\n\nmarginaleffects also supports the tidymodels machine learning framework. When the underlying engine used by tidymodels to train the model is itself supported as a standalone package by marginaleffects, we can obtain both estimates and their standard errors:\n\nlibrary(tidymodels)\n\npenguins <- modeldata::penguins |> \n na.omit() |>\n select(sex, island, species, bill_length_mm)\n\nmod <- linear_reg(mode = \"regression\") |>\n set_engine(\"lm\") |>\n fit(bill_length_mm ~ ., data = penguins)\n\navg_comparisons(mod, type = \"numeric\", newdata = penguins)\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n island Dream - Biscoe -0.489 0.470 -1.04 0.299 1.7 -1.410 0.433\n island Torgersen - Biscoe 0.103 0.488 0.21 0.833 0.3 -0.853 1.059\n sex male - female 3.697 0.255 14.51 <0.001 156.0 3.198 4.197\n species Chinstrap - Adelie 10.347 0.422 24.54 <0.001 439.4 9.521 11.174\n species Gentoo - Adelie 8.546 0.410 20.83 <0.001 317.8 7.742 9.350\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: numeric \n\navg_predictions(mod, type = \"numeric\", newdata = penguins, by = \"island\")\n\n\n island Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n Biscoe 45.2 0.182 248 <0.001 Inf 44.9 45.6\n Dream 44.2 0.210 211 <0.001 Inf 43.8 44.6\n Torgersen 39.0 0.339 115 <0.001 Inf 38.4 39.7\n\nColumns: island, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: numeric \n\n\nWhen the underlying engine that tidymodels uses to fit the model is not supported by marginaleffects as a standalone model, we can also obtain correct results, but no uncertainy estimates. Here is a random forest model:\n\nmod <- rand_forest(mode = \"regression\") |>\n set_engine(\"ranger\") |>\n fit(bill_length_mm ~ ., data = penguins)\n\navg_comparisons(mod, newdata = penguins, type = \"numeric\")\n\n\n Term Contrast Estimate\n bill_length_mm +1 0.000\n island Dream - Biscoe 0.244\n island Torgersen - Biscoe -2.059\n sex male - female 2.711\n species Chinstrap - Adelie 5.915\n species Gentoo - Adelie 5.975\n\nColumns: term, contrast, estimate \nType: numeric \n\n\n\n19.1.1 Workflows\ntidymodels “workflows” are a convenient way to train a model while applying a series of pre-processing steps to the data. marginaleffects supports workflows out of the box. First, let’s consider a simple regression task:\n\npenguins <- modeldata::penguins |> \n na.omit() |>\n select(sex, island, species, bill_length_mm)\n\nmod <- penguins |>\n recipe(bill_length_mm ~ island + species + sex, data = _) |>\n step_dummy(all_nominal_predictors()) |>\n workflow(spec = linear_reg(mode = \"regression\", engine = \"glm\")) |>\n fit(penguins)\n\navg_comparisons(mod, newdata = penguins, type = \"numeric\")\n\n\n Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n bill_length_mm +1 0.000 NA NA NA NA NA NA\n island Dream - Biscoe -0.489 0.470 -1.04 0.299 1.7 -1.410 0.433\n island Torgersen - Biscoe 0.103 0.488 0.21 0.833 0.3 -0.853 1.059\n sex male - female 3.697 0.255 14.51 <0.001 156.0 3.198 4.197\n species Chinstrap - Adelie 10.347 0.422 24.54 <0.001 439.4 9.521 11.174\n species Gentoo - Adelie 8.546 0.410 20.83 <0.001 317.8 7.742 9.350\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: numeric \n\n\nNow, we run a classification task instead, and plot the predicted probabilities:\n\nmod <- penguins |>\n recipe(sex ~ island + species + bill_length_mm, data = _) |>\n step_dummy(all_nominal_predictors()) |>\n workflow(spec = logistic_reg(mode = \"classification\", engine = \"glm\")) |>\n fit(penguins)\n\nplot_predictions(\n mod,\n condition = c(\"bill_length_mm\", \"group\"),\n newdata = penguins,\n type = \"prob\")\n\n\n\n\nFinally, let’s consider a more complex task, where we train several models and summarize them in a table using modelsummary:\n\nlibrary(modelsummary)\n\nrecipe <- penguins |>\n recipe(sex ~ ., data = _) |>\n step_ns(bill_length_mm, deg_free = 4) |>\n step_dummy(all_nominal_predictors())\n\nmodels <- list(\n logit = logistic_reg(mode = \"classification\", engine = \"glm\"),\n forest = rand_forest(mode = \"classification\", engine = \"ranger\"),\n xgb = boost_tree(mode = \"classification\", engine = \"xgboost\")\n)\n\nlapply(models, \\(x) {\n recipe |>\n workflow(spec = x) |>\n fit(penguins) |>\n avg_comparisons(newdata = penguins, type = \"prob\") }) |>\n modelsummary(shape = term + contrast + group ~ model)\n\n\n\n\n\n\nlogit\nforest\nxgb\n\n\n\nbill_length_mm\n+1\nfemale\n−0.101\n−0.060\n−0.075\n\n\n\n+1\n\n(0.004)\n\n\n\n\n\n+1\nmale\n0.101\n0.060\n0.075\n\n\n\n+1\n\n(0.004)\n\n\n\n\nisland\nDream - Biscoe\nfemale\n−0.044\n0.000\n−0.004\n\n\n\nDream - Biscoe\n\n(0.069)\n\n\n\n\n\nDream - Biscoe\nmale\n0.044\n0.000\n0.004\n\n\n\nDream - Biscoe\n\n(0.069)\n\n\n\n\n\nTorgersen - Biscoe\nfemale\n0.015\n−0.058\n0.008\n\n\n\nTorgersen - Biscoe\n\n(0.074)\n\n\n\n\n\nTorgersen - Biscoe\nmale\n−0.015\n0.058\n−0.008\n\n\n\nTorgersen - Biscoe\n\n(0.074)\n\n\n\n\nsex\nmale - female\nfemale\n0.000\n0.000\n0.000\n\n\n\nmale - female\nmale\n0.000\n0.000\n0.000\n\n\nspecies\nChinstrap - Adelie\nfemale\n0.562\n0.169\n0.441\n\n\n\nChinstrap - Adelie\n\n(0.036)\n\n\n\n\n\nChinstrap - Adelie\nmale\n−0.562\n−0.169\n−0.441\n\n\n\nChinstrap - Adelie\n\n(0.036)\n\n\n\n\n\nGentoo - Adelie\nfemale\n0.453\n0.121\n0.361\n\n\n\nGentoo - Adelie\n\n(0.025)\n\n\n\n\n\nGentoo - Adelie\nmale\n−0.453\n−0.121\n−0.361\n\n\n\nGentoo - Adelie\n\n(0.025)\n\n\n\n\nNum.Obs.\n\n\n333\n\n\n\n\nAIC\n\n\n302.2\n\n\n\n\nBIC\n\n\n336.4\n\n\n\n\nLog.Lik.\n\n\n−142.082"
},
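One way to understand why standard errors are available in the linear regression case above is to inspect the parsnip object: the wrapper stores the engine's native fit, and when that native fit is a model class marginaleffects supports, the delta method applies as usual. A minimal sketch, using a small lm-engine model for illustration:

library(tidymodels)

mod <- linear_reg(mode = "regression") |>
  set_engine("lm") |>
  fit(mpg ~ hp + wt, data = mtcars)

# the $fit slot holds the engine's native model object
class(mod$fit)
#> [1] "lm"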
{
"objectID": "articles/machine_learning.html#mlr3",
"href": "articles/machine_learning.html#mlr3",
"title": "\n19 Machine Learning\n",
- "section": "\n19.1 mlr3\n",
- "text": "19.1 mlr3\n\nmlr3 is a machine learning framework for R. It makes it possible for users to train a wide range of models, including linear models, random forests, gradient boosting machines, and neural networks.\nIn this example, we use the bikes dataset supplied by the fmeffects package to train a random forest model predicting the number of bikes rented per hour. We then use marginaleffects to interpret the results of the model.\n\ndata(\"bikes\", package = \"fmeffects\")\n\ntask <- as_task_regr(x = bikes, id = \"bikes\", target = \"count\")\nforest <- lrn(\"regr.ranger\")$train(task)\n\nAs described in other vignettes, we can use the avg_comparisons() function to compute the average change in predicted outcome that is associated with a change in each feature:\n\navg_comparisons(forest, newdata = bikes)\n\n\n Term Contrast Estimate\n count +1 0.000\n holiday False - True 13.835\n humidity +1 -23.142\n month +1 3.874\n season spring - fall -30.230\n season summer - fall -7.471\n season winter - fall 4.135\n temp +1 3.558\n weather misty - clear -7.759\n weather rain - clear -60.609\n weekday Fri - Sun 68.991\n weekday Mon - Sun 77.713\n weekday Sat - Sun 22.763\n weekday Thu - Sun 84.597\n weekday Tue - Sun 83.098\n weekday Wed - Sun 85.500\n windspeed +1 0.254\n workingday False - True -192.565\n year 1 - 0 98.296\n\nColumns: term, contrast, estimate \nType: response \n\n\nThese results are easy to interpret: An increase of 1 degree Celsius in the temperature is associated with an increase of 3.558 bikes rented per hour.\nWe could obtain the same result manually as follows:\n\nlo <- transform(bikes, temp = temp - 0.5)\nhi <- transform(bikes, temp = temp + 0.5)\nmean(predict(forest, newdata = hi) - predict(forest, newdata = lo))\n\n[1] 3.558093\n\n\nAs the code above makes clear, the avg_comparisons() computes the effect of a “centered” change on the outcome. If we want to compute a “Forward Marginal Effect” instead, we can call:\n\navg_comparisons(\n forest,\n variables = list(\"temp\" = \\(x) data.frame(x, x + 1)),\n newdata = bikes)\n\n\n Term Contrast Estimate\n temp custom 2.41\n\nColumns: term, contrast, estimate \nType: response \n\n\nThis is equivalent to using the fmeffects package:\n\nfmeffects::fme(\n model = forest,\n data = bikes,\n target = \"count\",\n feature = \"temp\",\n step.size = 1)$ame \n\n[1] 2.412783\n\n\nWith marginaleffects::avg_comparisons(), we can also compute the average effect of a simultaneous change in multiple predictors, using the variables and cross arguments. In this example, we see what happens (on average) to the predicted outcome when the temp, season, and weather predictors all change together:\n\navg_comparisons(\n forest,\n variables = c(\"temp\", \"season\", \"weather\"),\n cross = TRUE,\n newdata = bikes)\n\n\n Estimate C: season C: temp C: weather\n -33.443 spring - fall +1 misty - clear\n -76.611 spring - fall +1 rain - clear \n -11.686 summer - fall +1 misty - clear\n -62.018 summer - fall +1 rain - clear \n -0.179 winter - fall +1 misty - clear\n -55.485 winter - fall +1 rain - clear \n\nColumns: term, contrast_season, contrast_temp, contrast_weather, estimate \nType: response"
+ "section": "\n19.2 mlr3\n",
+ "text": "19.2 mlr3\n\nmlr3 is a machine learning framework for R. It makes it possible for users to train a wide range of models, including linear models, random forests, gradient boosting machines, and neural networks.\nIn this example, we use the bikes dataset supplied by the fmeffects package to train a random forest model predicting the number of bikes rented per hour. We then use marginaleffects to interpret the results of the model.\n\ndata(\"bikes\", package = \"fmeffects\")\n\ntask <- as_task_regr(x = bikes, id = \"bikes\", target = \"count\")\nforest <- lrn(\"regr.ranger\")$train(task)\n\nAs described in other vignettes, we can use the avg_comparisons() function to compute the average change in predicted outcome that is associated with a change in each feature:\n\navg_comparisons(forest, newdata = bikes)\n\n\n Term Contrast Estimate\n count +1 0.000\n holiday False - True 8.697\n humidity +1 -24.723\n month +1 4.364\n season spring - fall -34.671\n season summer - fall -10.571\n season winter - fall 0.165\n temp +1 3.301\n weather misty - clear -7.633\n weather rain - clear -59.395\n weekday Fri - Sun 65.917\n weekday Mon - Sun 74.898\n weekday Sat - Sun 20.971\n weekday Thu - Sun 81.958\n weekday Tue - Sun 80.634\n weekday Wed - Sun 82.814\n windspeed +1 0.185\n workingday False - True -194.817\n year 1 - 0 99.321\n\nColumns: term, contrast, estimate \nType: response \n\n\nThese results are easy to interpret: An increase of 1 degree Celsius in the temperature is associated with an increase of 3.301 bikes rented per hour.\nWe could obtain the same result manually as follows:\n\nlo <- transform(bikes, temp = temp - 0.5)\nhi <- transform(bikes, temp = temp + 0.5)\nmean(predict(forest, newdata = hi) - predict(forest, newdata = lo))\n\n[1] 3.301041\n\n\n\n19.2.1 fmeffects: Forward or centered effects\nAs the code above makes clear, the avg_comparisons() computes the effect of a “centered” change on the outcome. If we want to compute a “Forward Marginal Effect” instead, we can call:\n\navg_comparisons(\n forest,\n variables = list(\"temp\" = \\(x) data.frame(x, x + 1)),\n newdata = bikes)\n\n\n Term Contrast Estimate\n temp custom 2.25\n\nColumns: term, contrast, estimate \nType: response \n\n\nThis is equivalent to using the fmeffects package:\n\nfmeffects::fme(\n model = forest,\n data = bikes,\n target = \"count\",\n feature = \"temp\",\n step.size = 1)$ame \n\n[1] 2.245841\n\n\nWith marginaleffects::avg_comparisons(), we can also compute the average effect of a simultaneous change in multiple predictors, using the variables and cross arguments. In this example, we see what happens (on average) to the predicted outcome when the temp, season, and weather predictors all change together:\n\navg_comparisons(\n forest,\n variables = c(\"temp\", \"season\", \"weather\"),\n cross = TRUE,\n newdata = bikes)\n\n\n Estimate C: season C: temp C: weather\n -38.49 spring - fall +1 misty - clear\n -81.69 spring - fall +1 rain - clear \n -14.95 summer - fall +1 misty - clear\n -64.69 summer - fall +1 rain - clear \n -4.33 winter - fall +1 misty - clear\n -58.45 winter - fall +1 rain - clear \n\nColumns: term, contrast_season, contrast_temp, contrast_weather, estimate \nType: response"
+ },
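The forward marginal effect can also be verified by hand, mirroring the manual check shown above for the centered effect. A minimal sketch, reusing the bikes data and the trained forest learner from this section (the value will drift slightly across runs because the random forest is stochastic):

# forward (uncentered) step: predict at temp + 1, minus predict at temp
hi <- transform(bikes, temp = temp + 1)
mean(predict(forest, newdata = hi) - predict(forest, newdata = bikes))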
+ {
+ "objectID": "articles/machine_learning.html#plots",
+ "href": "articles/machine_learning.html#plots",
+ "title": "\n19 Machine Learning\n",
+ "section": "\n19.3 Plots",
+ "text": "19.3 Plots\nWe can plot the results using the standard marginaleffects helpers. For example, to plot predictions, we can do:\n\nlibrary(mlr3verse)\ndata(\"bikes\", package = \"fmeffects\")\ntask <- as_task_regr(x = bikes, id = \"bikes\", target = \"count\")\nforest <- lrn(\"regr.ranger\")$train(task)\n\nplot_predictions(forest, condition = \"temp\", newdata = bikes)\n\n\n\n\nAs documented in ?plot_predictions, using condition=\"temp\" is equivalent to creating an equally-spaced grid of temp values, and holding all other predictors at their means or modes. In other words, it is equivalent to:\n\nd <- datagrid(temp = seq(min(bikes$temp), max(bikes$temp), length.out = 100), newdata = bikes)\np <- predict(forest, newdata = d)\nplot(d$temp, p, type = \"l\")\n\nAlternatively, we could plot “marginal” predictions, where replicate the full dataset once for every value of temp, and then average the predicted values over each value of the x-axis:\n\nplot_predictions(forest, by = \"temp\", newdata = bikes)\n\n\n\n\nOf course, we can customize the plot using all the standard ggplot2 functions:\n\nplot_predictions(forest, by = \"temp\", newdata = d) +\n geom_point(data = bikes, aes(x = temp, y = count), alpha = 0.1) +\n geom_smooth(data = bikes, aes(x = temp, y = count), se = FALSE, color = \"orange\") +\n labs(x = \"Temperature (Celcius)\", y = \"Predicted number of bikes rented per hour\",\n title = \"Black: random forest predictions. Green: LOESS smoother.\") +\n theme_bw()\n\n`geom_smooth()` using method = 'loess' and formula = 'y ~ x'"
},
{
"objectID": "articles/matching.html#matching",
@@ -732,7 +739,7 @@
"href": "articles/matching.html#quantity-of-interest",
"title": "\n20 Matching\n",
"section": "\n20.3 Quantity of interest",
- "text": "20.3 Quantity of interest\nFinally, we use the avg_comparisons() function of the marginaleffects package to estimate the ATT and its standard error. In effect, this function applies G-Computation to estimate the quantity of interest. We use the following arguments:\n\n\nvariables=\"treat\" indicates that we are interested in the effect of the treat variable.\n\nnewdata=subset(dat, treat == 1) indicates that we want to estimate the effect for the treated individuals only (i.e., the ATT).\n\nwts=\"weights\" indicates that we want to use the weights supplied by the matching method.\n\n\navg_comparisons(\n fit,\n variables = \"treat\",\n newdata = subset(dat, treat == 1),\n wts = \"weights\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n treat 1 - 0 637 1014 0.628 0.53 0.9 -1350 2625\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
+ "text": "20.3 Quantity of interest\nFinally, we use the avg_comparisons() function of the marginaleffects package to estimate the ATT and its standard error. In effect, this function applies G-Computation to estimate the quantity of interest. We use the following arguments:\n\n\nvariables=\"treat\" indicates that we are interested in the effect of the treat variable.\n\nnewdata=subset(dat, treat == 1) indicates that we want to estimate the effect for the treated individuals only (i.e., the ATT).\n\nwts=\"weights\" indicates that we want to use the weights supplied by the matching method.\n\n\navg_comparisons(\n fit,\n variables = \"treat\",\n newdata = subset(dat, treat == 1),\n wts = \"weights\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n treat 1 - 0 661 1009 0.654 0.513 1.0 -1318 2639\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
},
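For intuition, here is a minimal sketch of the G-computation step that avg_comparisons() automates, assuming fit is the outcome model and dat the matched dataset from the code above; for a linear outcome model this reproduces the ATT point estimate, but not the delta-method standard error.

treated <- subset(dat, treat == 1)

# predict each treated unit's outcome under both treatment statuses
d1 <- transform(treated, treat = 1)
d0 <- transform(treated, treat = 0)

# ATT: weighted average of unit-level differences, using the matching weights
weighted.mean(
  predict(fit, newdata = d1) - predict(fit, newdata = d0),
  w = treated$weights)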
{
"objectID": "articles/matching.html#learn-more",
@@ -746,21 +753,21 @@
"href": "articles/multiple_imputation.html#mice",
"title": "\n21 Missing Data\n",
"section": "\n21.1 mice\n",
- "text": "21.1 mice\n\nFirst, we impute the dataset using the mice package:\n\nlibrary(mice)\n\ndat_mice <- mice(dat, m = 20, printFlag = FALSE, .Random.seed = 1024)\n\nThen, we use the standard mice syntax to produce an object of class mira with all the models:\n\nmod_mice <- with(dat_mice, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\nFinally, we feed the mira object to a marginaleffects function:\n\nmfx_mice <- avg_slopes(mod_mice, by = \"Species\")\nmfx_mice\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.0684 0.0560 1.222 0.22413 2.2 -0.0424 0.179 120.0\n#> Sepal.Length mean(dY/dX) versicolor 0.0540 0.0558 0.968 0.33557 1.6 -0.0568 0.165 93.6\n#> Sepal.Length mean(dY/dX) virginica 0.0582 0.0512 1.137 0.25822 2.0 -0.0434 0.160 101.3\n#> Sepal.Width mean(dY/dX) setosa 0.1890 0.0836 2.261 0.02432 5.4 0.0246 0.353 400.1\n#> Sepal.Width mean(dY/dX) versicolor 0.2092 0.0772 2.710 0.00807 7.0 0.0558 0.363 89.0\n#> Sepal.Width mean(dY/dX) virginica 0.2242 0.1041 2.154 0.03511 4.8 0.0162 0.432 61.9\n#> Species mean(versicolor) - mean(setosa) setosa 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(versicolor) - mean(setosa) versicolor 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(versicolor) - mean(setosa) virginica 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(virginica) - mean(setosa) setosa 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> Species mean(virginica) - mean(setosa) versicolor 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> Species mean(virginica) - mean(setosa) virginica 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
+ "text": "21.1 mice\n\nFirst, we impute the dataset using the mice package:\n\nlibrary(mice)\n\ndat_mice <- mice(dat, m = 20, printFlag = FALSE, .Random.seed = 1024)\n\nThen, we use the standard mice syntax to produce an object of class mira with all the models:\n\nmod_mice <- with(dat_mice, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\nFinally, we feed the mira object to a marginaleffects function:\n\nmfx_mice <- avg_slopes(mod_mice, by = \"Species\")\nmfx_mice\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.0684 0.0560 1.222 0.22414 2.2 -0.0424 0.179 120.0\n#> Sepal.Length mean(dY/dX) versicolor 0.0540 0.0558 0.968 0.33550 1.6 -0.0567 0.165 93.6\n#> Sepal.Length mean(dY/dX) virginica 0.0582 0.0512 1.137 0.25818 2.0 -0.0434 0.160 101.2\n#> Sepal.Width mean(dY/dX) setosa 0.1890 0.0836 2.260 0.02436 5.4 0.0246 0.353 400.5\n#> Sepal.Width mean(dY/dX) versicolor 0.2092 0.0772 2.710 0.00807 7.0 0.0558 0.363 89.0\n#> Sepal.Width mean(dY/dX) virginica 0.2242 0.1041 2.155 0.03506 4.8 0.0162 0.432 61.8\n#> Species mean(versicolor) - mean(setosa) setosa 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(versicolor) - mean(setosa) versicolor 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(versicolor) - mean(setosa) virginica 1.1399 0.0977 11.668 < 0.001 68.1 0.9464 1.333 114.8\n#> Species mean(virginica) - mean(setosa) setosa 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> Species mean(virginica) - mean(setosa) versicolor 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> Species mean(virginica) - mean(setosa) virginica 1.7408 0.1108 15.709 < 0.001 100.7 1.5214 1.960 121.6\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
},
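Behind the scenes, marginaleffects estimates the quantity of interest in each of the \(m\) imputed datasets and then pools the results with Rubin's rules, as mice::pool() does for coefficients. Writing \(\hat{q}_j\) and \(U_j\) for the estimate and its variance in imputation \(j\), the pooled estimate and total variance are:

\[
\bar{q} = \frac{1}{m}\sum_{j=1}^{m} \hat{q}_j, \qquad
T = \frac{1}{m}\sum_{j=1}^{m} U_j + \left(1 + \frac{1}{m}\right) \frac{1}{m-1}\sum_{j=1}^{m} \left(\hat{q}_j - \bar{q}\right)^2.
\]

The first term of \(T\) is the average within-imputation variance; the second is the between-imputation variance, inflated by \(1 + 1/m\).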
{
"objectID": "articles/multiple_imputation.html#amelia",
"href": "articles/multiple_imputation.html#amelia",
"title": "\n21 Missing Data\n",
"section": "\n21.2 Amelia\n",
- "text": "21.2 Amelia\n\nWith Amelia, the workflow is essentially the same. First, we impute using Amelia:\n\nlibrary(Amelia)\n\ndat_amelia <- amelia(dat, m = 20, noms = \"Species\", p2s = 0)\n\nThen, we use Amelia syntax to produce an object of class amest with all the models:\n\nmod_amelia <- with(dat_amelia, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\nFinally, we feed the amest object to a marginaleffects function:\n\nmfx_amelia <- avg_slopes(mod_amelia, by = \"Species\")\nmfx_amelia\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.3878 0.0907 4.278 < 0.001 13.3 0.205 0.5705 43.9\n#> Sepal.Length mean(dY/dX) versicolor 0.3231 0.0802 4.030 < 0.001 12.5 0.162 0.4838 55.9\n#> Sepal.Length mean(dY/dX) virginica 0.3467 0.0799 4.340 < 0.001 13.6 0.186 0.5077 44.7\n#> Sepal.Width mean(dY/dX) setosa -0.2079 0.1491 -1.394 0.16879 2.6 -0.507 0.0909 55.0\n#> Sepal.Width mean(dY/dX) versicolor -0.1157 0.1168 -0.991 0.32648 1.6 -0.350 0.1187 51.8\n#> Sepal.Width mean(dY/dX) virginica -0.0452 0.1272 -0.355 0.72319 0.5 -0.298 0.2078 82.1\n#> Species mean(versicolor) - mean(setosa) setosa 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(versicolor) - mean(setosa) versicolor 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(versicolor) - mean(setosa) virginica 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(virginica) - mean(setosa) setosa 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> Species mean(virginica) - mean(setosa) versicolor 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> Species mean(virginica) - mean(setosa) virginica 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
+ "text": "21.2 Amelia\n\nWith Amelia, the workflow is essentially the same. First, we impute using Amelia:\n\nlibrary(Amelia)\n\ndat_amelia <- amelia(dat, m = 20, noms = \"Species\", p2s = 0)\n\nThen, we use Amelia syntax to produce an object of class amest with all the models:\n\nmod_amelia <- with(dat_amelia, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\nFinally, we feed the amest object to a marginaleffects function:\n\nmfx_amelia <- avg_slopes(mod_amelia, by = \"Species\")\nmfx_amelia\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.3878 0.0907 4.278 < 0.001 13.3 0.205 0.5705 43.9\n#> Sepal.Length mean(dY/dX) versicolor 0.3231 0.0802 4.029 < 0.001 12.5 0.162 0.4838 56.0\n#> Sepal.Length mean(dY/dX) virginica 0.3467 0.0799 4.340 < 0.001 13.6 0.186 0.5077 44.7\n#> Sepal.Width mean(dY/dX) setosa -0.2079 0.1491 -1.394 0.16878 2.6 -0.507 0.0909 55.0\n#> Sepal.Width mean(dY/dX) versicolor -0.1157 0.1168 -0.991 0.32647 1.6 -0.350 0.1187 51.8\n#> Sepal.Width mean(dY/dX) virginica -0.0452 0.1272 -0.355 0.72323 0.5 -0.298 0.2079 82.1\n#> Species mean(versicolor) - mean(setosa) setosa 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(versicolor) - mean(setosa) versicolor 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(versicolor) - mean(setosa) virginica 0.6127 0.1731 3.541 0.00111 9.8 0.262 0.9635 36.7\n#> Species mean(virginica) - mean(setosa) setosa 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> Species mean(virginica) - mean(setosa) versicolor 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> Species mean(virginica) - mean(setosa) virginica 1.0364 0.2004 5.171 < 0.001 16.6 0.629 1.4436 34.2\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
},
{
"objectID": "articles/multiple_imputation.html#other-imputation-packages-missranger-or-lists-of-imputed-data-frames.",
"href": "articles/multiple_imputation.html#other-imputation-packages-missranger-or-lists-of-imputed-data-frames.",
"title": "\n21 Missing Data\n",
"section": "\n21.3 Other imputation packages: missRanger, or lists of imputed data frames.",
- "text": "21.3 Other imputation packages: missRanger, or lists of imputed data frames.\nSeveral R packages can impute missing data. Indeed, the Missing Data CRAN View lists at least a dozen alternatives. Since user interfaces change a lot from package to package, marginaleffects supports a single workflow that can be used, with some adaptation, with all imputation packages:\n\nUse an external package to create a list of imputed data frames.\nApply the datalist2mids() function from the miceadds package to convert the list of imputed data frames to a mids object.\nUse the with() function to fit models to create mira object, as illustrated in the mice and Amelia sections above.\nPass the mira object to a marginaleffects function.\n\nConsider the imputation package missRanger, which generates a list of imputed datasets:\n\nlibrary(miceadds)\nlibrary(missRanger)\n\n## convert lists of imputed datasets to `mids` objects\ndat_missRanger <- replicate(20, missRanger(dat, verbose = 0), simplify = FALSE)\nmids_missRanger <- datlist2mids(dat_missRanger)\n\n## fit models\nmod_missRanger <- with(mids_missRanger, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\n## `missRanger` slopes\nmfx_missRanger <- avg_slopes(mod_missRanger, by = \"Species\")\nmfx_missRanger\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.0586 0.0414 1.42 0.15671 2.7 -0.02248 0.140 2780500\n#> Sepal.Length mean(dY/dX) versicolor 0.0675 0.0392 1.72 0.08514 3.6 -0.00934 0.144 721295\n#> Sepal.Length mean(dY/dX) virginica 0.0643 0.0367 1.75 0.07987 3.6 -0.00766 0.136 1020654\n#> Sepal.Width mean(dY/dX) setosa 0.2314 0.0690 3.35 < 0.001 10.3 0.09612 0.367 1911457\n#> Sepal.Width mean(dY/dX) versicolor 0.2186 0.0551 3.97 < 0.001 13.8 0.11066 0.327 246693\n#> Sepal.Width mean(dY/dX) virginica 0.2089 0.0687 3.04 0.00237 8.7 0.07422 0.344 194780\n#> Species mean(versicolor) - mean(setosa) setosa 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(versicolor) - mean(setosa) versicolor 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(versicolor) - mean(setosa) virginica 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(virginica) - mean(setosa) setosa 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> Species mean(virginica) - mean(setosa) versicolor 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> Species mean(virginica) - mean(setosa) virginica 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
+ "text": "21.3 Other imputation packages: missRanger, or lists of imputed data frames.\nSeveral R packages can impute missing data. Indeed, the Missing Data CRAN View lists at least a dozen alternatives. Since user interfaces change a lot from package to package, marginaleffects supports a single workflow that can be used, with some adaptation, with all imputation packages:\n\nUse an external package to create a list of imputed data frames.\nApply the datalist2mids() function from the miceadds package to convert the list of imputed data frames to a mids object.\nUse the with() function to fit models to create mira object, as illustrated in the mice and Amelia sections above.\nPass the mira object to a marginaleffects function.\n\nConsider the imputation package missRanger, which generates a list of imputed datasets:\n\nlibrary(miceadds)\nlibrary(missRanger)\n\n## convert lists of imputed datasets to `mids` objects\ndat_missRanger <- replicate(20, missRanger(dat, verbose = 0), simplify = FALSE)\nmids_missRanger <- datlist2mids(dat_missRanger)\n\n## fit models\nmod_missRanger <- with(mids_missRanger, lm(Petal.Width ~ Sepal.Length * Sepal.Width + Species))\n\n## `missRanger` slopes\nmfx_missRanger <- avg_slopes(mod_missRanger, by = \"Species\")\nmfx_missRanger\n#> \n#> Term Contrast Species Estimate Std. Error t Pr(>|t|) S 2.5 % 97.5 % Df\n#> Sepal.Length mean(dY/dX) setosa 0.0586 0.0414 1.42 0.15672 2.7 -0.02249 0.140 2780689\n#> Sepal.Length mean(dY/dX) versicolor 0.0675 0.0392 1.72 0.08514 3.6 -0.00934 0.144 721339\n#> Sepal.Length mean(dY/dX) virginica 0.0643 0.0367 1.75 0.07986 3.6 -0.00766 0.136 1020604\n#> Sepal.Width mean(dY/dX) setosa 0.2314 0.0690 3.35 < 0.001 10.3 0.09612 0.367 1911621\n#> Sepal.Width mean(dY/dX) versicolor 0.2186 0.0551 3.97 < 0.001 13.8 0.11066 0.327 246676\n#> Sepal.Width mean(dY/dX) virginica 0.2089 0.0687 3.04 0.00237 8.7 0.07420 0.344 194885\n#> Species mean(versicolor) - mean(setosa) setosa 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(versicolor) - mean(setosa) versicolor 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(versicolor) - mean(setosa) virginica 1.1589 0.0704 16.46 < 0.001 199.8 1.02091 1.297 1115135\n#> Species mean(virginica) - mean(setosa) setosa 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> Species mean(virginica) - mean(setosa) versicolor 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> Species mean(virginica) - mean(setosa) virginica 1.7781 0.0822 21.64 < 0.001 342.5 1.61703 1.939 1547086\n#> \n#> Columns: term, contrast, Species, estimate, std.error, s.value, predicted_lo, predicted_hi, predicted, df, statistic, p.value, conf.low, conf.high \n#> Type: response"
},
{
"objectID": "articles/multiple_imputation.html#comparing-results-with-different-imputation-software",
@@ -816,21 +823,21 @@
"href": "articles/alternative_software.html#emmeans",
"title": "\n24 Alternative Software\n",
"section": "\n24.1 emmeans\n",
- "text": "24.1 emmeans\n\nThe emmeans package is developed by Russell V. Lenth and colleagues. emmeans is a truly incredible piece of software, and a trailblazer in the R ecosystem. It is an extremely powerful package whose functionality overlaps marginaleffects to a significant degree: marginal means, contrasts, and slopes. Even if the two packages can compute many of the same quantities, emmeans and marginaleffects have pretty different philosophies with respect to user interface and computation.\nAn emmeans analysis typically starts by computing “marginal means” by holding all numeric covariates at their means, and by averaging across a balanced grid of categorical predictors. Then, users can use the contrast() function to estimate the difference between marginal means.\nThe marginaleffects package supplies a marginal_means() function which can replicate most emmeans analyses by computing marginal means. However, the typical analysis is more squarely centered on predicted/fitted values. This is a useful starting point because, in many cases, analysts will find it easy and intuitive to express their scientific queries in terms of changes in predicted values. For example,\n\nHow does the average predicted probability of survival differ between treatment and control group?\nWhat is the difference between the predicted wage of college and high school graduates?\n\nLet’s say we estimate a linear regression model with two continuous regressors and a multiplicative interaction:\n\\[y = \\beta_0 + \\beta_1 x + \\beta_2 z + \\beta_3 x \\cdot z + \\varepsilon\\]\nIn this model, the effect of \\(x\\) on \\(y\\) will depend on the value of covariate \\(z\\). Let’s say the user wants to estimate what happens to the predicted value of \\(y\\) when \\(x\\) increases by 1 unit, when \\(z \\in \\{-1, 0, 1\\}\\). To do this, we use the comparisons() function. The variables argument determines the scientific query of interest, and the newdata argument determines the grid of covariate values on which we want to evaluate the query:\n\nmodel <- lm(y ~ x * z, data)\n\ncomparisons(\n model,\n variables = list(x = 1), # what is the effect of 1-unit change in x?\n newdata = datagrid(z = -1:1) # when z is held at values -1, 0, or 1\n)\n\nAs the vignettes show, marginaleffects can also compute contrasts on marginal means. It can also compute various quantities of interest like raw fitted values, slopes (partial derivatives), and contrasts between marginal means. 
It also offers a flexible mechanism to run (non-)linear hypothesis tests using the delta method, and it offers fully customizable strategy to compute quantities like odds ratios (or completely arbitrary functions of predicted outcome).\nThus, in my (Vincent’s) biased opinion, the main benefits of marginaleffects over emmeans are:\n\nSupport more model types.\nSimpler, more intuitive, and highly consistent user interface.\nEasier to compute average slopes or unit-level contrasts for whole datasets.\nEasier to compute slopes (aka marginal effects, trends, or partial derivatives) for custom grids and continuous regressors.\nEasier to implement causal inference strategies like the parametric g-formula and regression adjustment in experiments (see vignettes).\nAllows the computation of arbitrary quantities of interest via user-supplied functions and automatic delta method inference.\nCommon plots are easy with the plot_predictions(), plot_comparisons(), and plot_slopes() functions.\n\nTo be fair, many of the marginaleffects advantages listed above come down to subjective preferences over user interface. Readers are thus encouraged to try both packages to see which interface they prefer.\nThe main advantages of emmeans over marginaleffects arise when users are specifically interested in marginal means, where emmeans tends to be much faster and to have a lot of functionality to handle backtransformations. emmeans also has better functionality for effect sizes; notably, the eff_size() function can return effect size estimates that account for uncertainty in both estimated effects and the population SD.\nPlease let me know if you find other features in emmeans so I can add them to this list.\nThe Marginal Means Vignette includes side-by-side comparisons of emmeans and marginaleffects to compute marginal means. The rest of this section compares the syntax for contrasts and marginaleffects.\n\n24.1.1 Contrasts\nAs far as I can tell, emmeans does not provide an easy way to compute unit-level contrasts for every row of the dataset used to fit our model. Therefore, the side-by-side syntax shown below will always include newdata=datagrid() to specify that we want to compute only one contrast: at the mean values of the regressors. In day-to-day practice with slopes(), however, this extra argument would not be necessary.\nFit a model:\n\nlibrary(emmeans)\nlibrary(marginaleffects)\n\nmod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)\n\nLink scale, pairwise contrasts:\n\nemm <- emmeans(mod, specs = \"cyl\")\ncontrast(emm, method = \"revpairwise\", adjust = \"none\", df = Inf)\n#> contrast estimate SE df z.ratio p.value\n#> cyl6 - cyl4 -0.905 1.63 Inf -0.555 0.5789\n#> cyl8 - cyl4 -19.542 4367.17 Inf -0.004 0.9964\n#> cyl8 - cyl6 -18.637 4367.16 Inf -0.004 0.9966\n#> \n#> Degrees-of-freedom method: user-specified \n#> Results are given on the log odds ratio (not the response) scale.\n\ncomparisons(mod,\n type = \"link\",\n newdata = \"mean\",\n variables = list(cyl = \"pairwise\"))\n#> \n#> Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 % hp cyl\n#> cyl 6 - 4 -0.905 1.63 -0.55506 0.579 0.8 -4.1 2.29 147 8\n#> cyl 8 - 4 -19.542 4367.17 -0.00447 0.996 0.0 -8579.0 8539.95 147 8\n#> cyl 8 - 6 -18.637 4367.17 -0.00427 0.997 0.0 -8578.1 8540.85 147 8\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, vs, hp, cyl \n#> Type: link\n\nResponse scale, reference groups:\n\nemm <- emmeans(mod, specs = \"cyl\", regrid = \"response\")\ncontrast(emm, method = \"trt.vs.ctrl1\", adjust = \"none\", df = Inf, ratios = FALSE)\n#> contrast estimate SE df z.ratio p.value\n#> cyl6 - cyl4 -0.222 0.394 Inf -0.564 0.5727\n#> cyl8 - cyl4 -0.595 0.511 Inf -1.163 0.2447\n#> \n#> Degrees-of-freedom method: user-specified\n\ncomparisons(mod, newdata = \"mean\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp cyl\n#> cyl 6 - 4 -2.22e-01 3.94e-01 -0.564103 0.573 0.8 -9.94e-01 5.50e-01 147 8\n#> cyl 8 - 4 -5.95e-01 5.11e-01 -1.163332 0.245 2.0 -1.60e+00 4.07e-01 147 8\n#> hp +1 -1.56e-10 6.80e-07 -0.000229 1.000 0.0 -1.33e-06 1.33e-06 147 8\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, vs, hp, cyl \n#> Type: response\n\n\n24.1.2 Contrasts by group\nHere is a slightly more complicated example with contrasts estimated by subgroup in a lme4 mixed effects model. First we estimate a model and compute pairwise contrasts by subgroup using emmeans:\n\nlibrary(dplyr)\nlibrary(lme4)\nlibrary(emmeans)\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/lme4/VerbAgg.csv\")\ndat$woman <- as.numeric(dat$Gender == \"F\")\n\nmod <- glmer(\n woman ~ btype * resp + situ + (1 + Anger | item),\n family = binomial,\n data = dat)\n\nemmeans(mod, specs = \"btype\", by = \"resp\") |>\n contrast(method = \"revpairwise\", adjust = \"none\")\n#> resp = no:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse -0.0152 0.1097 Inf -0.139 0.8898\n#> shout - curse -0.2533 0.1022 Inf -2.479 0.0132\n#> shout - scold -0.2381 0.0886 Inf -2.686 0.0072\n#> \n#> resp = perhaps:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse -0.2393 0.1178 Inf -2.031 0.0422\n#> shout - curse -0.0834 0.1330 Inf -0.627 0.5309\n#> shout - scold 0.1559 0.1358 Inf 1.148 0.2510\n#> \n#> resp = yes:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse 0.0391 0.1292 Inf 0.302 0.7624\n#> shout - curse 0.5802 0.1784 Inf 3.252 0.0011\n#> shout - scold 0.5411 0.1888 Inf 2.866 0.0042\n#> \n#> Results are averaged over the levels of: situ \n#> Results are given on the log odds ratio (not the response) scale.\n\nWhat did emmeans do to obtain these results? Roughly speaking:\n\nCreate a prediction grid with one cell for each combination of categorical predictors in the model, and all numeric variables held at their means.\nMake adjusted predictions in each cell of the prediction grid.\nTake the average of those predictions (marginal means) for each combination of btype (focal variable) and resp (group by variable).\nCompute pairwise differences (contrasts) in marginal means across different levels of the focal variable btype.\n\nIn short, emmeans computes pairwise contrasts between marginal means, which are themselves averages of adjusted predictions. 
This is different from the default types of contrasts produced by comparisons(), which reports contrasts between adjusted predictions, without averaging across a pre-specified grid of predictors. What does comparisons() do instead?\nLet newdata be a data frame supplied by the user (or the original data frame used to fit the model), then:\n\nCreate a new data frame called newdata2, which is identical to newdata except that the focal variable is incremented by one level.\nCompute contrasts as the difference between adjusted predictions made on the two datasets:\n\npredict(model, newdata = newdata2) - predict(model, newdata = newdata)\n\n\n\nAlthough it is not idiomatic, we can use still use comparisons() to emulate the emmeans results. First, we create a prediction grid with one cell for each combination of categorical predictor in the model:\n\nnd <- datagrid(\n model = mod,\n resp = dat$resp,\n situ = dat$situ,\n btype = dat$btype)\nnrow(nd)\n#> [1] 18\n\nThis grid has 18 rows, one for each combination of levels for the resp (3), situ (2), and btype (3) variables (3 * 2 * 3 = 18).\nThen we compute pairwise contrasts over this grid:\n\ncmp <- comparisons(mod,\n variables = list(\"btype\" = \"pairwise\"),\n newdata = nd,\n type = \"link\")\nnrow(cmp)\n#> [1] 54\n\nThere are 3 pairwise contrasts, corresponding to the 3 pairwise comparisons possible between the 3 levels of the focal variable btype: scold-curse, shout-scold, shout-curse. The comparisons() function estimates those 3 contrasts for each row of newdata, so we get \\(18 \\times 3 = 54\\) rows.\nFinally, if we wanted contrasts averaged over each subgroup of the resp variable, we can use the avg_comparisons() function with the by argument:\n\navg_comparisons(mod,\n by = \"resp\",\n variables = list(\"btype\" = \"pairwise\"),\n newdata = nd,\n type = \"link\")\n#> \n#> Term Contrast resp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> btype mean(scold) - mean(curse) no -0.0152 0.1097 -0.139 0.88976 0.2 -0.230 0.19972\n#> btype mean(scold) - mean(curse) perhaps -0.2393 0.1178 -2.031 0.04221 4.6 -0.470 -0.00842\n#> btype mean(scold) - mean(curse) yes 0.0391 0.1292 0.302 0.76239 0.4 -0.214 0.29234\n#> btype mean(shout) - mean(curse) no -0.2533 0.1022 -2.479 0.01319 6.2 -0.454 -0.05300\n#> btype mean(shout) - mean(curse) perhaps -0.0834 0.1330 -0.627 0.53090 0.9 -0.344 0.17737\n#> btype mean(shout) - mean(curse) yes 0.5802 0.1784 3.252 0.00115 9.8 0.230 0.92987\n#> btype mean(shout) - mean(scold) no -0.2381 0.0886 -2.686 0.00723 7.1 -0.412 -0.06436\n#> btype mean(shout) - mean(scold) perhaps 0.1559 0.1358 1.148 0.25103 2.0 -0.110 0.42215\n#> btype mean(shout) - mean(scold) yes 0.5411 0.1888 2.866 0.00416 7.9 0.171 0.91116\n#> \n#> Columns: term, contrast, resp, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: link\n\nThese results are identical to those produced by emmeans (except for \\(t\\) vs. 
\\(z\\)).\n\n24.1.3 Marginal Effects\nAs far as I can tell, emmeans::emtrends makes it easier to compute marginal effects for a few user-specified values than for large grids or for the full original dataset.\nResponse scale, user-specified values:\n\nmod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)\n\nemtrends(mod, ~hp, \"hp\", regrid = \"response\", at = list(cyl = 4))\n#> hp hp.trend SE df asymp.LCL asymp.UCL\n#> 147 -0.00786 0.011 Inf -0.0294 0.0137\n#> \n#> Confidence level used: 0.95\n\nslopes(mod, newdata = datagrid(cyl = 4))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl 6 - 4 4 -0.22219 0.394 -0.564 0.573 0.8 -0.9942 0.5498\n#> cyl 8 - 4 4 -0.59469 0.511 -1.163 0.245 2.0 -1.5966 0.4072\n#> hp dY/dX 4 -0.00785 0.011 -0.713 0.476 1.1 -0.0294 0.0137\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, cyl, predicted_lo, predicted_hi, predicted, vs, hp \n#> Type: response\n\nLink scale, user-specified values:\n\nemtrends(mod, ~hp, \"hp\", at = list(cyl = 4))\n#> hp hp.trend SE df asymp.LCL asymp.UCL\n#> 147 -0.0326 0.0339 Inf -0.099 0.0338\n#> \n#> Confidence level used: 0.95\n\nslopes(mod, type = \"link\", newdata = datagrid(cyl = 4))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl 6 - 4 4 -0.9049 1.63e+00 -0.55506 0.579 0.8 -4.100 2.29e+00\n#> cyl 8 - 4 4 -19.5418 4.37e+03 -0.00447 0.996 0.0 -8579.030 8.54e+03\n#> hp dY/dX 4 -0.0326 3.39e-02 -0.96140 0.336 1.6 -0.099 3.38e-02\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, cyl, predicted_lo, predicted_hi, predicted, vs, hp \n#> Type: link\n\n\n24.1.4 More examples\nHere are a few more emmeans vs. marginaleffects comparisons:\n\n## Example of examining a continuous x categorical interaction using emmeans and marginaleffects\n## Authors: Cameron Patrick and Vincent Arel-Bundock\n\nlibrary(tidyverse)\nlibrary(emmeans)\nlibrary(marginaleffects)\n\n## use the mtcars data, set up am as a factor\ndata(mtcars)\nmc <- mtcars |> mutate(am = factor(am))\n\n## fit a linear model to mpg with wt x am interaction\nm <- lm(mpg ~ wt*am, data = mc)\nsummary(m)\n\n## 1. means for each level of am at mean wt.\nemmeans(m, \"am\")\nmarginal_means(m, variables = \"am\")\npredictions(m, newdata = datagrid(am = 0:1))\n\n## 2. means for each level of am at wt = 2.5, 3, 3.5.\nemmeans(m, c(\"am\", \"wt\"), at = list(wt = c(2.5, 3, 3.5)))\npredictions(m, newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5))\n\n## 3. means for wt = 2.5, 3, 3.5, averaged over levels of am (implicitly!).\nemmeans(m, \"wt\", at = list(wt = c(2.5, 3, 3.5)))\n\n## same thing, but the averaging is more explicit, using the `by` argument\npredictions(\n m,\n newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5)),\n by = \"wt\")\n\n## 4. graphical version of 2.\nemmip(m, am ~ wt, at = list(wt = c(2.5, 3, 3.5)), CIs = TRUE)\nplot_predictions(m, condition = c(\"wt\", \"am\"))\n\n## 5. compare levels of am at specific values of wt.\n## this is a bit ugly because the emmeans defaults for pairs() are silly.\n## infer = TRUE: enable confidence intervals.\n## adjust = \"none\": begone, Tukey.\n## reverse = TRUE: contrasts as (later level) - (earlier level)\npairs(emmeans(m, \"am\", by = \"wt\", at = list(wt = c(2.5, 3, 3.5))),\n infer = TRUE, adjust = \"none\", reverse = TRUE)\n\ncomparisons(\n m,\n variables = \"am\",\n newdata = datagrid(wt = c(2.5, 3, 3.5)))\n\n## 6. 
plot of pairwise comparisons\nplot(pairs(emmeans(m, \"am\", by = \"wt\", at = list(wt = c(2.5, 3, 3.5))),\n infer = TRUE, adjust = \"none\", reverse = TRUE))\n\n## Since `wt` is numeric, the default is to plot it as a continuous variable on\n## the x-axis. But note that this is the **exact same info** as in the emmeans plot.\nplot_comparisons(m, variables = \"am\", condition = \"wt\")\n\n## You can of course customize everything, set draw=FALSE, and feed the raw data to ggplot2\np <- plot_comparisons(\n m,\n variables = \"am\",\n condition = list(wt = c(2.5, 3, 3.5)),\n draw = FALSE)\n\nggplot(p, aes(y = wt, x = comparison, xmin = conf.low, xmax = conf.high)) +\n geom_pointrange()\n\n## 7. slope of wt for each level of am\nemtrends(m, \"am\", \"wt\")\nslopes(m, newdata = datagrid(am = 0:1))"
+ "text": "24.1 emmeans\n\nThe emmeans package is developed by Russell V. Lenth and colleagues. emmeans is a truly incredible piece of software, and a trailblazer in the R ecosystem. It is an extremely powerful package whose functionality overlaps marginaleffects to a significant degree: marginal means, contrasts, and slopes. Even if the two packages can compute many of the same quantities, emmeans and marginaleffects have pretty different philosophies with respect to user interface and computation.\nAn emmeans analysis typically starts by computing “marginal means” by holding all numeric covariates at their means, and by averaging across a balanced grid of categorical predictors. Then, users can use the contrast() function to estimate the difference between marginal means.\nThe marginaleffects package supplies a marginal_means() function which can replicate most emmeans analyses by computing marginal means. However, the typical analysis is more squarely centered on predicted/fitted values. This is a useful starting point because, in many cases, analysts will find it easy and intuitive to express their scientific queries in terms of changes in predicted values. For example,\n\nHow does the average predicted probability of survival differ between treatment and control group?\nWhat is the difference between the predicted wage of college and high school graduates?\n\nLet’s say we estimate a linear regression model with two continuous regressors and a multiplicative interaction:\n\\[y = \\beta_0 + \\beta_1 x + \\beta_2 z + \\beta_3 x \\cdot z + \\varepsilon\\]\nIn this model, the effect of \\(x\\) on \\(y\\) will depend on the value of covariate \\(z\\). Let’s say the user wants to estimate what happens to the predicted value of \\(y\\) when \\(x\\) increases by 1 unit, when \\(z \\in \\{-1, 0, 1\\}\\). To do this, we use the comparisons() function. The variables argument determines the scientific query of interest, and the newdata argument determines the grid of covariate values on which we want to evaluate the query:\n\nmodel <- lm(y ~ x * z, data)\n\ncomparisons(\n model,\n variables = list(x = 1), # what is the effect of 1-unit change in x?\n newdata = datagrid(z = -1:1) # when z is held at values -1, 0, or 1\n)\n\nAs the vignettes show, marginaleffects can also compute contrasts on marginal means. It can also compute various quantities of interest like raw fitted values, slopes (partial derivatives), and contrasts between marginal means. 
It also offers a flexible mechanism to run (non-)linear hypothesis tests using the delta method, and it offers a fully customizable strategy to compute quantities like odds ratios (or completely arbitrary functions of predicted outcomes).\nThus, in my (Vincent’s) biased opinion, the main benefits of marginaleffects over emmeans are:\n\nSupport for more model types.\nSimpler, more intuitive, and highly consistent user interface.\nEasier to compute average slopes or unit-level contrasts for whole datasets.\nEasier to compute slopes (aka marginal effects, trends, or partial derivatives) for custom grids and continuous regressors.\nEasier to implement causal inference strategies like the parametric g-formula and regression adjustment in experiments (see vignettes).\nAllows the computation of arbitrary quantities of interest via user-supplied functions and automatic delta method inference.\nCommon plots are easy with the plot_predictions(), plot_comparisons(), and plot_slopes() functions.\n\nTo be fair, many of the marginaleffects advantages listed above come down to subjective preferences over user interface. Readers are thus encouraged to try both packages to see which interface they prefer.\nThe main advantages of emmeans over marginaleffects arise when users are specifically interested in marginal means, where emmeans tends to be much faster and to have a lot of functionality to handle backtransformations. emmeans also has better functionality for effect sizes; notably, the eff_size() function can return effect size estimates that account for uncertainty in both estimated effects and the population SD.\nPlease let me know if you find other features in emmeans so I can add them to this list.\nThe Marginal Means Vignette includes side-by-side comparisons of emmeans and marginaleffects to compute marginal means. The rest of this section compares the two packages’ syntax for contrasts and marginal effects.\n\n24.1.1 Contrasts\nAs far as I can tell, emmeans does not provide an easy way to compute unit-level contrasts for every row of the dataset used to fit our model. Therefore, the side-by-side syntax shown below will always include newdata=datagrid() to specify that we want to compute only one contrast: at the mean values of the regressors. In day-to-day practice with slopes(), however, this extra argument would not be necessary.\nFit a model:\n\nlibrary(emmeans)\nlibrary(marginaleffects)\n\nmod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)\n\nLink scale, pairwise contrasts:\n\nemm <- emmeans(mod, specs = \"cyl\")\ncontrast(emm, method = \"revpairwise\", adjust = \"none\", df = Inf)\n#> contrast estimate SE df z.ratio p.value\n#> cyl6 - cyl4 -0.905 1.63 Inf -0.555 0.5789\n#> cyl8 - cyl4 -19.542 4367.17 Inf -0.004 0.9964\n#> cyl8 - cyl6 -18.637 4367.16 Inf -0.004 0.9966\n#> \n#> Degrees-of-freedom method: user-specified \n#> Results are given on the log odds ratio (not the response) scale.\n\ncomparisons(mod,\n type = \"link\",\n newdata = \"mean\",\n variables = list(cyl = \"pairwise\"))\n#> \n#> Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 % hp cyl\n#> cyl 6 - 4 -0.905 1.63 -0.55506 0.579 0.8 -4.1 2.29 147 8\n#> cyl 8 - 4 -19.542 4367.17 -0.00447 0.996 0.0 -8579.0 8539.95 147 8\n#> cyl 8 - 6 -18.637 4367.17 -0.00427 0.997 0.0 -8578.1 8540.85 147 8\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, vs, hp, cyl \n#> Type: link\n\nResponse scale, reference groups:\n\nemm <- emmeans(mod, specs = \"cyl\", regrid = \"response\")\ncontrast(emm, method = \"trt.vs.ctrl1\", adjust = \"none\", df = Inf, ratios = FALSE)\n#> contrast estimate SE df z.ratio p.value\n#> cyl6 - cyl4 -0.222 0.394 Inf -0.564 0.5727\n#> cyl8 - cyl4 -0.595 0.511 Inf -1.163 0.2447\n#> \n#> Degrees-of-freedom method: user-specified\n\ncomparisons(mod, newdata = \"mean\")\n#> \n#> Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp cyl\n#> cyl 6 - 4 -2.22e-01 3.94e-01 -0.564103 0.573 0.8 -9.94e-01 5.50e-01 147 8\n#> cyl 8 - 4 -5.95e-01 5.11e-01 -1.163332 0.245 2.0 -1.60e+00 4.07e-01 147 8\n#> hp +1 -1.56e-10 6.80e-07 -0.000229 1.000 0.0 -1.33e-06 1.33e-06 147 8\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, vs, hp, cyl \n#> Type: response\n\n\n24.1.2 Contrasts by group\nHere is a slightly more complicated example with contrasts estimated by subgroup in an lme4 mixed effects model. First, we estimate a model and compute pairwise contrasts by subgroup using emmeans:\n\nlibrary(dplyr)\nlibrary(lme4)\nlibrary(emmeans)\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/lme4/VerbAgg.csv\")\ndat$woman <- as.numeric(dat$Gender == \"F\")\n\nmod <- glmer(\n woman ~ btype * resp + situ + (1 + Anger | item),\n family = binomial,\n data = dat)\n\nemmeans(mod, specs = \"btype\", by = \"resp\") |>\n contrast(method = \"revpairwise\", adjust = \"none\")\n#> resp = no:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse -0.0152 0.1097 Inf -0.139 0.8898\n#> shout - curse -0.2533 0.1022 Inf -2.479 0.0132\n#> shout - scold -0.2381 0.0886 Inf -2.686 0.0072\n#> \n#> resp = perhaps:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse -0.2393 0.1178 Inf -2.031 0.0422\n#> shout - curse -0.0834 0.1330 Inf -0.627 0.5309\n#> shout - scold 0.1559 0.1358 Inf 1.148 0.2510\n#> \n#> resp = yes:\n#> contrast estimate SE df z.ratio p.value\n#> scold - curse 0.0391 0.1292 Inf 0.302 0.7624\n#> shout - curse 0.5802 0.1784 Inf 3.252 0.0011\n#> shout - scold 0.5411 0.1888 Inf 2.866 0.0042\n#> \n#> Results are averaged over the levels of: situ \n#> Results are given on the log odds ratio (not the response) scale.\n\nWhat did emmeans do to obtain these results? Roughly speaking:\n\nCreate a prediction grid with one cell for each combination of categorical predictors in the model, and all numeric variables held at their means.\nMake adjusted predictions in each cell of the prediction grid.\nTake the average of those predictions (marginal means) for each combination of btype (focal variable) and resp (group-by variable).\nCompute pairwise differences (contrasts) in marginal means across different levels of the focal variable btype.\n\nIn short, emmeans computes pairwise contrasts between marginal means, which are themselves averages of adjusted predictions. 
This is different from the default types of contrasts produced by comparisons(), which reports contrasts between adjusted predictions, without averaging across a pre-specified grid of predictors. What does comparisons() do instead?\nLet newdata be a data frame supplied by the user (or the original data frame used to fit the model), then:\n\nCreate a new data frame called newdata2, which is identical to newdata except that the focal variable is incremented by one level.\nCompute contrasts as the difference between adjusted predictions made on the two datasets:\n\npredict(model, newdata = newdata2) - predict(model, newdata = newdata)\n\n\n\nAlthough it is not idiomatic, we can still use comparisons() to emulate the emmeans results. First, we create a prediction grid with one cell for each combination of categorical predictors in the model:\n\nnd <- datagrid(\n model = mod,\n resp = dat$resp,\n situ = dat$situ,\n btype = dat$btype)\nnrow(nd)\n#> [1] 18\n\nThis grid has 18 rows, one for each combination of levels for the resp (3), situ (2), and btype (3) variables (3 * 2 * 3 = 18).\nThen we compute pairwise contrasts over this grid:\n\ncmp <- comparisons(mod,\n variables = list(\"btype\" = \"pairwise\"),\n newdata = nd,\n type = \"link\")\nnrow(cmp)\n#> [1] 54\n\nThere are 3 pairwise contrasts, corresponding to the 3 pairwise comparisons possible between the 3 levels of the focal variable btype: scold-curse, shout-scold, shout-curse. The comparisons() function estimates those 3 contrasts for each row of newdata, so we get \(18 \times 3 = 54\) rows.\nFinally, if we want contrasts averaged over each subgroup of the resp variable, we can use the avg_comparisons() function with the by argument:\n\navg_comparisons(mod,\n by = \"resp\",\n variables = list(\"btype\" = \"pairwise\"),\n newdata = nd,\n type = \"link\")\n#> \n#> Term Contrast resp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> btype mean(scold) - mean(curse) no -0.0152 0.1097 -0.139 0.88976 0.2 -0.230 0.19972\n#> btype mean(scold) - mean(curse) perhaps -0.2393 0.1178 -2.031 0.04221 4.6 -0.470 -0.00842\n#> btype mean(scold) - mean(curse) yes 0.0391 0.1292 0.302 0.76239 0.4 -0.214 0.29234\n#> btype mean(shout) - mean(curse) no -0.2533 0.1022 -2.479 0.01319 6.2 -0.454 -0.05300\n#> btype mean(shout) - mean(curse) perhaps -0.0834 0.1330 -0.627 0.53090 0.9 -0.344 0.17737\n#> btype mean(shout) - mean(curse) yes 0.5802 0.1784 3.252 0.00115 9.8 0.230 0.92987\n#> btype mean(shout) - mean(scold) no -0.2381 0.0886 -2.686 0.00723 7.1 -0.412 -0.06436\n#> btype mean(shout) - mean(scold) perhaps 0.1559 0.1358 1.148 0.25103 2.0 -0.110 0.42215\n#> btype mean(shout) - mean(scold) yes 0.5411 0.1888 2.866 0.00416 7.9 0.171 0.91116\n#> \n#> Columns: term, contrast, resp, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: link\n\nThese results are identical to those produced by emmeans (except for \(t\) vs. 
\(z\)).\n\n24.1.3 Marginal Effects\nAs far as I can tell, emmeans::emtrends makes it easier to compute marginal effects for a few user-specified values than for large grids or for the full original dataset.\nResponse scale, user-specified values:\n\nmod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)\n\nemtrends(mod, ~hp, \"hp\", regrid = \"response\", at = list(cyl = 4))\n#> hp hp.trend SE df asymp.LCL asymp.UCL\n#> 147 -0.00786 0.011 Inf -0.0294 0.0137\n#> \n#> Confidence level used: 0.95\n\nslopes(mod, newdata = datagrid(cyl = 4))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl 6 - 4 4 -0.22219 0.394 -0.564 0.573 0.8 -0.9942 0.5498\n#> cyl 8 - 4 4 -0.59469 0.511 -1.163 0.245 2.0 -1.5966 0.4072\n#> hp dY/dX 4 -0.00785 0.011 -0.713 0.476 1.1 -0.0294 0.0137\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, cyl, predicted_lo, predicted_hi, predicted, vs, hp \n#> Type: response\n\nLink scale, user-specified values:\n\nemtrends(mod, ~hp, \"hp\", at = list(cyl = 4))\n#> hp hp.trend SE df asymp.LCL asymp.UCL\n#> 147 -0.0326 0.0339 Inf -0.099 0.0338\n#> \n#> Confidence level used: 0.95\n\nslopes(mod, type = \"link\", newdata = datagrid(cyl = 4))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl 6 - 4 4 -0.9049 1.63e+00 -0.55506 0.579 0.8 -4.100 2.29e+00\n#> cyl 8 - 4 4 -19.5418 4.37e+03 -0.00447 0.996 0.0 -8579.030 8.54e+03\n#> hp dY/dX 4 -0.0326 3.39e-02 -0.96147 0.336 1.6 -0.099 3.38e-02\n#> \n#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, cyl, predicted_lo, predicted_hi, predicted, vs, hp \n#> Type: link\n\n\n24.1.4 More examples\nHere are a few more emmeans vs. marginaleffects comparisons:\n\n## Example of examining a continuous x categorical interaction using emmeans and marginaleffects\n## Authors: Cameron Patrick and Vincent Arel-Bundock\n\nlibrary(tidyverse)\nlibrary(emmeans)\nlibrary(marginaleffects)\n\n## use the mtcars data, set up am as a factor\ndata(mtcars)\nmc <- mtcars |> mutate(am = factor(am))\n\n## fit a linear model to mpg with wt x am interaction\nm <- lm(mpg ~ wt*am, data = mc)\nsummary(m)\n\n## 1. means for each level of am at mean wt.\nemmeans(m, \"am\")\nmarginal_means(m, variables = \"am\")\npredictions(m, newdata = datagrid(am = 0:1))\n\n## 2. means for each level of am at wt = 2.5, 3, 3.5.\nemmeans(m, c(\"am\", \"wt\"), at = list(wt = c(2.5, 3, 3.5)))\npredictions(m, newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5)))\n\n## 3. means for wt = 2.5, 3, 3.5, averaged over levels of am (implicitly!).\nemmeans(m, \"wt\", at = list(wt = c(2.5, 3, 3.5)))\n\n## same thing, but the averaging is more explicit, using the `by` argument\npredictions(\n m,\n newdata = datagrid(am = 0:1, wt = c(2.5, 3, 3.5)),\n by = \"wt\")\n\n## 4. graphical version of 2.\nemmip(m, am ~ wt, at = list(wt = c(2.5, 3, 3.5)), CIs = TRUE)\nplot_predictions(m, condition = c(\"wt\", \"am\"))\n\n## 5. compare levels of am at specific values of wt.\n## this is a bit ugly because the emmeans defaults for pairs() are silly.\n## infer = TRUE: enable confidence intervals.\n## adjust = \"none\": begone, Tukey.\n## reverse = TRUE: contrasts as (later level) - (earlier level)\npairs(emmeans(m, \"am\", by = \"wt\", at = list(wt = c(2.5, 3, 3.5))),\n infer = TRUE, adjust = \"none\", reverse = TRUE)\n\ncomparisons(\n m,\n variables = \"am\",\n newdata = datagrid(wt = c(2.5, 3, 3.5)))\n\n## 6. 
plot of pairwise comparisons\nplot(pairs(emmeans(m, \"am\", by = \"wt\", at = list(wt = c(2.5, 3, 3.5))),\n infer = TRUE, adjust = \"none\", reverse = TRUE))\n\n## Since `wt` is numeric, the default is to plot it as a continuous variable on\n## the x-axis. But note that this is the **exact same info** as in the emmeans plot.\nplot_comparisons(m, variables = \"am\", condition = \"wt\")\n\n## You can of course customize everything, set draw=FALSE, and feed the raw data to ggplot2\np <- plot_comparisons(\n m,\n variables = \"am\",\n condition = list(wt = c(2.5, 3, 3.5)),\n draw = FALSE)\n\nggplot(p, aes(y = wt, x = comparison, xmin = conf.low, xmax = conf.high)) +\n geom_pointrange()\n\n## 7. slope of wt for each level of am\nemtrends(m, \"am\", \"wt\")\nslopes(m, newdata = datagrid(am = 0:1))"
},
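The two-step recipe attributed to comparisons() in the entry above (build newdata2 by incrementing the focal variable, then difference the adjusted predictions) can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, reusing the simple logistic model from section 24.1.1 rather than the mixed model, and it is not the package's actual implementation:

mod <- glm(vs ~ hp + factor(cyl), data = mtcars, family = binomial)
newdata <- mtcars                            ## grid supplied by the user
newdata2 <- transform(newdata, hp = hp + 1)  ## focal variable incremented by 1 unit
## unit-level contrasts: difference between adjusted predictions on the two grids
cmp <- predict(mod, newdata = newdata2, type = "response") -
  predict(mod, newdata = newdata, type = "response")
head(cmp)
mean(cmp)  ## average of the unit-level contrasts

Averaging cmp mimics avg_comparisons() for a numeric focal variable, but it yields no standard errors; those require the delta method machinery illustrated in the Standard Errors entries further below.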
{
"objectID": "articles/alternative_software.html#margins-and-prediction",
"href": "articles/alternative_software.html#margins-and-prediction",
"title": "\n24 Alternative Software\n",
"section": "\n24.2 margins and prediction\n",
- "text": "24.2 margins and prediction\n\nThe margins and prediction packages for R were designed by Thomas Leeper to emulate the behavior of the margins command from Stata. These packages are trailblazers and strongly influenced the development of marginaleffects. The main benefits of marginaleffects over these packages are:\n\nSupport more model types\nFaster\nMemory efficient\nPlots using ggplot2 instead of Base R\nMore extensive test suite\nActive development\n\nThe syntax of the two packages is very similar.\n\n24.2.1 Average Marginal Effects\n\nlibrary(margins)\nlibrary(marginaleffects)\n\nmod <- lm(mpg ~ cyl + hp + wt, data = mtcars)\n\nmar <- margins(mod)\nsummary(mar)\n#> factor AME SE z p lower upper\n#> cyl -0.9416 0.5509 -1.7092 0.0874 -2.0214 0.1382\n#> hp -0.0180 0.0119 -1.5188 0.1288 -0.0413 0.0052\n#> wt -3.1670 0.7406 -4.2764 0.0000 -4.6185 -1.7155\n\nmfx <- slopes(mod)\n\n\n24.2.2 Individual-Level Marginal Effects\nMarginal effects in a user-specified data frame:\n\nhead(data.frame(mar))\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt X_weights X_at_number\n#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 22.82043 0.6876212 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 22.01285 0.6056817 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.96040 0.7349593 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.93608 0.5800910 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.16780 0.8322986 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.25036 0.6638322 -0.9416168 -0.0180381 -3.166973 0.3035074 0.0001410453 0.5484524 NA 1\n\nhead(mfx)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.551 -1.71 0.0875 3.5 -2.02 0.138\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.551 -1.71 0.0873 3.5 -2.02 0.138\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, cyl, hp, wt \n#> Type: response\nnd <- data.frame(cyl = 4, hp = 110, wt = 3)\n\n\n24.2.3 Marginal Effects at the Mean\n\nmar <- margins(mod, data = data.frame(prediction::mean_or_mode(mtcars)), unit_ses = TRUE)\ndata.frame(mar)\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt SE_dydx_cyl SE_dydx_hp SE_dydx_wt X_weights X_at_number\n#> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 20.09062 0.4439832 -0.9416168 -0.0180381 -3.166973 0.3034971 0.0001410454 0.54846 0.5509057 0.01187626 0.7405808 NA 1\n\nslopes(mod, newdata = \"mean\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.5506 -1.71 0.0873 3.5 -2.0209 0.13763\n#> hp -0.018 0.0119 -1.52 0.1290 3.0 -0.0413 0.00525\n#> wt -3.167 0.7406 -4.28 <0.001 15.7 -4.6186 -1.71536\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, cyl, hp, wt \n#> Type: response\n\n\n24.2.4 Counterfactual Average Marginal Effects\nThe at argument of the margins package emulates Stata by fixing the values of some variables at user-specified values, and by replicating the full dataset once for each combination of the supplied values (see the Stata section below). For example, if the dataset includes 32 rows and the user calls at=list(cyl=c(4, 6)), margins will compute 64 unit-level marginal effects estimates:\n\ndat <- mtcars\ndat$cyl <- factor(dat$cyl)\nmod <- lm(mpg ~ cyl * hp + wt, data = mtcars)\n\nmar <- margins(mod, at = list(cyl = c(4, 6, 8)))\nsummary(mar)\n#> factor cyl AME SE z p lower upper\n#> cyl 4.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139\n#> cyl 6.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2138\n#> cyl 8.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139\n#> hp 4.0000 -0.0878 0.0267 -3.2937 0.0010 -0.1400 -0.0355\n#> hp 6.0000 -0.0499 0.0154 -3.2397 0.0012 -0.0800 -0.0197\n#> hp 8.0000 -0.0120 0.0108 -1.1065 0.2685 -0.0332 0.0092\n#> wt 4.0000 -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236\n#> wt 6.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236\n#> wt 8.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236\n\navg_slopes(\n mod,\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl mean(dY/dX) 4 0.0381 0.6000 0.0636 0.9493 0.1 -1.1377 1.21402\n#> cyl mean(dY/dX) 6 0.0381 0.5998 0.0636 0.9493 0.1 -1.1375 1.21378\n#> cyl mean(dY/dX) 8 0.0381 0.5999 0.0636 0.9493 0.1 -1.1377 1.21396\n#> hp mean(dY/dX) 4 -0.0878 0.0267 -3.2937 <0.001 10.0 -0.1400 -0.03555\n#> hp mean(dY/dX) 6 -0.0499 0.0154 -3.2398 0.0012 9.7 -0.0800 -0.01970\n#> hp mean(dY/dX) 8 -0.0120 0.0108 -1.1065 0.2685 1.9 -0.0332 0.00923\n#> wt mean(dY/dX) 4 -3.1198 0.6613 -4.7176 <0.001 18.7 -4.4160 -1.82366\n#> wt mean(dY/dX) 6 -3.1198 0.6613 -4.7176 <0.001 18.7 -4.4160 -1.82366\n#> wt mean(dY/dX) 8 -3.1198 0.6613 -4.7176 <0.001 18.7 -4.4160 -1.82366\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response\n\n\n24.2.5 Adjusted Predictions\nThe syntax to compute adjusted predictions using the prediction package or marginaleffects is very similar:\n\nprediction::prediction(mod) |> head()\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted\n#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 21.90488 0.6927034\n#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 21.10933 0.6266557\n#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.64753 0.6652076\n#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.04859 0.6041400\n#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.25445 0.7436172\n#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 19.53360 0.6436862\n\nmarginaleffects::predictions(mod) |> head()\n#> \n#> Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 21.9 0.693 31.6 <0.001 726.6 20.5 23.3\n#> 21.1 0.627 33.7 <0.001 823.9 19.9 22.3\n#> 25.6 0.665 38.6 <0.001 Inf 24.3 27.0\n#> 20.0 0.604 33.2 <0.001 799.8 18.9 21.2\n#> 17.3 0.744 23.2 <0.001 393.2 15.8 18.7\n#> 19.5 0.644 30.3 <0.001 669.5 18.3 20.8\n#> \n#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, cyl, hp, wt \n#> Type: response"
+ "text": "24.2 margins and prediction\n\nThe margins and prediction packages for R were designed by Thomas Leeper to emulate the behavior of the margins command from Stata. These packages are trailblazers and strongly influenced the development of marginaleffects. The main benefits of marginaleffects over these packages are:\n\nSupport more model types\nFaster\nMemory efficient\nPlots using ggplot2 instead of Base R\nMore extensive test suite\nActive development\n\nThe syntax of the two packages is very similar.\n\n24.2.1 Average Marginal Effects\n\nlibrary(margins)\nlibrary(marginaleffects)\n\nmod <- lm(mpg ~ cyl + hp + wt, data = mtcars)\n\nmar <- margins(mod)\nsummary(mar)\n#> factor AME SE z p lower upper\n#> cyl -0.9416 0.5509 -1.7092 0.0874 -2.0214 0.1382\n#> hp -0.0180 0.0119 -1.5188 0.1288 -0.0413 0.0052\n#> wt -3.1670 0.7406 -4.2764 0.0000 -4.6185 -1.7155\n\nmfx <- slopes(mod)\n\n\n24.2.2 Individual-Level Marginal Effects\nMarginal effects in a user-specified data frame:\n\nhead(data.frame(mar))\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt X_weights X_at_number\n#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 22.82043 0.6876212 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 22.01285 0.6056817 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.96040 0.7349593 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.93608 0.5800910 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.16780 0.8322986 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 20.25036 0.6638322 -0.9416168 -0.0180381 -3.166973 0.3035104 0.0001410451 0.5484521 NA 1\n\nhead(mfx)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> cyl -0.942 0.551 -1.71 0.0875 3.5 -2.02 0.138\n#> cyl -0.942 0.550 -1.71 0.0871 3.5 -2.02 0.137\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, cyl, hp, wt \n#> Type: response\nnd <- data.frame(cyl = 4, hp = 110, wt = 3)\n\n\n24.2.3 Marginal Effects at the Mean\n\nmar <- margins(mod, data = data.frame(prediction::mean_or_mode(mtcars)), unit_ses = TRUE)\ndata.frame(mar)\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted dydx_cyl dydx_hp dydx_wt Var_dydx_cyl Var_dydx_hp Var_dydx_wt SE_dydx_cyl SE_dydx_hp SE_dydx_wt X_weights X_at_number\n#> 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125 20.09062 0.4439832 -0.9416168 -0.0180381 -3.166973 0.3035013 0.0001410453 0.54846 0.5509096 0.01187625 0.7405808 NA 1\n\nslopes(mod, newdata = \"mean\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.5510 -1.71 0.0875 3.5 -2.0216 0.13833\n#> hp -0.018 0.0119 -1.52 0.1290 3.0 -0.0413 0.00525\n#> wt -3.167 0.7406 -4.28 <0.001 15.7 -4.6185 -1.71549\n#> \n#> Columns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, cyl, hp, wt \n#> Type: response\n\n\n24.2.4 Counterfactual Average Marginal Effects\nThe at argument of the margins package emulates Stata by fixing the values of some variables at user-specified values, and by replicating the full dataset once for each combination of the supplied values (see the Stata section below). For example, if the dataset includes 32 rows and the user calls at=list(cyl=c(4, 6)), margins will compute 64 unit-level marginal effects estimates:\n\ndat <- mtcars\ndat$cyl <- factor(dat$cyl)\nmod <- lm(mpg ~ cyl * hp + wt, data = mtcars)\n\nmar <- margins(mod, at = list(cyl = c(4, 6, 8)))\nsummary(mar)\n#> factor cyl AME SE z p lower upper\n#> cyl 4.0000 0.0381 0.6000 0.0636 0.9493 -1.1378 1.2141\n#> cyl 6.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139\n#> cyl 8.0000 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139\n#> hp 4.0000 -0.0878 0.0267 -3.2937 0.0010 -0.1400 -0.0355\n#> hp 6.0000 -0.0499 0.0154 -3.2397 0.0012 -0.0800 -0.0197\n#> hp 8.0000 -0.0120 0.0108 -1.1065 0.2685 -0.0332 0.0092\n#> wt 4.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236\n#> wt 6.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236\n#> wt 8.0000 -3.1198 0.6613 -4.7175 0.0000 -4.4160 -1.8236\n\navg_slopes(\n mod,\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl mean(dY/dX) 4 0.0381 0.5999 0.0636 0.9493 0.1 -1.1377 1.21401\n#> cyl mean(dY/dX) 6 0.0381 0.5998 0.0636 0.9493 0.1 -1.1375 1.21381\n#> cyl mean(dY/dX) 8 0.0381 0.5999 0.0636 0.9493 0.1 -1.1376 1.21389\n#> hp mean(dY/dX) 4 -0.0878 0.0267 -3.2936 <0.001 10.0 -0.1400 -0.03554\n#> hp mean(dY/dX) 6 -0.0499 0.0154 -3.2397 0.0012 9.7 -0.0800 -0.01970\n#> hp mean(dY/dX) 8 -0.0120 0.0108 -1.1065 0.2685 1.9 -0.0332 0.00923\n#> wt mean(dY/dX) 4 -3.1198 0.6613 -4.7175 <0.001 18.7 -4.4160 -1.82362\n#> wt mean(dY/dX) 6 -3.1198 0.6613 -4.7174 <0.001 18.7 -4.4160 -1.82362\n#> wt mean(dY/dX) 8 -3.1198 0.6613 -4.7174 <0.001 18.7 -4.4160 -1.82362\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response\n\n\n24.2.5 Adjusted Predictions\nThe syntax to compute adjusted predictions using the prediction package or marginaleffects is very similar:\n\nprediction::prediction(mod) |> head()\n#> mpg cyl disp hp drat wt qsec vs am gear carb fitted se.fitted\n#> 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 21.90488 0.6927034\n#> 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 21.10933 0.6266557\n#> 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 25.64753 0.6652076\n#> 4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 20.04859 0.6041400\n#> 5 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 17.25445 0.7436172\n#> 6 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 19.53360 0.6436862\n\nmarginaleffects::predictions(mod) |> head()\n#> \n#> Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 21.9 0.693 31.6 <0.001 726.6 20.5 23.3\n#> 21.1 0.627 33.7 <0.001 823.9 19.9 22.3\n#> 25.6 0.665 38.6 <0.001 Inf 24.3 27.0\n#> 20.0 0.604 33.2 <0.001 799.8 18.9 21.2\n#> 17.3 0.744 23.2 <0.001 393.2 15.8 18.7\n#> 19.5 0.644 30.3 <0.001 669.5 18.3 20.8\n#> \n#> Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, mpg, cyl, hp, wt \n#> Type: response"
},
{
"objectID": "articles/alternative_software.html#stata",
"href": "articles/alternative_software.html#stata",
"title": "\n24 Alternative Software\n",
"section": "\n24.3 Stata\n",
- "text": "24.3 Stata\n\nStata is a good but expensive software package for statistical analysis. It is published by StataCorp LLC. This section compares Stata’s margins command to marginaleffects.\nThe results produced by marginaleffects are extensively tested against Stata. See the test suite for a list of the dozens of models where we compared estimates and standard errors.\n\n24.3.1 Average Marginal Effect (AMEs)\nMarginal effects are unit-level quantities. To compute “average marginal effects”, we first calculate marginal effects for each observation in a dataset. Then, we take the mean of those unit-level marginal effects.\n\n24.3.1.1 Stata\nBoth Stata’s margins command and the slopes() function can calculate average marginal effects (AMEs). Here is an example showing how to estimate AMEs in Stata:\nquietly reg mpg cyl hp wt\nmargins, dydx(*)\n\nAverage marginal effects Number of obs = 32\nModel VCE : OLS\n \nExpression : Linear prediction, predict()\ndy/dx w.r.t. : cyl hp wt\n \n------------------------------------------------------------------------------\n | Delta-method\n | dy/dx Std. Err. t P>|t| [95% Conf. Interval]\n------------------------------------------------------------------------------\ncyl | -.9416168 .5509164 -1.71 0.098 -2.070118 .1868842\n hp | -.0180381 .0118762 -1.52 0.140 -.0423655 .0062893\n wt | -3.166973 .7405759 -4.28 0.000 -4.683974 -1.649972\n------------------------------------------------------------------------------\n\n24.3.1.2 marginaleffects\nThe same results can be obtained with slopes() and summary() like this:\n\nlibrary(\"marginaleffects\")\nmod <- lm(mpg ~ cyl + hp + wt, data = mtcars)\navg_slopes(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.5507 -1.71 0.0873 3.5 -2.0209 0.13770\n#> hp -0.018 0.0119 -1.52 0.1288 3.0 -0.0413 0.00524\n#> wt -3.167 0.7406 -4.28 <0.001 15.7 -4.6185 -1.71549\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNote that Stata reports t statistics while marginaleffects reports Z. This produces slightly different p-values because this model has low degrees of freedom: mtcars only has 32 rows\n\n24.3.2 Counterfactual Marginal Effects\nA “counterfactual marginal effect” is a special quantity obtained by replicating a dataset while fixing some regressor to user-defined values.\nConcretely, Stata computes counterfactual marginal effects in 3 steps:\n\nDuplicate the whole dataset 3 times and sets the values of cyl to the three specified values in each of those subsets.\nCalculate marginal effects for each observation in that large grid.\nTake the average of marginal effects for each value of the variable of interest.\n\n\n24.3.2.1 Stata\nWith the at argument, Stata’s margins command estimates average counterfactual marginal effects. Here is an example:\nquietly reg mpg i.cyl##c.hp wt\nmargins, dydx(hp) at(cyl = (4 6 8))\n\nAverage marginal effects Number of obs = 32\nModel VCE : OLS\n\nExpression : Linear prediction, predict()\ndy/dx w.r.t. : hp\n\n1._at : cyl = 4\n\n2._at : cyl = 6\n\n3._at : cyl = 8\n\n------------------------------------------------------------------------------\n | Delta-method\n | dy/dx Std. Err. t P>|t| [95% Conf. 
Interval]\n-------------+----------------------------------------------------------------\nhp |\n _at |\n 1 | -.099466 .0348665 -2.85 0.009 -.1712749 -.0276571\n 2 | -.0213768 .038822 -0.55 0.587 -.1013323 .0585787\n 3 | -.013441 .0125138 -1.07 0.293 -.0392137 .0123317\n------------------------------------------------------------------------------\n\n\n24.3.2.2 marginaleffects\nYou can estimate average counterfactual marginal effects with slopes() by using the datagridcf() function to create a counterfactual dataset in which the full original dataset is replicated for each potential value of the cyl variable. Then, we use the by argument to average within groups:\n\nmod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)\n\navg_slopes(\n mod,\n variables = \"hp\",\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp mean(dY/dX) 4 -0.0995 0.0349 -2.853 0.00433 7.9 -0.1678 -0.0311\n#> hp mean(dY/dX) 6 -0.0214 0.0388 -0.551 0.58188 0.8 -0.0975 0.0547\n#> hp mean(dY/dX) 8 -0.0134 0.0125 -1.074 0.28278 1.8 -0.0380 0.0111\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response\n\nThis is equivalent to taking the group-wise mean of observation-level marginal effects (without the by argument):\n\nmfx <- slopes(\n mod,\n variables = \"hp\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\naggregate(estimate ~ term + cyl, data = mfx, FUN = mean)\n#> term cyl estimate\n#> 1 hp 4 -0.09946598\n#> 2 hp 6 -0.02137679\n#> 3 hp 8 -0.01344103\n\n\nNote that following Stata, the standard errors for group-averaged marginal effects are computed by taking the “Jacobian at the mean:”\n\nJ <- attr(mfx, \"jacobian\")\nJ_mean <- aggregate(J, by = list(mfx$cyl), FUN = mean)\nJ_mean <- as.matrix(J_mean[, 2:ncol(J_mean)])\nsqrt(diag(J_mean %*% vcov(mod) %*% t(J_mean)))\n#> [1] 0.03486633 0.03882199 0.01251382\n\n\n24.3.3 Average Counterfactual Adjusted Predictions\n\n24.3.3.1 Stata\nJust like Stata’s margins command computes average counterfactual marginal effects, it can also estimate average counterfactual adjusted predictions.\nHere is an example:\nquietly reg mpg i.cyl##c.hp wt\nmargins, at(cyl = (4 6 8))\n\nPredictive margins Number of obs = 32\nModel VCE : OLS\n\nExpression : Linear prediction, predict()\n\n1._at : cyl = 4\n\n2._at : cyl = 6\n\n3._at : cyl = 8\n\n------------------------------------------------------------------------------\n | Delta-method\n | Margin Std. Err. t P>|t| [95% Conf. 
Interval]\n-------------+----------------------------------------------------------------\n _at |\n 1 | 17.44233 2.372914 7.35 0.000 12.55522 22.32944\n 2 | 18.9149 1.291483 14.65 0.000 16.25505 21.57476\n 3 | 18.33318 1.123874 16.31 0.000 16.01852 20.64785\n------------------------------------------------------------------------------\nAgain, this is what Stata does in the background:\n\nIt duplicates the whole dataset 3 times and sets the values of cyl to the three specified values in each of those subsets.\nIt calculates predictions for that large grid.\nIt takes the average prediction for each value of cyl.\n\nIn other words, average counterfactual adjusted predictions as implemented by Stata are a hybrid between predictions at the observed values (the default in marginaleffects::predictions) and predictions at representative values.\n\n24.3.3.2 marginaleffects\nYou can estimate average counterfactual adjusted predictions with predictions() by, first, using datagridcf() (or setting the grid_type argument of datagrid() to \"counterfactual\") to build the counterfactual grid and, second, averaging the predictions with the by argument, or manually with a function like dplyr::summarise().\n\nmod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)\n\npredictions(\n mod,\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 4 17.4 2.37 7.35 <0.001 42.2 12.8 22.1\n#> 6 18.9 1.29 14.65 <0.001 158.9 16.4 21.4\n#> 8 18.3 1.12 16.31 <0.001 196.3 16.1 20.5\n#> \n#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\npredictions(\n mod,\n newdata = datagridcf(cyl = c(4, 6, 8))) |>\n group_by(cyl) |>\n summarize(AAP = mean(estimate))\n#> # A tibble: 3 × 2\n#> cyl AAP\n#> <fct> <dbl>\n#> 1 4 17.4\n#> 2 6 18.9\n#> 3 8 18.3"
+ "text": "24.3 Stata\n\nStata is a good but expensive software package for statistical analysis. It is published by StataCorp LLC. This section compares Stata’s margins command to marginaleffects.\nThe results produced by marginaleffects are extensively tested against Stata. See the test suite for a list of the dozens of models where we compared estimates and standard errors.\n\n24.3.1 Average Marginal Effect (AMEs)\nMarginal effects are unit-level quantities. To compute “average marginal effects”, we first calculate marginal effects for each observation in a dataset. Then, we take the mean of those unit-level marginal effects.\n\n24.3.1.1 Stata\nBoth Stata’s margins command and the slopes() function can calculate average marginal effects (AMEs). Here is an example showing how to estimate AMEs in Stata:\nquietly reg mpg cyl hp wt\nmargins, dydx(*)\n\nAverage marginal effects Number of obs = 32\nModel VCE : OLS\n \nExpression : Linear prediction, predict()\ndy/dx w.r.t. : cyl hp wt\n \n------------------------------------------------------------------------------\n | Delta-method\n | dy/dx Std. Err. t P>|t| [95% Conf. Interval]\n------------------------------------------------------------------------------\ncyl | -.9416168 .5509164 -1.71 0.098 -2.070118 .1868842\n hp | -.0180381 .0118762 -1.52 0.140 -.0423655 .0062893\n wt | -3.166973 .7405759 -4.28 0.000 -4.683974 -1.649972\n------------------------------------------------------------------------------\n\n24.3.1.2 marginaleffects\nThe same results can be obtained with slopes() and summary() like this:\n\nlibrary(\"marginaleffects\")\nmod <- lm(mpg ~ cyl + hp + wt, data = mtcars)\navg_slopes(mod)\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> cyl -0.942 0.5506 -1.71 0.0872 3.5 -2.0208 0.13753\n#> hp -0.018 0.0119 -1.52 0.1288 3.0 -0.0413 0.00524\n#> wt -3.167 0.7406 -4.28 <0.001 15.7 -4.6185 -1.71546\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNote that Stata reports t statistics while marginaleffects reports Z. This produces slightly different p-values because this model has low degrees of freedom: mtcars only has 32 rows\n\n24.3.2 Counterfactual Marginal Effects\nA “counterfactual marginal effect” is a special quantity obtained by replicating a dataset while fixing some regressor to user-defined values.\nConcretely, Stata computes counterfactual marginal effects in 3 steps:\n\nDuplicate the whole dataset 3 times and sets the values of cyl to the three specified values in each of those subsets.\nCalculate marginal effects for each observation in that large grid.\nTake the average of marginal effects for each value of the variable of interest.\n\n\n24.3.2.1 Stata\nWith the at argument, Stata’s margins command estimates average counterfactual marginal effects. Here is an example:\nquietly reg mpg i.cyl##c.hp wt\nmargins, dydx(hp) at(cyl = (4 6 8))\n\nAverage marginal effects Number of obs = 32\nModel VCE : OLS\n\nExpression : Linear prediction, predict()\ndy/dx w.r.t. : hp\n\n1._at : cyl = 4\n\n2._at : cyl = 6\n\n3._at : cyl = 8\n\n------------------------------------------------------------------------------\n | Delta-method\n | dy/dx Std. Err. t P>|t| [95% Conf. 
Interval]\n-------------+----------------------------------------------------------------\nhp |\n _at |\n 1 | -.099466 .0348665 -2.85 0.009 -.1712749 -.0276571\n 2 | -.0213768 .038822 -0.55 0.587 -.1013323 .0585787\n 3 | -.013441 .0125138 -1.07 0.293 -.0392137 .0123317\n------------------------------------------------------------------------------\n\n\n24.3.2.2 marginaleffects\nYou can estimate average counterfactual marginal effects with slopes() by using the datagridcf() function to create a counterfactual dataset in which the full original dataset is replicated for each potential value of the cyl variable. Then, we use the by argument to average within groups:\n\nmod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)\n\navg_slopes(\n mod,\n variables = \"hp\",\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> hp mean(dY/dX) 4 -0.0995 0.0349 -2.853 0.00433 7.9 -0.1678 -0.0311\n#> hp mean(dY/dX) 6 -0.0214 0.0388 -0.551 0.58187 0.8 -0.0975 0.0547\n#> hp mean(dY/dX) 8 -0.0134 0.0125 -1.074 0.28278 1.8 -0.0380 0.0111\n#> \n#> Columns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \n#> Type: response\n\nThis is equivalent to taking the group-wise mean of observation-level marginal effects (without the by argument):\n\nmfx <- slopes(\n mod,\n variables = \"hp\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\naggregate(estimate ~ term + cyl, data = mfx, FUN = mean)\n#> term cyl estimate\n#> 1 hp 4 -0.09946598\n#> 2 hp 6 -0.02137679\n#> 3 hp 8 -0.01344103\n\n\nNote that following Stata, the standard errors for group-averaged marginal effects are computed by taking the “Jacobian at the mean:”\n\nJ <- attr(mfx, \"jacobian\")\nJ_mean <- aggregate(J, by = list(mfx$cyl), FUN = mean)\nJ_mean <- as.matrix(J_mean[, 2:ncol(J_mean)])\nsqrt(diag(J_mean %*% vcov(mod) %*% t(J_mean)))\n#> [1] 0.03486654 0.03882093 0.01251377\n\n\n24.3.3 Average Counterfactual Adjusted Predictions\n\n24.3.3.1 Stata\nJust like Stata’s margins command computes average counterfactual marginal effects, it can also estimate average counterfactual adjusted predictions.\nHere is an example:\nquietly reg mpg i.cyl##c.hp wt\nmargins, at(cyl = (4 6 8))\n\nPredictive margins Number of obs = 32\nModel VCE : OLS\n\nExpression : Linear prediction, predict()\n\n1._at : cyl = 4\n\n2._at : cyl = 6\n\n3._at : cyl = 8\n\n------------------------------------------------------------------------------\n | Delta-method\n | Margin Std. Err. t P>|t| [95% Conf. 
Interval]\n-------------+----------------------------------------------------------------\n _at |\n 1 | 17.44233 2.372914 7.35 0.000 12.55522 22.32944\n 2 | 18.9149 1.291483 14.65 0.000 16.25505 21.57476\n 3 | 18.33318 1.123874 16.31 0.000 16.01852 20.64785\n------------------------------------------------------------------------------\nAgain, this is what Stata does in the background:\n\nIt duplicates the whole dataset 3 times and sets the values of cyl to the three specified values in each of those subsets.\nIt calculates predictions for that large grid.\nIt takes the average prediction for each value of cyl.\n\nIn other words, average counterfactual adjusted predictions as implemented by Stata are a hybrid between predictions at the observed values (the default in marginaleffects::predictions) and predictions at representative values.\n\n24.3.3.2 marginaleffects\nYou can estimate average counterfactual adjusted predictions with predictions() by, first, using datagridcf() (or setting the grid_type argument of datagrid() to \"counterfactual\") to build the counterfactual grid and, second, averaging the predictions with the by argument, or manually with a function like dplyr::summarise().\n\nmod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)\n\npredictions(\n mod,\n by = \"cyl\",\n newdata = datagridcf(cyl = c(4, 6, 8)))\n#> \n#> cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> 4 17.4 2.37 7.35 <0.001 42.2 12.8 22.1\n#> 6 18.9 1.29 14.65 <0.001 158.9 16.4 21.4\n#> 8 18.3 1.12 16.31 <0.001 196.3 16.1 20.5\n#> \n#> Columns: cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\npredictions(\n mod,\n newdata = datagridcf(cyl = c(4, 6, 8))) |>\n group_by(cyl) |>\n summarize(AAP = mean(estimate))\n#> # A tibble: 3 × 2\n#> cyl AAP\n#> <fct> <dbl>\n#> 1 4 17.4\n#> 2 6 18.9\n#> 3 8 18.3"
},
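Both Stata subsections above rely on the same counterfactual duplication step. As a rough base R sketch of the grid that datagridcf() builds (an illustration of the described procedure, not the package's implementation), we can replicate the dataset once per value of cyl and average the predictions by hand:

mod <- lm(mpg ~ as.factor(cyl) * hp + wt, data = mtcars)
## duplicate the full dataset once for each counterfactual value of cyl
cf <- do.call(rbind, lapply(c(4, 6, 8), function(v) transform(mtcars, cyl = v)))
nrow(cf)  ## 96 rows: 32 observations x 3 values of cyl
## average counterfactual adjusted prediction for each value of cyl
tapply(predict(mod, newdata = cf), cf$cyl, mean)

The point estimates should line up with the predictions() output shown above; the standard errors again call for the delta method.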
{
"objectID": "articles/alternative_software.html#brmsmargins",
@@ -942,7 +949,7 @@
"href": "articles/uncertainty.html#simulation-based-inference",
"title": "\n29 Standard Errors\n",
"section": "\n29.5 Simulation-based inference",
- "text": "29.5 Simulation-based inference\nmarginaleffects offers an experimental inferences() function to conduct simulation-based inference following the strategy proposed by Krinsky & Robb (1986):\n\nDraw iter sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model’s estimated coefficients and variance equal to the model’s variance-covariance matrix (classical, “HC3”, or other).\nUse the iter sets of coefficients to compute iter sets of estimands: predictions, comparisons, or slopes.\nTake quantiles of the resulting distribution of estimands to obtain a confidence interval and the standard deviation of simulated estimates to estimate the standard error.\n\nHere are a few examples:\n\nlibrary(marginaleffects)\nlibrary(ggplot2)\nlibrary(ggdist)\n\nmod <- glm(vs ~ hp * wt + factor(gear), data = mtcars, family = binomial)\n\nmod |> predictions() |> inferences(method = \"simulation\")\n#> \n#> Estimate Std. Error 2.5 % 97.5 %\n#> 7.84e-01 0.208 2.51e-01 0.976\n#> 7.84e-01 0.179 3.10e-01 0.966\n#> 8.98e-01 0.147 4.62e-01 0.991\n#> 8.74e-01 0.229 1.76e-01 0.995\n#> 1.31e-02 0.204 4.51e-05 0.834\n#> --- 22 rows omitted. See ?avg_predictions and ?print.marginaleffects --- \n#> 3.83e-01 0.296 1.53e-02 0.947\n#> 1.21e-06 0.134 1.50e-12 0.511\n#> 6.89e-03 0.150 2.19e-05 0.563\n#> 8.07e-11 0.168 2.22e-16 0.768\n#> 7.95e-01 0.181 3.05e-01 0.971\n#> Columns: rowid, estimate, std.error, conf.low, conf.high, vs, hp, wt, gear \n#> Type: response\n\nmod |> avg_slopes(vcov = ~gear) |> inferences(method = \"simulation\")\n#> \n#> Term Contrast Estimate Std. Error 2.5 % 97.5 %\n#> gear 4 - 3 -3.92e-02 0.05434 -0.0905 0.13939\n#> gear 5 - 3 -1.93e-01 0.27127 -0.4874 0.33403\n#> hp dY/dX -5.02e-03 0.00452 -0.0115 0.00444\n#> wt dY/dX -4.06e-05 0.31413 -0.6073 0.70078\n#> \n#> Columns: term, contrast, estimate, std.error, conf.low, conf.high \n#> Type: response\n\nSince simulation based inference generates iter estimates of the quantities of interest, we can treat them similarly to draws from the posterior distribution in bayesian models. For example, we can extract draws using the posterior_draws() function, and plot their distributions using packages likeggplot2 and ggdist:\n\nmod |>\n avg_comparisons(variables = \"gear\") |>\n inferences(method = \"simulation\") |>\n posterior_draws(\"rvar\") |>\n ggplot(aes(y = contrast, xdist = rvar)) +\n stat_slabinterval()"
+ "text": "29.5 Simulation-based inference\nmarginaleffects offers an experimental inferences() function to conduct simulation-based inference following the strategy proposed by Krinsky & Robb (1986):\n\nDraw iter sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model’s estimated coefficients and variance equal to the model’s variance-covariance matrix (classical, “HC3”, or other).\nUse the iter sets of coefficients to compute iter sets of estimands: predictions, comparisons, or slopes.\nTake quantiles of the resulting distribution of estimands to obtain a confidence interval and the standard deviation of simulated estimates to estimate the standard error.\n\nHere are a few examples:\n\nlibrary(marginaleffects)\nlibrary(ggplot2)\nlibrary(ggdist)\n\nmod <- glm(vs ~ hp * wt + factor(gear), data = mtcars, family = binomial)\n\nmod |> predictions() |> inferences(method = \"simulation\")\n#> \n#> Estimate Std. Error 2.5 % 97.5 %\n#> 7.84e-01 0.190 2.95e-01 0.976\n#> 7.84e-01 0.162 3.50e-01 0.965\n#> 8.98e-01 0.144 4.22e-01 0.990\n#> 8.74e-01 0.234 1.67e-01 0.995\n#> 1.31e-02 0.195 7.76e-05 0.768\n#> --- 22 rows omitted. See ?avg_predictions and ?print.marginaleffects --- \n#> 3.83e-01 0.300 1.43e-02 0.955\n#> 1.21e-06 0.125 6.44e-12 0.430\n#> 6.89e-03 0.159 2.69e-05 0.627\n#> 8.07e-11 0.160 2.22e-16 0.778\n#> 7.95e-01 0.164 3.52e-01 0.968\n#> Columns: rowid, estimate, std.error, conf.low, conf.high, vs, hp, wt, gear \n#> Type: response\n\nmod |> avg_slopes(vcov = ~gear) |> inferences(method = \"simulation\")\n#> \n#> Term Contrast Estimate Std. Error 2.5 % 97.5 %\n#> gear 4 - 3 -3.92e-02 0.05744 -0.0910 0.12941\n#> gear 5 - 3 -1.93e-01 0.27384 -0.4888 0.33715\n#> hp dY/dX -5.02e-03 0.00439 -0.0113 0.00472\n#> wt dY/dX -4.06e-05 0.31283 -0.5676 0.72261\n#> \n#> Columns: term, contrast, estimate, std.error, conf.low, conf.high \n#> Type: response\n\nSince simulation based inference generates iter estimates of the quantities of interest, we can treat them similarly to draws from the posterior distribution in bayesian models. For example, we can extract draws using the posterior_draws() function, and plot their distributions using packages likeggplot2 and ggdist:\n\nmod |>\n avg_comparisons(variables = \"gear\") |>\n inferences(method = \"simulation\") |>\n posterior_draws(\"rvar\") |>\n ggplot(aes(y = contrast, xdist = rvar)) +\n stat_slabinterval()"
},
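To make the three Krinsky & Robb steps concrete, here is a hedged base R sketch that simulates an average slope of hp in the same logistic model. It assumes MASS::mvrnorm as the multivariate normal sampler and approximates the slope with a centered finite difference, so the numbers will differ slightly from what inferences() returns:

library(MASS)  ## for mvrnorm
set.seed(1024)
mod <- glm(vs ~ hp * wt + factor(gear), data = mtcars, family = binomial)

## Step 1: draw iter coefficient vectors from MVN(coef(mod), vcov(mod))
iter <- 1000
B <- mvrnorm(iter, mu = coef(mod), Sigma = vcov(mod))

## Step 2: compute the estimand for each draw; here, the average slope of hp
## on the response scale, via a centered finite difference
eps <- 1e-4 * diff(range(mtcars$hp))
X_lo <- insight::get_modelmatrix(mod, data = transform(mtcars, hp = hp - eps / 2))
X_hi <- insight::get_modelmatrix(mod, data = transform(mtcars, hp = hp + eps / 2))
linkinv <- mod$family$linkinv
est <- apply(B, 1, function(b) mean((linkinv(X_hi %*% b) - linkinv(X_lo %*% b)) / eps))

## Step 3: standard error and interval from the simulated distribution
sd(est)                         ## simulation-based standard error
quantile(est, c(0.025, 0.975))  ## 95% interval from quantiles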
{
"objectID": "articles/uncertainty.html#bootstrap",
@@ -963,7 +970,7 @@
"href": "articles/uncertainty.html#numerical-derivatives-sensitivity-to-step-size",
"title": "\n29 Standard Errors\n",
"section": "\n29.8 Numerical derivatives: Sensitivity to step size",
- "text": "29.8 Numerical derivatives: Sensitivity to step size\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv\")\ndat$large_penguin <- ifelse(dat$body_mass_g > median(dat$body_mass_g, na.rm = TRUE), 1, 0)\nmod <- glm(large_penguin ~ bill_length_mm * flipper_length_mm + species, data = dat, family = binomial)\n\nmarginaleffects uses numerical derivatives in two contexts:\n\nEstimate the partial derivatives reported by slopes() function.\n\nCentered finite difference\n\n\\(\\frac{f(x + \\varepsilon_1 / 2) - f(x - \\varepsilon_1 / 2)}{\\varepsilon_1}\\), where we take the derivative with respect to a predictor of interest, and \\(f\\) is the predict() function.\n\n\nEstimate standard errors using the delta method.\n\nForward finite difference\n\n\\(\\frac{g(\\hat{\\beta}) - g(\\hat{\\beta} + \\varepsilon_2)}{\\varepsilon_2}\\), where we take the derivative with respect to a model’s coefficients, and \\(g\\) is a marginaleffects function which returns some quantity of interest (e.g., slope, marginal means, predictions, etc.)\n\n\n\nNote that the step sizes used in those two contexts can differ. If the variables and coefficients have very different scales, it may make sense to use different values for \\(\\varepsilon_1\\) and \\(\\varepsilon_2\\).\nBy default, \\(\\varepsilon_1\\) is set to 1e-4 times the range of the variable with respect to which we are taking the derivative. By default, \\(\\varepsilon_2\\) is set to the maximum value of 1e-8, or 1e-4 times the smallest absolute coefficient estimate. (These choices are arbitrary, but I have found that in practice, smaller values can produce unstable results.)\n\\(\\varepsilon_1\\) can be controlled by the eps argument of the slopes() function. \\(\\varepsilon_2\\) can be controlled by setting a global option which tells marginaleffects to compute the jacobian using the numDeriv package instead of its own internal functions. This allows more control over the step size, and also gives access to other differentiation methods, such as Richardson’s. To use numDeriv, we define a list of arguments which will be pushed forward to numDeriv::jacobian:\n\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.69 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"Richardson\"))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.68 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-3)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.568 0.049 0.961 0.1 -1.09 1.14\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-5)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00601 4.64 <0.001 18.1 0.0161 0.0396\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-7)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.68 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNotice that the standard errors can vary considerably when using different step sizes. It is good practice for analysts to consider the sensitivity of their results to this setting.\nNow, we illustrate the full process of standard error computation, using raw R code. First, we choose two step sizes:\n\neps1 <- 1e-5 # slope\neps2 <- 1e-7 # delta method\n\ns <- slopes(mod, newdata = head(dat, 3), variables = \"bill_length_mm\", eps = eps1)\nprint(s[, 1:5], digits = 6)\n#> \n#> Term Estimate Std. Error z\n#> bill_length_mm 0.0179765 0.00851142 2.11205\n#> bill_length_mm 0.0359630 0.01284280 2.80025\n#> bill_length_mm 0.0849071 0.02137283 3.97267\n#> \n#> Columns: rowid, term, estimate, std.error, statistic\n\nWe can get the same estimates manually with these steps:\n\nlinkinv <- mod$family$linkinv\n\n## increment the variable of interest by eps1\ndat_hi <- transform(dat, bill_length_mm = bill_length_mm + eps1)\n\n## model matrices: first 3 rows\nmm_lo <- insight::get_modelmatrix(mod, data = dat)[1:3,]\nmm_hi <- insight::get_modelmatrix(mod, data = dat_hi)[1:3,]\n\n## predictions\np_lo <- linkinv(mm_lo %*% coef(mod))\np_hi <- linkinv(mm_hi %*% coef(mod))\n\n## slopes\n(p_hi - p_lo) / eps1\n#> [,1]\n#> 1 0.01797653\n#> 2 0.03596304\n#> 3 0.08490712\n\nTo get standard errors, we build a jacobian matrix where each column holds derivatives of the vector-valued slope function, with respect to each of the coefficients. Using the same example:\n\nb_lo <- b_hi <- coef(mod)\nb_hi[1] <- b_hi[1] + eps2\n\ndydx_lo <- (linkinv(mm_hi %*% b_lo) - linkinv(mm_lo %*% b_lo)) / eps1\ndydx_hi <- (linkinv(mm_hi %*% b_hi) - linkinv(mm_lo %*% b_hi)) / eps1\n(dydx_hi - dydx_lo) / eps2\n#> [,1]\n#> 1 0.01598027\n#> 2 0.02774170\n#> 3 0.02281508\n\nThis gives us the first column of \\(J\\), which we can recover in full from the marginaleffects object attribute:\n\nJ <- attr(s, \"jacobian\")\nJ\n#> (Intercept) bill_length_mm flipper_length_mm speciesChinstrap speciesGentoo bill_length_mm:flipper_length_mm\n#> [1,] 0.01598027 0.6775344 2.897231 0 0 122.6916\n#> [2,] 0.02774170 1.1958073 5.153128 0 0 222.4993\n#> [3,] 0.02281508 1.1500800 4.440059 0 0 224.0829\n\nTo build the full matrix, we would simply iterate through the coefficients, incrementing them one after the other. Finally, we get standard errors via:\n\nsqrt(diag(J %*% vcov(mod) %*% t(J)))\n#> [1] 0.008511418 0.012842804 0.021372833\n\nWhich corresponds to our original standard errors:\n\nprint(s[, 1:5], digits = 7)\n#> \n#> Term Estimate Std. Error z\n#> bill_length_mm 0.01797653 0.008511418 2.112049\n#> bill_length_mm 0.03596304 0.012842804 2.800248\n#> bill_length_mm 0.08490712 0.021372833 3.972666\n#> \n#> Columns: rowid, term, estimate, std.error, statistic\n\nReverting to default settings:\n\noptions(marginaleffects_numDeriv = NULL)\n\nNote that our default results for this model are very similar – but not exactly identical – to those generated by the margins package. 
As should be expected, the results in margins are also very sensitive to the value of eps for this model:\n\nlibrary(margins)\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.008727955 0.012567120 0.021293270\n\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), eps = 1e-4, unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.2269512 0.2255849 0.6636208\n\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), eps = 1e-5, unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.02317077 0.02928266 0.05480281"
+ "text": "29.8 Numerical derivatives: Sensitivity to step size\n\ndat <- read.csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/palmerpenguins/penguins.csv\")\ndat$large_penguin <- ifelse(dat$body_mass_g > median(dat$body_mass_g, na.rm = TRUE), 1, 0)\nmod <- glm(large_penguin ~ bill_length_mm * flipper_length_mm + species, data = dat, family = binomial)\n\nmarginaleffects uses numerical derivatives in two contexts:\n\nEstimate the partial derivatives reported by slopes() function.\n\nCentered finite difference\n\n\\(\\frac{f(x + \\varepsilon_1 / 2) - f(x - \\varepsilon_1 / 2)}{\\varepsilon_1}\\), where we take the derivative with respect to a predictor of interest, and \\(f\\) is the predict() function.\n\n\nEstimate standard errors using the delta method.\n\nForward finite difference\n\n\\(\\frac{g(\\hat{\\beta}) - g(\\hat{\\beta} + \\varepsilon_2)}{\\varepsilon_2}\\), where we take the derivative with respect to a model’s coefficients, and \\(g\\) is a marginaleffects function which returns some quantity of interest (e.g., slope, marginal means, predictions, etc.)\n\n\n\nNote that the step sizes used in those two contexts can differ. If the variables and coefficients have very different scales, it may make sense to use different values for \\(\\varepsilon_1\\) and \\(\\varepsilon_2\\).\nBy default, \\(\\varepsilon_1\\) is set to 1e-4 times the range of the variable with respect to which we are taking the derivative. By default, \\(\\varepsilon_2\\) is set to the maximum value of 1e-8, or 1e-4 times the smallest absolute coefficient estimate. (These choices are arbitrary, but I have found that in practice, smaller values can produce unstable results.)\n\\(\\varepsilon_1\\) can be controlled by the eps argument of the slopes() function. \\(\\varepsilon_2\\) can be controlled by setting a global option which tells marginaleffects to compute the jacobian using the numDeriv package instead of its own internal functions. This allows more control over the step size, and also gives access to other differentiation methods, such as Richardson’s. To use numDeriv, we define a list of arguments which will be pushed forward to numDeriv::jacobian:\n\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.68 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"Richardson\"))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.68 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-3)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.568 0.049 0.961 0.1 -1.09 1.14\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-5)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00601 4.64 <0.001 18.1 0.0161 0.0396\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\noptions(marginaleffects_numDeriv = list(method = \"simple\", method.args = list(eps = 1e-7)))\navg_slopes(mod, variables = \"bill_length_mm\")\n#> \n#> Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n#> bill_length_mm 0.0279 0.00595 4.68 <0.001 18.4 0.0162 0.0395\n#> \n#> Columns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \n#> Type: response\n\nNotice that the standard errors can vary considerably when using different step sizes. It is good practice for analysts to consider the sensitivity of their results to this setting.\nNow, we illustrate the full process of standard error computation, using raw R code. First, we choose two step sizes:\n\neps1 <- 1e-5 # slope\neps2 <- 1e-7 # delta method\n\ns <- slopes(mod, newdata = head(dat, 3), variables = \"bill_length_mm\", eps = eps1)\nprint(s[, 1:5], digits = 6)\n#> \n#> Term Estimate Std. Error z\n#> bill_length_mm 0.0179765 0.0086771 2.07172\n#> bill_length_mm 0.0359630 0.0126120 2.85150\n#> bill_length_mm 0.0849071 0.0213175 3.98298\n#> \n#> Columns: rowid, term, estimate, std.error, statistic\n\nWe can get the same estimates manually with these steps:\n\nlinkinv <- mod$family$linkinv\n\n## increment the variable of interest by h\ndat_hi <- transform(dat, bill_length_mm = bill_length_mm + eps1)\n\n## model matrices: first 3 rows\nmm_lo <- insight::get_modelmatrix(mod, data = dat)[1:3,]\nmm_hi <- insight::get_modelmatrix(mod, data = dat_hi)[1:3,]\n\n## predictions\np_lo <- linkinv(mm_lo %*% coef(mod))\np_hi <- linkinv(mm_hi %*% coef(mod))\n\n## slopes\n(p_hi - p_lo) / eps1\n#> [,1]\n#> 1 0.01797653\n#> 2 0.03596304\n#> 3 0.08490712\n\nTo get standard errors, we build a jacobian matrix where each column holds derivatives of the vector valued slope function, with respect to each of the coefficients. Using the same example:\n\nb_lo <- b_hi <- coef(mod)\nb_hi[1] <- b_hi[1] + eps2\n\ndydx_lo <- (linkinv(mm_hi %*% b_lo) - linkinv(mm_lo %*% b_lo)) / eps1\ndydx_hi <- (linkinv(mm_hi %*% b_hi) - linkinv(mm_lo %*% b_hi)) / eps1\n(dydx_hi - dydx_lo) / eps2\n#> [,1]\n#> 1 0.01600109\n#> 2 0.02771394\n#> 3 0.02275957\n\nThis gives us the first column of \\(J\\), which we can recover in full from the marginaleffects object attribute:\n\nJ <- attr(s, \"jacobian\")\nJ\n#> (Intercept) bill_length_mm flipper_length_mm speciesChinstrap speciesGentoo bill_length_mm:flipper_length_mm\n#> [1,] 0.01600109 0.6775622 2.897231 0 0 122.6916\n#> [2,] 0.02771394 1.1957935 5.153128 0 0 222.4993\n#> [3,] 0.02275957 1.1500800 4.440004 0 0 224.0828\n\nTo build the full matrix, we would simply iterate through the coefficients, incrementing them one after the other. Finally, we get standard errors via:\n\nsqrt(diag(J %*% vcov(mod) %*% t(J)))\n#> [1] 0.008677096 0.012611983 0.021317511\n\nWhich corresponds to our original standard errors:\n\nprint(s[, 1:5], digits = 7)\n#> \n#> Term Estimate Std. Error z\n#> bill_length_mm 0.01797653 0.008677096 2.071722\n#> bill_length_mm 0.03596304 0.012611983 2.851498\n#> bill_length_mm 0.08490712 0.021317511 3.982975\n#> \n#> Columns: rowid, term, estimate, std.error, statistic\n\nReverting to default settings:\n\noptions(marginaleffects_numDeriv = NULL)\n\nNote that our default results for this model are very similar – but not exactly identical – to those generated by the margins. 
As should be expected, the results in margins are also very sensitive to the value of eps for this model:\n\nlibrary(margins)\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.008727977 0.012567079 0.021293275\n\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), eps = 1e-4, unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.2269512 0.2255849 0.6636208\n\nmargins(mod, variables = \"bill_length_mm\", data = head(dat, 3), eps = 1e-5, unit_ses = TRUE)$SE_dydx_bill_length_mm\n#> [1] 0.02317078 0.02928267 0.05480282"
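To make the "iterate through the coefficients" step above concrete, here is a minimal sketch of that loop, using only the objects defined in the manual example (mod, mm_lo, mm_hi, linkinv, eps1, eps2). It illustrates the procedure; it is not the package's internal implementation.

beta <- coef(mod)
## slope of the predictions with respect to bill_length_mm, as a function of the coefficient vector
dydx <- function(b) (linkinv(mm_hi %*% b) - linkinv(mm_lo %*% b)) / eps1
## one Jacobian column per coefficient: forward finite difference in beta
J_manual <- matrix(NA_real_, nrow = nrow(mm_lo), ncol = length(beta))
for (k in seq_along(beta)) {
    b_hi <- beta
    b_hi[k] <- b_hi[k] + eps2
    J_manual[, k] <- (dydx(b_hi) - dydx(beta)) / eps2
}
## delta method standard errors, as before
sqrt(diag(J_manual %*% vcov(mod) %*% t(J_manual)))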
},
{
"objectID": "articles/uncertainty.html#bayesian-estimates-and-credible-intervals",
@@ -977,7 +984,7 @@
"href": "articles/supported_models.html",
"title": "\n30 Supported Models\n",
"section": "",
- "text": "This table shows the list of 87 supported model types. There are three main alternative software packages to compute such slopes (1) Stata’s margins command, (2) R’s margins::margins() function, and (3) R’s emmeans::emtrends() function. The test suite hosted on Github compares the numerical equivalence of results produced by marginaleffects::slopes() to those produced by all 3 alternative software packages:\n\n✓: a green check means that the results of at least one model are equal to a reasonable tolerance.\n✖: a red cross means that the results are not identical; extra caution is warranted.\nU: a grey U means that computing slopes for a model type is unsupported by alternative packages, but supported by marginaleffects.\nAn empty cell means means that no comparison has been made yet.\n\nI am eager to add support for new models. Feel free to file a request or submit code on Github.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNumerical equivalence\n\n\n\n\nSupported by marginaleffects\n\n\nStata\n\n\nmargins\n\n\nemtrends\n\n\n\nPackage\nFunction\ndY/dX\nSE\ndY/dX\nSE\ndY/dX\nSE\n\n\n\n\nstats\nlm\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglm\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nnls\n\n\n\n\n\n\n\n\n\nloess\n\n\n✓\n\nU\nU\n\n\nAER\nivreg\n✓\n✓\n✓\n✓\nU\nU\n\n\n\ntobit\n✓\n✓\nU\nU\n✓\n✓\n\n\nafex\nafex_aov\n\n\nU\nU\n✓\n✓\n\n\naod\nbetabin\n\n\nU\nU\nU\nU\n\n\nbetareg\nbetareg\n✓\n✓\n✓\n✓\n✓\n✓\n\n\nbife\nbife\n\n\nU\nU\nU\nU\n\n\nbiglm\nbiglm\n\n\nU\nU\nU\nU\n\n\n\nbigglm\n\n\nU\nU\nU\nU\n\n\nblme\nblmer\n\n\n\n\n\n\n\n\n\nbglmer\n\n\n\n\n\n\n\n\nbrglm2\nbracl\n\n\nU\nU\nU\nU\n\n\n\nbrglmFit\n\n\n✓\n✓\n✓\n✓\n\n\n\nbrnb\n\n\n✓\n✓\nU\nU\n\n\n\nbrmultinom\n\n\nU\nU\nU\nU\n\n\nbrms\nbrm\n\n\nU\nU\n✓\n✓\n\n\ncrch\ncrch\n\n\nU\nU\nU\nU\n\n\n\nhxlr\n\n\nU\nU\nU\nU\n\n\nDCchoice\noohbchoice\n\n\n\n\n\n\n\n\nestimatr\nlm_lin\n\n\n\n\n\n\n\n\n\nlm_robust\n✓\n✓\n✓\nU\n✓\n✓\n\n\n\niv_robust\n✓\n✓\nU\nU\nU\nU\n\n\nfixest\nfeols\n✓\n✓\nU\nU\nU\nU\n\n\n\nfeglm\n\n\nU\nU\nU\nU\n\n\n\nfenegbin\n\n\nU\nU\nU\nU\n\n\n\nfepois\n✓\n✓\nU\nU\nU\nU\n\n\ngam\ngam\n\n\nU\nU\n✓\n✓\n\n\ngamlss\ngamlss\n\n\nU\nU\n✓\n✓\n\n\ngeepack\ngeeglm\n\n\nU\nU\n✓\n✓\n\n\nglmmTMB\nglmmTMB\n\n\nU\nU\n✓\n✓\n\n\nglmx\nglmx\n\n\n✓\nU\nU\nU\n\n\nivreg\nivreg\n✓\n✓\n✓\n✓\nU\nU\n\n\nlme4\nlmer\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglmer\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglmer.nb\n\n\n✓\n✓\n✓\n✓\n\n\nlmerTest\nlmer\n\n\n✓\n✓\n✓\n✓\n\n\nlogistf\nlogistf\n\n\n\n\n\n\n\n\n\nflic\n\n\n\n\n\n\n\n\n\nflac\n\n\n\n\n\n\n\n\nMASS\nglmmPQL\n\n\nU\nU\n✓\n✓\n\n\n\nglm.nb\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\npolr\n✓\n✓\n✖\n✖\n✓\n✓\n\n\n\nrlm\n\n\n✓\n✓\n✓\n✓\n\n\nmclogit\nmblogit\n\n\nU\nU\nU\nU\n\n\n\nmclogit\n\n\nU\nU\nU\nU\n\n\nMCMCglmm\nMCMCglmm\nU\nU\nU\nU\nU\nU\n\n\nmgcv\ngam\n\n\nU\nU\n✓\n✓\n\n\n\nbam\n\n\nU\nU\n✓\n✖\n\n\nmhurdle\nmhurdle\n\n\n✓\n✓\nU\nU\n\n\nmlogit\nmlogit\n\n\nU\nU\nU\nU\n\n\nnlme\ngls\n\n\nU\nU\n✓\n✓\n\n\n\nlme\n\n\n\n\n\n\n\n\nnnet\nmultinom\n✓\n✓\nU\nU\nU\nU\n\n\nordbetareg\nordbetareg\n\n\nU\nU\n\n\n\n\nordinal\nclm\n✓\n✓\nU\nU\nU\nU\n\n\nplm\nplm\n✓\n✓\n✓\n✓\nU\nU\n\n\nphylolm\nphylolm\n\n\n\n\n\n\n\n\n\nphyloglm\n\n\n\n\n\n\n\n\npscl\nhurdle\n\n\n✓\nU\n✓\n✖\n\n\n\nhurdle\n\n\n✓\nU\n✓\n✖\n\n\n\nzeroinfl\n✓\n✓\n✓\nU\n✓\n✓\n\n\nquantreg\nrq\n✓\n✓\nU\nU\n✓\n✓\n\n\nRchoice\nhetprob\n\n\n\n\n\n\n\n\n\nivpml\n\n\n\n\n\n\n\n\nrms\nols\n\n\n\n\n\n\n\n\n\nlrm\n\n\n\n\n\n\n\n\n\norm\n\n\n\n\n\n\n\n\nrobust\nlmRob\n\n\nU\nU\nU\nU\n\n\nrobustbase\nglmrob\n\n\n✓\n✓\nU\nU\n\n\n\nlmrob\n\n\n✓\n✓\nU\nU\n\n\nrobustlmm\nrlmer\n\n\nU\nU\n\n\n\n\nrstanarm\nstan_glm\n\n\n✖\nU\n✓\n✓\n\n\nsampleSelection\ns
election\n\n\nU\nU\nU\nU\n\n\n\nheckit\n\n\nU\nU\nU\nU\n\n\nscam\nscam\n\n\nU\nU\nU\nU\n\n\nspeedglm\nspeedglm\n✓\n✓\n✓\n✓\nU\nU\n\n\n\nspeedlm\n✓\n✓\n✓\n✓\nU\nU\n\n\nsurvey\nsvyglm\n\n\n✓\n✓\n✓\n✓\n\n\n\nsvyolr\n\n\n\n\n\n\n\n\nsurvival\nclogit\n\n\n\n\n\n\n\n\n\ncoxph\n✓\n✓\nU\nU\n✓\n✓\n\n\n\nsurvreg\n\n\n\n\n\n\n\n\ntobit1\ntobit1\n\n\n✓\n✓\nU\nU\n\n\ntruncreg\ntruncreg\n✓\n✓\n✓\n✓\nU\nU"
+ "text": "This table shows the list of 88 supported model types. There are three main alternative software packages to compute such slopes (1) Stata’s margins command, (2) R’s margins::margins() function, and (3) R’s emmeans::emtrends() function. The test suite hosted on Github compares the numerical equivalence of results produced by marginaleffects::slopes() to those produced by all 3 alternative software packages:\n\n✓: a green check means that the results of at least one model are equal to a reasonable tolerance.\n✖: a red cross means that the results are not identical; extra caution is warranted.\nU: a grey U means that computing slopes for a model type is unsupported by alternative packages, but supported by marginaleffects.\nAn empty cell means means that no comparison has been made yet.\n\nI am eager to add support for new models. Feel free to file a request or submit code on Github.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNumerical equivalence\n\n\n\n\nSupported by marginaleffects\n\n\nStata\n\n\nmargins\n\n\nemtrends\n\n\n\nPackage\nFunction\ndY/dX\nSE\ndY/dX\nSE\ndY/dX\nSE\n\n\n\n\nstats\nlm\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglm\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nnls\n\n\n\n\n\n\n\n\n\nloess\n\n\n✓\n\nU\nU\n\n\nAER\nivreg\n✓\n✓\n✓\n✓\nU\nU\n\n\n\ntobit\n✓\n✓\nU\nU\n✓\n✓\n\n\nafex\nafex_aov\n\n\nU\nU\n✓\n✓\n\n\naod\nbetabin\n\n\nU\nU\nU\nU\n\n\nbetareg\nbetareg\n✓\n✓\n✓\n✓\n✓\n✓\n\n\nbife\nbife\n\n\nU\nU\nU\nU\n\n\nbiglm\nbiglm\n\n\nU\nU\nU\nU\n\n\n\nbigglm\n\n\nU\nU\nU\nU\n\n\nblme\nblmer\n\n\n\n\n\n\n\n\n\nbglmer\n\n\n\n\n\n\n\n\nbrglm2\nbracl\n\n\nU\nU\nU\nU\n\n\n\nbrglmFit\n\n\n✓\n✓\n✓\n✓\n\n\n\nbrnb\n\n\n✓\n✓\nU\nU\n\n\n\nbrmultinom\n\n\nU\nU\nU\nU\n\n\nbrms\nbrm\n\n\nU\nU\n✓\n✓\n\n\ncrch\ncrch\n\n\nU\nU\nU\nU\n\n\n\nhxlr\n\n\nU\nU\nU\nU\n\n\nDCchoice\noohbchoice\n\n\n\n\n\n\n\n\nestimatr\nlm_lin\n\n\n\n\n\n\n\n\n\nlm_robust\n✓\n✓\n✓\nU\n✓\n✓\n\n\n\niv_robust\n✓\n✓\nU\nU\nU\nU\n\n\nfixest\nfeols\n✓\n✓\nU\nU\nU\nU\n\n\n\nfeglm\n\n\nU\nU\nU\nU\n\n\n\nfenegbin\n\n\nU\nU\nU\nU\n\n\n\nfepois\n✓\n✓\nU\nU\nU\nU\n\n\ngam\ngam\n\n\nU\nU\n✓\n✓\n\n\ngamlss\ngamlss\n\n\nU\nU\n✓\n✓\n\n\ngeepack\ngeeglm\n\n\nU\nU\n✓\n✓\n\n\nglmmTMB\nglmmTMB\n\n\nU\nU\n✓\n✓\n\n\nglmx\nglmx\n\n\n✓\nU\nU\nU\n\n\nivreg\nivreg\n✓\n✓\n✓\n✓\nU\nU\n\n\nmlr3\nLearner\n\n\n\n\n\n\n\n\nlme4\nlmer\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglmer\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\nglmer.nb\n\n\n✓\n✓\n✓\n✓\n\n\nlmerTest\nlmer\n\n\n✓\n✓\n✓\n✓\n\n\nlogistf\nlogistf\n\n\n\n\n\n\n\n\n\nflic\n\n\n\n\n\n\n\n\n\nflac\n\n\n\n\n\n\n\n\nMASS\nglmmPQL\n\n\nU\nU\n✓\n✓\n\n\n\nglm.nb\n✓\n✓\n✓\n✓\n✓\n✓\n\n\n\npolr\n✓\n✓\n✖\n✖\n✓\n✓\n\n\n\nrlm\n\n\n✓\n✓\n✓\n✓\n\n\nmclogit\nmblogit\n\n\nU\nU\nU\nU\n\n\n\nmclogit\n\n\nU\nU\nU\nU\n\n\nMCMCglmm\nMCMCglmm\nU\nU\nU\nU\nU\nU\n\n\nmgcv\ngam\n\n\nU\nU\n✓\n✓\n\n\n\nbam\n\n\nU\nU\n✓\n✖\n\n\nmhurdle\nmhurdle\n\n\n✓\n✓\nU\nU\n\n\nmlogit\nmlogit\n\n\nU\nU\nU\nU\n\n\nnlme\ngls\n\n\nU\nU\n✓\n✓\n\n\n\nlme\n\n\n\n\n\n\n\n\nnnet\nmultinom\n✓\n✓\nU\nU\nU\nU\n\n\nordbetareg\nordbetareg\n\n\nU\nU\n\n\n\n\nordinal\nclm\n✓\n✓\nU\nU\nU\nU\n\n\nplm\nplm\n✓\n✓\n✓\n✓\nU\nU\n\n\nphylolm\nphylolm\n\n\n\n\n\n\n\n\n\nphyloglm\n\n\n\n\n\n\n\n\npscl\nhurdle\n\n\n✓\nU\n✓\n✖\n\n\n\nhurdle\n\n\n✓\nU\n✓\n✖\n\n\n\nzeroinfl\n✓\n✓\n✓\nU\n✓\n✓\n\n\nquantreg\nrq\n✓\n✓\nU\nU\n✓\n✓\n\n\nRchoice\nhetprob\n\n\n\n\n\n\n\n\n\nivpml\n\n\n\n\n\n\n\n\nrms\nols\n\n\n\n\n\n\n\n\n\nlrm\n\n\n\n\n\n\n\n\n\norm\n\n\n\n\n\n\n\n\nrobust\nlmRob\n\n\nU\nU\nU\nU\n\n\nrobustbase\nglmrob\n\n\n✓\n✓\nU\nU\n\n\n\nlmrob\n\n\n✓\n✓\nU\nU\n\n\nrobustlmm\nrlmer\n\n\nU\nU\n\n\n\n\nrstanarm\nstan_glm\n\n\n✖\n
U\n✓\n✓\n\n\nsampleSelection\nselection\n\n\nU\nU\nU\nU\n\n\n\nheckit\n\n\nU\nU\nU\nU\n\n\nscam\nscam\n\n\nU\nU\nU\nU\n\n\nspeedglm\nspeedglm\n✓\n✓\n✓\n✓\nU\nU\n\n\n\nspeedlm\n✓\n✓\n✓\n✓\nU\nU\n\n\nsurvey\nsvyglm\n\n\n✓\n✓\n✓\n✓\n\n\n\nsvyolr\n\n\n\n\n\n\n\n\nsurvival\nclogit\n\n\n\n\n\n\n\n\n\ncoxph\n✓\n✓\nU\nU\n✓\n✓\n\n\n\nsurvreg\n\n\n\n\n\n\n\n\ntobit1\ntobit1\n\n\n✓\n✓\nU\nU\n\n\ntruncreg\ntruncreg\n✓\n✓\n✓\n✓\nU\nU"
},
{
"objectID": "articles/tables.html#marginal-effects",
@@ -991,7 +998,7 @@
"href": "articles/tables.html#contrasts",
"title": "\n31 Tables\n",
"section": "\n31.2 Contrasts",
- "text": "31.2 Contrasts\nWhen using the comparisons() function (or the slopes() function with categorical variables), the output will include two columns to uniquely identify the quantities of interest: term and contrast.\n\ndat <- mtcars\ndat$gear <- as.factor(dat$gear)\nmod <- glm(vs ~ gear + mpg, data = dat, family = binomial)\n\ncmp <- comparisons(mod)\nget_estimates(cmp)\n#> # A tibble: 3 × 8\n#> term contrast estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 gear mean(4) - mean(3) 0.0372 0.137 0.272 0.785 -0.230 0.305 \n#> 2 gear mean(5) - mean(3) -0.340 0.0988 -3.44 0.000588 -0.533 -0.146 \n#> 3 mpg mean(+1) 0.0608 0.0128 4.74 0.00000219 0.0356 0.0860\n\nWe can use the shape argument of the modelsummary function to structure the table properly:\n\nmodelsummary(cmp, shape = term + contrast ~ model)\n\n\n\n\n\n (1)\n\n\n\ngear\nmean(4) - mean(3)\n0.037\n\n\n\nmean(4) - mean(3)\n(0.137)\n\n\n\nmean(5) - mean(3)\n−0.340\n\n\n\nmean(5) - mean(3)\n(0.099)\n\n\nmpg\nmean(+1)\n0.061\n\n\n\nmean(+1)\n(0.013)\n\n\nNum.Obs.\n\n32\n\n\nAIC\n\n26.2\n\n\nBIC\n\n32.1\n\n\nLog.Lik.\n\n−9.101\n\n\nF\n\n2.389\n\n\nRMSE\n\n0.31\n\n\n\n\n\nCross-contrasts can be a bit trickier, since there are multiple simultaneous groups. Consider this example:\n\nmod <- lm(mpg ~ factor(cyl) + factor(gear), data = mtcars)\ncmp <- comparisons(\n mod,\n variables = c(\"gear\", \"cyl\"),\n cross = TRUE)\nget_estimates(cmp)\n#> # A tibble: 4 × 9\n#> term contrast_cyl contrast_gear estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 cross mean(6) - mean(4) mean(4) - mean(3) -5.33 2.77 -1.93 0.0542 -10.8 0.0953 \n#> 2 cross mean(6) - mean(4) mean(5) - mean(3) -5.16 2.63 -1.96 0.0500 -10.3 0.000166\n#> 3 cross mean(8) - mean(4) mean(4) - mean(3) -9.22 3.62 -2.55 0.0108 -16.3 -2.13 \n#> 4 cross mean(8) - mean(4) mean(5) - mean(3) -9.04 3.19 -2.84 0.00453 -15.3 -2.80\n\nAs we can see above, there are two relevant grouping columns: contrast_gear and contrast_cyl. We can simply plug those names in the shape argument:\n\nmodelsummary(\n cmp,\n shape = contrast_gear + contrast_cyl ~ model)\n\n\n\ngear\ncyl\n (1)\n\n\n\nmean(4) - mean(3)\nmean(6) - mean(4)\n−5.332\n\n\nmean(4) - mean(3)\nmean(6) - mean(4)\n(2.769)\n\n\nmean(4) - mean(3)\nmean(8) - mean(4)\n−9.218\n\n\nmean(4) - mean(3)\nmean(8) - mean(4)\n(3.618)\n\n\nmean(5) - mean(3)\nmean(6) - mean(4)\n−5.156\n\n\nmean(5) - mean(3)\nmean(6) - mean(4)\n(2.631)\n\n\nmean(5) - mean(3)\nmean(8) - mean(4)\n−9.042\n\n\nmean(5) - mean(3)\nmean(8) - mean(4)\n(3.185)\n\n\nNum.Obs.\n\n32\n\n\nR2\n\n0.740\n\n\nR2 Adj.\n\n0.701\n\n\nAIC\n\n173.7\n\n\nBIC\n\n182.5\n\n\nLog.Lik.\n\n−80.838\n\n\nF\n\n19.190\n\n\nRMSE\n\n3.03"
+ "text": "31.2 Contrasts\nWhen using the comparisons() function (or the slopes() function with categorical variables), the output will include two columns to uniquely identify the quantities of interest: term and contrast.\n\ndat <- mtcars\ndat$gear <- as.factor(dat$gear)\nmod <- glm(vs ~ gear + mpg, data = dat, family = binomial)\n\ncmp <- comparisons(mod)\nget_estimates(cmp)\n#> # A tibble: 3 × 8\n#> term contrast estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 gear mean(4) - mean(3) 0.0372 0.137 0.272 0.785 -0.230 0.305 \n#> 2 gear mean(5) - mean(3) -0.340 0.0988 -3.44 0.000588 -0.533 -0.146 \n#> 3 mpg mean(+1) 0.0608 0.0128 4.74 0.00000219 0.0356 0.0860\n\nWe can use the shape argument of the modelsummary function to structure the table properly:\n\nmodelsummary(cmp, shape = term + contrast ~ model)\n\n\n\n\n\n (1)\n\n\n\ngear\nmean(4) - mean(3)\n0.037\n\n\n\nmean(4) - mean(3)\n(0.137)\n\n\n\nmean(5) - mean(3)\n−0.340\n\n\n\nmean(5) - mean(3)\n(0.099)\n\n\nmpg\nmean(+1)\n0.061\n\n\n\nmean(+1)\n(0.013)\n\n\nNum.Obs.\n\n32\n\n\nAIC\n\n26.2\n\n\nBIC\n\n32.1\n\n\nLog.Lik.\n\n−9.101\n\n\nF\n\n2.389\n\n\nRMSE\n\n0.31\n\n\n\n\n\nCross-contrasts can be a bit trickier, since there are multiple simultaneous groups. Consider this example:\n\nmod <- lm(mpg ~ factor(cyl) + factor(gear), data = mtcars)\ncmp <- comparisons(\n mod,\n variables = c(\"gear\", \"cyl\"),\n cross = TRUE)\nget_estimates(cmp)\n#> # A tibble: 4 × 9\n#> term contrast_cyl contrast_gear estimate std.error statistic p.value conf.low conf.high\n#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>\n#> 1 cross mean(6) - mean(4) mean(4) - mean(3) -5.33 2.77 -1.93 0.0542 -10.8 0.0953 \n#> 2 cross mean(6) - mean(4) mean(5) - mean(3) -5.16 2.63 -1.96 0.0500 -10.3 0.000165\n#> 3 cross mean(8) - mean(4) mean(4) - mean(3) -9.22 3.62 -2.55 0.0108 -16.3 -2.13 \n#> 4 cross mean(8) - mean(4) mean(5) - mean(3) -9.04 3.19 -2.84 0.00453 -15.3 -2.80\n\nAs we can see above, there are two relevant grouping columns: contrast_gear and contrast_cyl. We can simply plug those names in the shape argument:\n\nmodelsummary(\n cmp,\n shape = contrast_gear + contrast_cyl ~ model)\n\n\n\ngear\ncyl\n (1)\n\n\n\nmean(4) - mean(3)\nmean(6) - mean(4)\n−5.332\n\n\nmean(4) - mean(3)\nmean(6) - mean(4)\n(2.769)\n\n\nmean(4) - mean(3)\nmean(8) - mean(4)\n−9.218\n\n\nmean(4) - mean(3)\nmean(8) - mean(4)\n(3.618)\n\n\nmean(5) - mean(3)\nmean(6) - mean(4)\n−5.156\n\n\nmean(5) - mean(3)\nmean(6) - mean(4)\n(2.631)\n\n\nmean(5) - mean(3)\nmean(8) - mean(4)\n−9.042\n\n\nmean(5) - mean(3)\nmean(8) - mean(4)\n(3.185)\n\n\nNum.Obs.\n\n32\n\n\nR2\n\n0.740\n\n\nR2 Adj.\n\n0.701\n\n\nAIC\n\n173.7\n\n\nBIC\n\n182.5\n\n\nLog.Lik.\n\n−80.838\n\n\nF\n\n19.190\n\n\nRMSE\n\n3.03"
},
{
"objectID": "articles/tables.html#marginal-means",
@@ -1271,7 +1278,7 @@
"href": "articles/reference/predictions.html#prediction-types",
"title": "predictions",
"section": "Prediction types",
- "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
+ "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, numeric, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
},
{
"objectID": "articles/reference/predictions.html#references",
@@ -1362,7 +1369,7 @@
"href": "articles/reference/comparisons.html#prediction-types",
"title": "comparisons",
"section": "Prediction types",
- "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
+ "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, numeric, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
},
{
"objectID": "articles/reference/comparisons.html#references",
@@ -1376,7 +1383,7 @@
"href": "articles/reference/comparisons.html#examples",
"title": "comparisons",
"section": "Examples",
- "text": "Examples\n\nlibrary(marginaleffects)\n\nlibrary(marginaleffects)\n\n# Linear model\ntmp <- mtcars\ntmp$am <- as.logical(tmp$am)\nmod <- lm(mpg ~ am + factor(cyl), tmp)\navg_comparisons(mod, variables = list(cyl = \"reference\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 <0.001 14.0 -9.17 -3.15\n cyl 8 - 4 -10.07 1.45 -6.93 <0.001 37.8 -12.91 -7.22\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(cyl = \"sequential\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 < 0.001 14.0 -9.17 -3.15\n cyl 8 - 6 -3.91 1.47 -2.66 0.00781 7.0 -6.79 -1.03\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(cyl = \"pairwise\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 < 0.001 14.0 -9.17 -3.15\n cyl 8 - 4 -10.07 1.45 -6.93 < 0.001 37.8 -12.91 -7.22\n cyl 8 - 6 -3.91 1.47 -2.66 0.00781 7.0 -6.79 -1.03\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# GLM with different scale types\nmod <- glm(am ~ factor(gear), data = mtcars)\navg_comparisons(mod, type = \"response\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, type = \"link\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: link \n\n# Contrasts at the mean\ncomparisons(mod, newdata = \"mean\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % gear\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897 3\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307 3\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, gear \nType: response \n\n# Contrasts between marginal means\ncomparisons(mod, newdata = \"marginalmeans\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Contrasts at user-specified values\ncomparisons(mod, newdata = datagrid(am = 0, gear = tmp$gear))\n\n\n Term Contrast gear Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 4 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 4 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, gear, predicted_lo, predicted_hi, predicted \nType: response \n\ncomparisons(mod, newdata = datagrid(am = unique, gear = max))\n\n\n Term Contrast gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, gear, predicted_lo, predicted_hi, predicted \nType: response \n\nm <- lm(mpg ~ hp + drat + factor(cyl) + factor(am), data = mtcars)\ncomparisons(m, variables = \"hp\", newdata = datagrid(FUN_factor = unique, FUN_numeric = median))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp drat\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n cyl am\n 6 1\n 6 0\n 4 1\n 4 0\n 8 1\n 8 0\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, drat, cyl, am \nType: response \n\n# Numeric contrasts\nmod <- lm(mpg ~ hp, data = mtcars)\navg_comparisons(mod, variables = list(hp = 1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp +1 -0.0682 0.0101 -6.74 <0.001 35.9 -0.0881 -0.0484\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = 5))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp +5 -0.341 0.0506 -6.74 <0.001 35.9 -0.44 -0.242\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = c(90, 100)))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 100 - 90 -0.682 0.101 -6.74 <0.001 35.9 -0.881 -0.484\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"iqr\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp Q3 - Q1 -5.7 0.845 -6.74 <0.001 35.9 -7.35 -4.04\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"sd\"))\n\n\n Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 %\n hp (x + sd/2) - (x - sd/2) -4.68 0.694 -6.74 <0.001 35.9 -6.04\n 97.5 %\n -3.32\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"minmax\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp Max - Min -19.3 2.86 -6.74 <0.001 35.9 -24.9 -13.7\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# using a function to specify a custom difference in one regressor\ndat <- mtcars\ndat$new_hp <- 49 * (dat$hp - min(dat$hp)) / (max(dat$hp) - min(dat$hp)) + 1\nmodlog <- lm(mpg ~ log(new_hp) + factor(cyl), data = dat)\nfdiff <- \\(x) data.frame(x, x + 10)\navg_comparisons(modlog, variables = list(new_hp = fdiff))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n new_hp custom -1.97 0.711 -2.78 0.00547 7.5 -3.37 -0.581\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Adjusted Risk Ratio: see the contrasts vignette\nmod <- glm(vs ~ mpg, data = mtcars, family = binomial)\navg_comparisons(mod, comparison = \"lnratioavg\", transform = exp)\n\n\n Term Contrast Estimate Pr(>|z|) S 2.5 % 97.5 %\n mpg mean(+1) 1.14 <0.001 31.9 1.09 1.18\n\nColumns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Adjusted Risk Ratio: Manual specification of the `comparison`\navg_comparisons(\n mod,\n comparison = function(hi, lo) log(mean(hi) / mean(lo)),\n transform = exp)\n\n\n Term Contrast Estimate Pr(>|z|) S 2.5 % 97.5 %\n mpg +1 1.14 <0.001 31.9 1.09 1.18\n\nColumns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# cross contrasts\nmod <- lm(mpg ~ factor(cyl) * factor(gear) + hp, data = mtcars)\navg_comparisons(mod, variables = c(\"cyl\", \"gear\"), cross = TRUE)\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % C: cyl C: gear\n -0.631 3.40 -0.185 0.853 0.2 -7.30 6.04 6 - 4 4 - 3\n 2.678 4.62 0.580 0.562 0.8 -6.37 11.73 6 - 4 5 - 3\n 3.348 6.43 0.521 0.602 0.7 -9.25 15.95 8 - 4 4 - 3\n 5.525 5.87 0.942 0.346 1.5 -5.98 17.03 8 - 4 5 - 3\n\nColumns: term, contrast_cyl, contrast_gear, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# variable-specific contrasts\navg_comparisons(mod, variables = list(gear = \"sequential\", hp = 10))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 3.409 2.587 1.318 0.1876 2.4 -1.66 8.481\n gear 5 - 4 2.628 2.747 0.957 0.3387 1.6 -2.76 8.011\n hp +10 -0.574 0.225 -2.552 0.0107 6.5 -1.02 -0.133\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect\nmod <- lm(mpg ~ wt + drat, data = mtcars)\n\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = \"wt = drat\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt=drat -6.23 1.05 -5.92 <0.001 28.2 -8.29 -4.16\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using row indices\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = \"b1 - b2 = 0\")\n\n\n Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n b1-b2=0 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using numeric vector of weights\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = c(1, -1))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# two custom contrasts using a matrix of weights\nlc <- matrix(c(\n 1, -1,\n 2, 3),\n ncol = 2)\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = lc)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n custom -11.46 4.92 -2.33 0.0197 5.7 -21.10 -1.83\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Effect of a 1 group-wise standard deviation change\n# First we calculate the SD in each group of `cyl`\n# Second, we use that SD as the treatment size in the `variables` argument\nlibrary(dplyr)\nmod <- lm(mpg ~ hp + factor(cyl), mtcars)\ntmp <- mtcars %>%\n group_by(cyl) %>%\n mutate(hp_sd = sd(hp))\navg_comparisons(mod, variables = list(hp = tmp$hp_sd), by = \"cyl\")\n\n\n Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp custom 4 -0.678 0.435 -1.56 0.119 3.1 -1.53 0.174\n hp custom 6 -1.122 0.719 -1.56 0.119 3.1 -2.53 0.288\n hp custom 8 -0.818 0.525 -1.56 0.119 3.1 -1.85 0.210\n\nColumns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# `by` argument\nmod <- lm(mpg ~ hp * am * vs, data = mtcars)\ncomparisons(mod, by = TRUE)\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 4.6980 1.0601 4.432 <0.001 16.7 2.620 6.7758\n hp +1 -0.0688 0.0182 -3.780 <0.001 12.6 -0.104 -0.0331\n vs 1 - 0 -0.2943 2.3379 -0.126 0.9 0.2 -4.877 4.2879\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\nmod <- lm(mpg ~ hp * am * vs, data = mtcars)\navg_comparisons(mod, variables = \"hp\", by = c(\"vs\", \"am\"))\n\n\n Term Contrast vs am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(+1) 0 0 -0.0422 0.0248 -1.70 0.08879 3.5 -0.0907 0.00639\n hp mean(+1) 0 1 -0.0368 0.0124 -2.97 0.00297 8.4 -0.0612 -0.01254\n hp mean(+1) 1 0 -0.0994 0.0534 -1.86 0.06289 4.0 -0.2042 0.00534\n hp mean(+1) 1 1 -0.1112 0.0463 -2.40 0.01645 5.9 -0.2020 -0.02033\n\nColumns: term, contrast, vs, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\nlibrary(nnet)\nmod <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)\nby <- data.frame(\n group = c(\"3\", \"4\", \"5\"),\n by = c(\"3,4\", \"3,4\", \"5\"))\ncomparisons(mod, type = \"probs\", by = by)\n\n\n Term By Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 3,4 -0.222793 0.07959 -2.80 0.00512 7.6 -0.3788 -0.0668\n mpg 3,4 0.000463 0.00579 0.08 0.93624 0.1 -0.0109 0.0118\n vs 3,4 0.102102 0.07335 1.39 0.16394 2.6 -0.0417 0.2459\n am 5 0.445585 0.15918 2.80 0.00512 7.6 0.1336 0.7576\n mpg 5 -0.000927 0.01159 -0.08 0.93624 0.1 -0.0236 0.0218\n vs 5 -0.204204 0.14671 -1.39 0.16394 2.6 -0.4917 0.0833\n\nColumns: term, by, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: probs"
+ "text": "Examples\n\nlibrary(marginaleffects)\n\nlibrary(marginaleffects)\n\n# Linear model\ntmp <- mtcars\ntmp$am <- as.logical(tmp$am)\nmod <- lm(mpg ~ am + factor(cyl), tmp)\navg_comparisons(mod, variables = list(cyl = \"reference\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 <0.001 14.0 -9.17 -3.15\n cyl 8 - 4 -10.07 1.45 -6.93 <0.001 37.8 -12.91 -7.22\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(cyl = \"sequential\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 < 0.001 14.0 -9.17 -3.15\n cyl 8 - 6 -3.91 1.47 -2.66 0.00781 7.0 -6.79 -1.03\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(cyl = \"pairwise\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n cyl 6 - 4 -6.16 1.54 -4.01 < 0.001 14.0 -9.17 -3.15\n cyl 8 - 4 -10.07 1.45 -6.93 < 0.001 37.8 -12.91 -7.22\n cyl 8 - 6 -3.91 1.47 -2.66 0.00781 7.0 -6.79 -1.03\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# GLM with different scale types\nmod <- glm(am ~ factor(gear), data = mtcars)\navg_comparisons(mod, type = \"response\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, type = \"link\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: link \n\n# Contrasts at the mean\ncomparisons(mod, newdata = \"mean\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % gear\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897 3\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307 3\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, gear \nType: response \n\n# Contrasts between marginal means\ncomparisons(mod, newdata = \"marginalmeans\")\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Contrasts at user-specified values\ncomparisons(mod, newdata = datagrid(am = 0, gear = tmp$gear))\n\n\n Term Contrast gear Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 3 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 4 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 3 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 4 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, gear, predicted_lo, predicted_hi, predicted \nType: response \n\ncomparisons(mod, newdata = datagrid(am = unique, gear = max))\n\n\n Term Contrast gear Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 4 - 3 5 0.667 0.117 5.68 <0.001 26.1 0.436 0.897\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n gear 5 - 3 5 1.000 0.157 6.39 <0.001 32.5 0.693 1.307\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, gear, predicted_lo, predicted_hi, predicted \nType: response \n\nm <- lm(mpg ~ hp + drat + factor(cyl) + factor(am), data = mtcars)\ncomparisons(m, variables = \"hp\", newdata = datagrid(FUN_factor = unique, FUN_numeric = median))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % hp drat\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n hp +1 -0.0452 0.0149 -3.04 0.00239 8.7 -0.0744 -0.016 123 3.7\n cyl am\n 6 1\n 6 0\n 4 1\n 4 0\n 8 1\n 8 0\n\nColumns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, mpg, hp, drat, cyl, am \nType: response \n\n# Numeric contrasts\nmod <- lm(mpg ~ hp, data = mtcars)\navg_comparisons(mod, variables = list(hp = 1))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp +1 -0.0682 0.0101 -6.74 <0.001 35.9 -0.0881 -0.0484\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = 5))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp +5 -0.341 0.0506 -6.74 <0.001 35.9 -0.44 -0.242\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = c(90, 100)))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 100 - 90 -0.682 0.101 -6.74 <0.001 35.9 -0.881 -0.484\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"iqr\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp Q3 - Q1 -5.7 0.845 -6.74 <0.001 35.9 -7.35 -4.04\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"sd\"))\n\n\n Term Contrast Estimate Std. 
Error z Pr(>|z|) S 2.5 %\n hp (x + sd/2) - (x - sd/2) -4.68 0.694 -6.74 <0.001 35.9 -6.04\n 97.5 %\n -3.32\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\navg_comparisons(mod, variables = list(hp = \"minmax\"))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp Max - Min -19.3 2.86 -6.74 <0.001 35.9 -24.9 -13.7\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# using a function to specify a custom difference in one regressor\ndat <- mtcars\ndat$new_hp <- 49 * (dat$hp - min(dat$hp)) / (max(dat$hp) - min(dat$hp)) + 1\nmodlog <- lm(mpg ~ log(new_hp) + factor(cyl), data = dat)\nfdiff <- \\(x) data.frame(x, x + 10)\navg_comparisons(modlog, variables = list(new_hp = fdiff))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n new_hp custom -1.97 0.711 -2.78 0.00547 7.5 -3.37 -0.581\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Adjusted Risk Ratio: see the contrasts vignette\nmod <- glm(vs ~ mpg, data = mtcars, family = binomial)\navg_comparisons(mod, comparison = \"lnratioavg\", transform = exp)\n\n\n Term Contrast Estimate Pr(>|z|) S 2.5 % 97.5 %\n mpg mean(+1) 1.14 <0.001 31.9 1.09 1.18\n\nColumns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Adjusted Risk Ratio: Manual specification of the `comparison`\navg_comparisons(\n mod,\n comparison = function(hi, lo) log(mean(hi) / mean(lo)),\n transform = exp)\n\n\n Term Contrast Estimate Pr(>|z|) S 2.5 % 97.5 %\n mpg +1 1.14 <0.001 31.9 1.09 1.18\n\nColumns: term, contrast, estimate, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# cross contrasts\nmod <- lm(mpg ~ factor(cyl) * factor(gear) + hp, data = mtcars)\navg_comparisons(mod, variables = c(\"cyl\", \"gear\"), cross = TRUE)\n\n\n Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % C: cyl C: gear\n -0.631 3.40 -0.185 0.853 0.2 -7.30 6.04 6 - 4 4 - 3\n 2.678 4.62 0.580 0.562 0.8 -6.37 11.73 6 - 4 5 - 3\n 3.348 6.43 0.521 0.602 0.7 -9.25 15.95 8 - 4 4 - 3\n 5.525 5.87 0.942 0.346 1.5 -5.98 17.03 8 - 4 5 - 3\n\nColumns: term, contrast_cyl, contrast_gear, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# variable-specific contrasts\navg_comparisons(mod, variables = list(gear = \"sequential\", hp = 10))\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n gear 4 - 3 3.409 2.587 1.318 0.1876 2.4 -1.66 8.481\n gear 5 - 4 2.628 2.747 0.957 0.3387 1.6 -2.76 8.011\n hp +10 -0.574 0.225 -2.552 0.0107 6.5 -1.02 -0.133\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect\nmod <- lm(mpg ~ wt + drat, data = mtcars)\n\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = \"wt = drat\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt=drat -6.23 1.05 -5.92 <0.001 28.2 -8.29 -4.16\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using row indices\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = \"b1 - b2 = 0\")\n\n\n Term Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n b1-b2=0 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using numeric vector of weights\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = c(1, -1))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# two custom contrasts using a matrix of weights\nlc <- matrix(c(\n 1, -1,\n 2, 3),\n ncol = 2)\ncomparisons(\n mod,\n newdata = \"mean\",\n hypothesis = lc)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n custom -11.46 4.92 -2.33 0.0197 5.7 -21.10 -1.83\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Effect of a 1 group-wise standard deviation change\n# First we calculate the SD in each group of `cyl`\n# Second, we use that SD as the treatment size in the `variables` argument\nlibrary(dplyr)\nmod <- lm(mpg ~ hp + factor(cyl), mtcars)\ntmp <- mtcars %>%\n group_by(cyl) %>%\n mutate(hp_sd = sd(hp))\navg_comparisons(mod, variables = list(hp = tmp$hp_sd), by = \"cyl\")\n\n\n Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp custom 4 -0.678 0.435 -1.56 0.119 3.1 -1.53 0.174\n hp custom 6 -1.122 0.719 -1.56 0.119 3.1 -2.53 0.288\n hp custom 8 -0.818 0.525 -1.56 0.119 3.1 -1.85 0.210\n\nColumns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# `by` argument\nmod <- lm(mpg ~ hp * am * vs, data = mtcars)\ncomparisons(mod, by = TRUE)\n\n\n Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 1 - 0 4.6980 1.0601 4.432 <0.001 16.7 2.620 6.7758\n hp +1 -0.0688 0.0182 -3.780 <0.001 12.6 -0.104 -0.0331\n vs 1 - 0 -0.2943 2.3379 -0.126 0.9 0.2 -4.877 4.2879\n\nColumns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\nmod <- lm(mpg ~ hp * am * vs, data = mtcars)\navg_comparisons(mod, variables = \"hp\", by = c(\"vs\", \"am\"))\n\n\n Term Contrast vs am Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(+1) 0 0 -0.0422 0.0248 -1.70 0.08879 3.5 -0.0907 0.00639\n hp mean(+1) 0 1 -0.0368 0.0124 -2.97 0.00297 8.4 -0.0612 -0.01254\n hp mean(+1) 1 0 -0.0994 0.0534 -1.86 0.06289 4.0 -0.2042 0.00534\n hp mean(+1) 1 1 -0.1112 0.0463 -2.40 0.01645 5.9 -0.2020 -0.02034\n\nColumns: term, contrast, vs, am, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\nlibrary(nnet)\nmod <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)\nby <- data.frame(\n group = c(\"3\", \"4\", \"5\"),\n by = c(\"3,4\", \"3,4\", \"5\"))\ncomparisons(mod, type = \"probs\", by = by)\n\n\n Term By Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n am 3,4 -0.222793 0.07959 -2.80 0.00512 7.6 -0.3788 -0.0668\n mpg 3,4 0.000463 0.00579 0.08 0.93624 0.1 -0.0109 0.0118\n vs 3,4 0.102102 0.07335 1.39 0.16394 2.6 -0.0417 0.2459\n am 5 0.445585 0.15918 2.80 0.00512 7.6 0.1336 0.7576\n mpg 5 -0.000927 0.01159 -0.08 0.93624 0.1 -0.0236 0.0218\n vs 5 -0.204204 0.14671 -1.39 0.16394 2.6 -0.4917 0.0833\n\nColumns: term, by, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: probs"
},
{
"objectID": "articles/reference/slopes.html#description",
@@ -1453,7 +1460,7 @@
"href": "articles/reference/slopes.html#prediction-types",
"title": "slopes",
"section": "Prediction types",
- "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
+ "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, numeric, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
},
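To make the invlink(link) behaviour described above concrete, here is a minimal sketch with a hypothetical logit model (not taken from the indexed examples): the interval is built on the link scale and backtransformed through the inverse link, so it cannot escape the [0, 1] bounds of a probability, whereas a delta-method interval computed directly on the response scale can.

library(marginaleffects)
mod <- glm(am ~ hp, data = mtcars, family = binomial)
# interval computed on the link scale, then backtransformed: stays inside [0, 1]
predictions(mod, type = "invlink(link)", newdata = datagrid(hp = 300))
# delta-method interval directly on the response scale: may cross 0 or 1
predictions(mod, type = "response", newdata = datagrid(hp = 300))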
{
"objectID": "articles/reference/slopes.html#references",
@@ -1467,7 +1474,7 @@
"href": "articles/reference/slopes.html#examples",
"title": "slopes",
"section": "Examples",
- "text": "Examples\n\nlibrary(marginaleffects)\n\n\n\n\n# Unit-level (conditional) Marginal Effects\nmod <- glm(am ~ hp * wt, data = mtcars, family = binomial)\nmfx <- slopes(mod)\nhead(mfx)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.006983 0.005846 1.194 0.232 2.1 -0.004475 0.018442\n hp 0.016404 0.012293 1.334 0.182 2.5 -0.007689 0.040497\n hp 0.002828 0.003764 0.751 0.452 1.1 -0.004549 0.010206\n hp 0.001935 0.002442 0.792 0.428 1.2 -0.002851 0.006721\n hp 0.002993 0.003213 0.931 0.352 1.5 -0.003305 0.009291\n hp 0.000148 0.000321 0.459 0.646 0.6 -0.000482 0.000778\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# Average Marginal Effect (AME)\navg_slopes(mod, by = TRUE)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.00265 0.0021 1.26 0.2069 2.3 -0.00147 0.00677\n wt -0.43578 0.1436 -3.04 0.0024 8.7 -0.71716 -0.15440\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Marginal Effect at the Mean (MEM)\nslopes(mod, newdata = datagrid())\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.00853 0.00824 1.03 0.301 1.7 -0.00763 0.0247\n wt -1.74453 1.55637 -1.12 0.262 1.9 -4.79496 1.3059\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# Marginal Effect at User-Specified Values\n# Variables not explicitly included in `datagrid()` are held at their means\nslopes(mod, newdata = datagrid(hp = c(100, 110)))\n\n\n Term hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 100 0.00117 0.00171 0.684 0.494 1.0 -0.00218 0.00451\n hp 110 0.00190 0.00240 0.788 0.430 1.2 -0.00282 0.00661\n wt 100 -0.19468 0.29895 -0.651 0.515 1.0 -0.78060 0.39125\n wt 110 -0.33154 0.42909 -0.773 0.440 1.2 -1.17253 0.50946\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, hp, predicted_lo, predicted_hi, predicted, am, wt \nType: response \n\n# Group-Average Marginal Effects (G-AME)\n# Calculate marginal effects for each observation, and then take the average\n# marginal effect within each subset of observations with different observed\n# values for the `cyl` variable:\nmod2 <- lm(mpg ~ hp * cyl, data = mtcars)\navg_slopes(mod2, variables = \"hp\", by = \"cyl\")\n\n\n Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(dY/dX) 4 -0.0917 0.0353 -2.596 0.00942 6.7 -0.1610 -0.0225\n hp mean(dY/dX) 6 -0.0523 0.0204 -2.561 0.01045 6.6 -0.0923 -0.0123\n hp mean(dY/dX) 8 -0.0128 0.0143 -0.891 0.37280 1.4 -0.0409 0.0153\n\nColumns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Marginal Effects at User-Specified Values (counterfactual)\n# Variables not explicitly included in `datagrid()` are held at their\n# original values, and the whole dataset is duplicated once for each\n# combination of the values in `datagrid()`\nmfx <- slopes(mod,\n newdata = datagrid(hp = c(100, 110),\n grid_type = \"counterfactual\"))\nhead(mfx)\n\n\n Term wt hp Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 2.62 100 0.012035 0.00994 1.211 0.226 2.1 -0.00744 0.03151\n hp 2.62 110 0.006983 0.00585 1.194 0.232 2.1 -0.00448 0.01844\n hp 2.88 100 0.014161 0.01052 1.346 0.178 2.5 -0.00645 0.03477\n hp 2.88 110 0.016404 0.01229 1.334 0.182 2.5 -0.00769 0.04050\n hp 2.32 100 0.001564 0.00220 0.712 0.476 1.1 -0.00274 0.00587\n hp 2.32 110 0.000656 0.00118 0.557 0.577 0.8 -0.00165 0.00296\n\nColumns: rowid, rowidcf, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, wt, hp, predicted_lo, predicted_hi, predicted \nType: response \n\n# Heteroskedasticity robust standard errors\nmfx <- slopes(mod, vcov = sandwich::vcovHC(mod))\nhead(mfx)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.006983 0.009046 0.772 0.440 1.2 -0.010746 0.024712\n hp 0.016404 0.012418 1.321 0.186 2.4 -0.007934 0.040742\n hp 0.002828 0.004876 0.580 0.562 0.8 -0.006729 0.012386\n hp 0.001935 0.002035 0.951 0.342 1.5 -0.002055 0.005924\n hp 0.002993 0.002928 1.022 0.307 1.7 -0.002747 0.008732\n hp 0.000148 0.000235 0.629 0.529 0.9 -0.000312 0.000607\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect\nmod <- lm(mpg ~ wt + drat, data = mtcars)\n\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = \"wt = drat\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt=drat -6.23 1.05 -5.91 <0.001 28.2 -8.29 -4.16\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using row indices\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = \"b1 - b2 = 0\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b1-b2=0 6.23 1.05 5.91 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using numeric vector of weights\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = c(1, -1))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.91 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# two custom contrasts using a matrix of weights\nlc <- matrix(c(\n 1, -1,\n 2, 3),\n ncol = 2)\ncolnames(lc) <- c(\"Contrast A\", \"Contrast B\")\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = lc)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n Contrast A 6.23 1.05 5.91 <0.001 28.2 4.16 8.29\n Contrast B -11.46 4.92 -2.33 0.0198 5.7 -21.10 -1.82\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
+ "text": "Examples\n\nlibrary(marginaleffects)\n\n\n\n\n# Unit-level (conditional) Marginal Effects\nmod <- glm(am ~ hp * wt, data = mtcars, family = binomial)\nmfx <- slopes(mod)\nhead(mfx)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.006983 0.005847 1.194 0.232 2.1 -0.004476 0.018442\n hp 0.016404 0.012295 1.334 0.182 2.5 -0.007693 0.040501\n hp 0.002828 0.003764 0.751 0.452 1.1 -0.004550 0.010206\n hp 0.001935 0.002442 0.792 0.428 1.2 -0.002851 0.006721\n hp 0.002993 0.003213 0.931 0.352 1.5 -0.003305 0.009291\n hp 0.000148 0.000321 0.459 0.646 0.6 -0.000482 0.000778\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# Average Marginal Effect (AME)\navg_slopes(mod, by = TRUE)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.00265 0.0021 1.26 0.2069 2.3 -0.00147 0.00677\n wt -0.43578 0.1435 -3.04 0.0024 8.7 -0.71712 -0.15445\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# Marginal Effect at the Mean (MEM)\nslopes(mod, newdata = datagrid())\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.00853 0.00824 1.03 0.301 1.7 -0.00763 0.0247\n wt -1.74453 1.55594 -1.12 0.262 1.9 -4.79411 1.3051\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# Marginal Effect at User-Specified Values\n# Variables not explicitly included in `datagrid()` are held at their means\nslopes(mod, newdata = datagrid(hp = c(100, 110)))\n\n\n Term hp Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 100 0.00117 0.00171 0.684 0.494 1.0 -0.00218 0.00451\n hp 110 0.00190 0.00240 0.788 0.431 1.2 -0.00282 0.00661\n wt 100 -0.19468 0.29895 -0.651 0.515 1.0 -0.78061 0.39126\n wt 110 -0.33154 0.42907 -0.773 0.440 1.2 -1.17250 0.50942\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, hp, predicted_lo, predicted_hi, predicted, am, wt \nType: response \n\n# Group-Average Marginal Effects (G-AME)\n# Calculate marginal effects for each observation, and then take the average\n# marginal effect within each subset of observations with different observed\n# values for the `cyl` variable:\nmod2 <- lm(mpg ~ hp * cyl, data = mtcars)\navg_slopes(mod2, variables = \"hp\", by = \"cyl\")\n\n\n Term Contrast cyl Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp mean(dY/dX) 4 -0.0917 0.0353 -2.596 0.00943 6.7 -0.1610 -0.0225\n hp mean(dY/dX) 6 -0.0523 0.0204 -2.561 0.01045 6.6 -0.0923 -0.0123\n hp mean(dY/dX) 8 -0.0128 0.0143 -0.891 0.37280 1.4 -0.0409 0.0153\n\nColumns: term, contrast, cyl, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted \nType: response \n\n# Marginal Effects at User-Specified Values (counterfactual)\n# Variables not explicitly included in `datagrid()` are held at their\n# original values, and the whole dataset is duplicated once for each\n# combination of the values in `datagrid()`\nmfx <- slopes(mod,\n newdata = datagrid(hp = c(100, 110),\n grid_type = \"counterfactual\"))\nhead(mfx)\n\n\n Term wt hp Estimate Std. 
Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 2.62 100 0.012035 0.00994 1.211 0.226 2.1 -0.00744 0.03151\n hp 2.62 110 0.006983 0.00585 1.194 0.232 2.1 -0.00448 0.01844\n hp 2.88 100 0.014161 0.01051 1.347 0.178 2.5 -0.00644 0.03476\n hp 2.88 110 0.016404 0.01229 1.334 0.182 2.5 -0.00769 0.04050\n hp 2.32 100 0.001564 0.00220 0.712 0.476 1.1 -0.00274 0.00587\n hp 2.32 110 0.000656 0.00118 0.557 0.577 0.8 -0.00165 0.00296\n\nColumns: rowid, rowidcf, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, am, wt, hp, predicted_lo, predicted_hi, predicted \nType: response \n\n# Heteroskedasticity robust standard errors\nmfx <- slopes(mod, vcov = sandwich::vcovHC(mod))\nhead(mfx)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n hp 0.006983 0.009047 0.772 0.440 1.2 -0.010748 0.024715\n hp 0.016404 0.012419 1.321 0.187 2.4 -0.007936 0.040744\n hp 0.002828 0.004876 0.580 0.562 0.8 -0.006728 0.012385\n hp 0.001935 0.002035 0.951 0.342 1.5 -0.002054 0.005924\n hp 0.002993 0.002928 1.022 0.307 1.7 -0.002747 0.008732\n hp 0.000148 0.000235 0.629 0.529 0.9 -0.000312 0.000607\n\nColumns: rowid, term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, am, hp, wt \nType: response \n\n# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect\nmod <- lm(mpg ~ wt + drat, data = mtcars)\n\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = \"wt = drat\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n wt=drat -6.23 1.05 -5.92 <0.001 28.2 -8.29 -4.16\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using row indices\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = \"b1 - b2 = 0\")\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n b1-b2=0 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# same hypothesis test using numeric vector of weights\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = c(1, -1))\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n custom 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response \n\n# two custom contrasts using a matrix of weights\nlc <- matrix(c(\n 1, -1,\n 2, 3),\n ncol = 2)\ncolnames(lc) <- c(\"Contrast A\", \"Contrast B\")\nslopes(\n mod,\n newdata = \"mean\",\n hypothesis = lc)\n\n\n Term Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %\n Contrast A 6.23 1.05 5.92 <0.001 28.2 4.16 8.29\n Contrast B -11.46 4.92 -2.33 0.0197 5.7 -21.10 -1.83\n\nColumns: term, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high \nType: response"
},
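A relationship implicit in the slopes() examples above is worth making explicit: the average marginal effect reported by avg_slopes() is simply the mean of the unit-level slopes. A minimal sketch, reusing the logit model from those examples:

# sketch: per-term means of the unit-level slopes reproduce avg_slopes()
mod <- glm(am ~ hp * wt, data = mtcars, family = binomial)
mfx <- slopes(mod)
# roughly 0.00265 for hp and -0.436 for wt, matching the AME output above
aggregate(estimate ~ term, data = mfx, FUN = mean)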
{
"objectID": "articles/reference/marginal_means.html#description",
@@ -1537,7 +1544,7 @@
"href": "articles/reference/marginal_means.html#prediction-types",
"title": "marginal_means",
"section": "Prediction types",
- "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
+ "text": "Prediction types\n\nThe type argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects package. Admissible values for type depend on the model object. When users specify an incorrect value for type, marginaleffects will raise an informative error with a list of valid type values for the specific model object. The first entry in the list in that error message is the default type.\n\n\nThe invlink(link) is a special type defined by marginaleffects. It is available for some (but not all) models and functions. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type=“invlink(link)” will not always be equivalent to the average of estimates with type=“response”.\n\n\nSome of the most common type values are:\n\n\nresponse, link, E, Ep, average, class, conditional, count, cum.prob, cumprob, density, disp, expected, expvalue, fitted, invlink(link), latent, linear.predictor, linpred, location, lp, mean, numeric, p, pr, precision, prediction, prob, probability, probs, quantile, risk, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob"
},
{
"objectID": "articles/reference/marginal_means.html#references",