
Errata and suggestions: Model to Meaning website #1304

Open · DrJerryTAO opened this issue Dec 15, 2024 · 3 comments

DrJerryTAO commented Dec 15, 2024

I have found some errors and areas for improvement on the website https://marginaleffects.com/.

  1. The navigation panel repeats "9 Categorical and ordinal outcomes".
  2. At https://marginaleffects.com/chapters/experiments.html#factorial-experiments.
    "In plant phyisiology, they could be used to ?? how combinations of temperature and humidity affect photosynthesis."
  3. https://marginaleffects.com/chapters/experiments.html#conjoint-experiments
    "To analyze this dataset, we estimate a linear regression model with choice as the outcome, and in which all predictors are interacted:" Should it not use multinomial regression?
  4. https://marginaleffects.com/chapters/experiments.html#marginal-means
    "Compute the predicted (i.e., fitted) values for each row in the original dataset. Marginalize (average) those predictions with respect to the variable of interest." This description seems to create average predictions in the empirical grid, not marginal means.
    "To see if the average probability of selection is higher when a candidate is fluent in English, relative to when they require an interpreter, we use the hypothesis argument." The script hypothesis = "b1 = b3" needs updating due to version change.
  5. https://marginaleffects.com/chapters/experiments.html#average-marginal-component-effects
    "?sec-gcomputation" is not linked and defined.
  6. https://marginaleffects.com/chapters/experiments.html#average-feature-choice-probability
    "AMCE incorporates comparisons and averages over both direct and indirect attribute comparisons." What are direct and indirect comparisons? Better to supply a definition and give an example.
    "These data are structured in what could be called a “long” format...we convert long to wide data using the [reshape()] function from base R." The data are still in the long format. No reshape was needed.
    "Moreover, since the data is in “long” format, with one profile per row, we must also allow each variable to have different coefficients based on profile number: the effect of language on the probability of selection is obviously different for profile=1 and for profile=2. Indeed, when profile=1 the language column records the profile’s own language skills. When profile=2, the same column records the alternative profile’s language skills." This is not true. No profile predictor was used in any model. It should be used if the position or sequence of presenting the profiles matter when respondents tend to select the first or last option.
    mod <- lm(choice ~ language * language.alt + job * job.alt, data = dat)
    I think the proper model is simply a standard logistic regression in which each person-task has one row and the chosen profile is the binary response (see the second sketch after this list). Even in a linear probability model, the second row in each task offers no additional information if language * language.alt + job * job.alt are used as predictors.
    "Since we are not interested comparison pairs where both profiles have the same language skills, we use the subset to supply an appropriate grid." The description does not match the following script. The script should use newdata = subset(language.alt != language) instead.
    "Is the AFCP for “used intepreter vs. fluent” different from the AFCP for “broken vs. fluent”?" The question does not match the following script.
vincentarelbundock (Owner) commented

Thanks so much for doing such a close read, and for reporting these issues. I really appreciate your time!

I'll fix those issues as soon as I find a few minutes.

DrJerryTAO (Author) commented

In 12 Mixed effects regression and post stratification https://marginaleffects.com/chapters/mrp.html

  1. "Compute a weighted average of these predicted probabilities [is calculated] using population weights from the poststratification frame to produce the final MRP estimates."
  2. https://marginaleffects.com/chapters/mrp.html#posterior-summaries "By default, marginaleffects functions report the mean of posterior draws, along with compute the mean of the posterior distribution of draws, along with equal-tailed intervals."
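
For reference, a minimal sketch of the poststratification step that the corrected sentence in item 1 describes; the object and column names (mod, poststrat, population, state) are my assumptions, not the book's:

library(marginaleffects)

# `mod` is the fitted mixed-effects model; `poststrat` is the
# poststratification frame, one row per demographic cell, with a
# `population` column of cell counts (hypothetical names).
predictions(
  mod,
  newdata = poststrat,
  by = "state",       # final MRP estimate per state
  wts = "population"  # weight each cell by its population count
)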

In 18 Alternative Software https://marginaleffects.com/bonus/alternative_software.html#fmeffects

  1. The arguments of fme() have been updated: https://holgstr.github.io/fmeffects/reference/fme.html. The older script on the website throws an error: Error in fmeffects::fme(model = forest, data = bikes, target = "count", : unused arguments (target = "count", step.size = 1). (The demonstration below already uses the updated features argument.)
  2. avg_comparisons() for the lrn() model has some strange behavior. It appears that avg_comparisons() extracts the first observation of predicted_lo, predicted_hi, and predicted from comparisons(), not their averages. These three columns, however, are not mentioned in the help file. Also, for all observations in this particular model, predicted == predicted_lo; in some other model applications, I have seen predicted == predicted_hi. See the demonstration below.
library(marginaleffects)
library(mlr3verse)
library(fmeffects)

data("bikes", package = "fmeffects")
task <- as_task_regr(x = bikes, id = "bikes", target = "count")
forest <- lrn("regr.ranger")$train(task)

avg_comparisons(forest, variables = list(temp = 1), newdata = bikes) |>
  data.frame()
#>   term contrast estimate predicted_lo predicted_hi predicted
#> 1 temp       +1 57.41196     1443.132     1606.049  1443.132

comparisons <- comparisons(
  forest, variables = list(temp = 1), newdata = bikes)
comparisons[1, ] |> data.frame()
#>   rowid term contrast estimate predicted_lo predicted_hi predicted ...
#> 1     1 temp       +1 162.9178     1443.132     1606.049  1443.132 ...

# It appears that avg_comparisons() extracts the first observation for
# predicted_lo, predicted_hi, and predicted, not their averages.
with(comparisons, mean(predicted_hi - predicted_lo))
#> 57.41196  (matches avg_comparisons())
with(comparisons, all(predicted == predicted_lo))
#> TRUE

effects <- fme(
  model = forest, data = bikes,
  features = list("temp" = 1), ep.method = "envelope")
summary(effects)
#> Forward Marginal Effects Object
#> Step type: numerical
#> Features & step lengths: temp, 1
#> Extrapolation point detection: envelope, EPs: 3 of 731 obs. (0 %)
#> Average Marginal Effect (AME): 57.6256
plot(effects)
effects$results
#> Key: <obs.id>
#>      obs.id       fme
#>       <int>     <num>
#>   1:      1 162.91781
#>   2:      2 283.90225
#>   3:      3  49.42845
#>   4:      4  98.74693
#>   5:      5 195.97501
#>  ---
#> 724:    727 374.40931
#> 725:    728 230.96809
#> 726:    729 377.85819
#> 727:    730 565.06474
#> 728:    731  28.26367
effects$ame
#> 57.6256

vincentarelbundock (Owner) commented

Thanks for these!

But I don't understand your last point. You can ignore the predicted columns here; what matters is the estimate. The estimate from avg_comparisons() is 57.4, which matches the $ame from fme() (57.6, up to the randomness of the forest fit). And the first estimate from comparisons() is 162.9, which is the same as the first fme value in $results. So that's consistent.

What am I missing?
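
Concretely, reusing the comparisons object from your demonstration: each row-level estimate equals predicted_hi minus predicted_lo, so their mean reproduces the avg_comparisons() estimate.

# `comparisons` was created in the demonstration above.
with(comparisons, mean(estimate))
#> 57.41196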
