Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Kish-method to rescale_weights #575

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Conversation

strengejacke
Copy link
Member

No description provided.

@strengejacke
Copy link
Member Author

@strengejacke
Copy link
Member Author

@bwiernik This is the relevant part of the code:

.rescale_weights_kish <- function(nest, probability_weights, data_tmp, data, by, weight_non_na) {
  p_weights <- data_tmp[[probability_weights]]
  # design effect according to Kish
  deff <- mean(p_weights^2) / (mean(p_weights)^2)
  # rescale weights, so their mean is 1
  z_weights <- p_weights * (1 / mean(p_weights))
  # divide weights by design effect
  data$pweights <- NA_real_
  data$pweights[weight_non_na] <- z_weights / deff
  # return result
  data
}

@strengejacke
Copy link
Member Author

library(datawizard)
data(nhanes_sample)
x1 <- rescale_weights(nhanes_sample, "SDMVSTRA", "WTINT2YR")
x2 <- rescale_weights(nhanes_sample, "SDMVSTRA", "WTINT2YR", method = "klish")

# two variants of Carle weights
sum(x1$pweights_a)
#> [1] 2992
sum(x1$pweights_b)
#> [1] 2244.715

# klish weights
sum(x2$pweights)
#> [1] 2162.54

# Rescaled weights
head(cbind(Carle_A = x1$pweights_a, Carle_B = x1$pweights_b, Klish = x2$pweights), 15)
#>         Carle_A   Carle_B     Klish
#>  [1,] 1.5733612 1.2005159 1.3952529
#>  [2,] 0.6231745 0.5246593 0.5661343
#>  [3,] 0.8976966 0.5439111 0.3805718
#>  [4,] 0.7083628 0.5498944 0.5003582
#>  [5,] 0.4217782 0.3119698 0.2108234
#>  [6,] 0.6877550 0.5155503 0.4036216
#>  [7,] 1.8855878 1.4637614 1.3319014
#>  [8,] 1.2947757 1.0900898 1.1762627
#>  [9,] 0.7072244 0.5231011 0.3535021
#> [10,] 0.7600105 0.5937167 0.5703616
#> [11,] 0.5465290 0.4242645 0.3860455
#> [12,] 0.3560001 0.2702126 0.2686613
#> [13,] 1.4453354 1.1713903 1.0993270
#> [14,] 1.2947757 1.0900898 1.1762627
#> [15,] 1.4853544 1.1274198 1.1209469

Created on 2024-12-18 with reprex v2.1.1

@strengejacke
Copy link
Member Author

strengejacke commented Dec 18, 2024

@etiennebacher Regardless of whether we add the Kish method or not, we should consider changing the names of the returned columns pweights into rescaled_weights. That's clearer. Would be a breaking change, though. WDYT?

@etiennebacher
Copy link
Member

No opinions regarding the name change but it's a good time to do it before 1.0 so feel free to go ahead with this.

@strengejacke
Copy link
Member Author

Here are some comparisons:

library(datawizard)
data(nhanes_sample)

# compare different methods, using multilevel-Poisson regression

d <- rescale_weights(nhanes_sample, "SDMVSTRA", "WTINT2YR")
result1 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = pweights_a
)
result2 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = pweights_b
)

d <- rescale_weights(
  nhanes_sample,
  probability_weights = "WTINT2YR",
  method = "kish"
)
result3 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d,
  weights = pweights
)
result4 <- lme4::glmer(
  total ~ factor(RIAGENDR) + log(age) + factor(RIDRETH1) + (1 | SDMVPSU),
  family = poisson(),
  data = d
)
parameters::compare_parameters(
  list(result1, result2, result3, result4),
  exponentiate = TRUE,
  column_names = c("Carle (A)", "Carle (B)", "Kish", "unweighted")
) |> print(table_width = Inf)
#> Number of weighted observations differs from number of unweighted
#>   observations.
#> Parameter    |            Carle (A) |            Carle (B) |                Kish |           unweighted
#> -------------------------------------------------------------------------------------------------------
#> (Intercept)  | 12.20 (10.52, 14.14) | 11.95 (10.27, 13.92) | 11.72 (9.89, 13.87) | 13.62 (12.52, 14.83)
#> RIAGENDR [2] |  0.41 ( 0.40,  0.42) |  0.42 ( 0.40,  0.43) |  0.42 (0.41,  0.43) |  0.35 ( 0.34,  0.36)
#> age [log]    |  1.69 ( 1.63,  1.75) |  1.66 ( 1.60,  1.73) |  1.66 (1.59,  1.73) |  1.49 ( 1.44,  1.54)
#> RIDRETH1 [2] |  0.90 ( 0.84,  0.97) |  0.90 ( 0.83,  0.98) |  0.98 (0.90,  1.07) |  0.95 ( 0.88,  1.02)
#> RIDRETH1 [3] |  1.19 ( 1.14,  1.24) |  1.21 ( 1.16,  1.27) |  1.23 (1.17,  1.29) |  1.22 ( 1.19,  1.26)
#> RIDRETH1 [4] |  2.16 ( 2.07,  2.26) |  2.16 ( 2.06,  2.28) |  2.11 (1.99,  2.23) |  2.32 ( 2.25,  2.40)
#> RIDRETH1 [5] |  1.01 ( 0.95,  1.07) |  1.05 ( 0.97,  1.12) |  1.09 (1.01,  1.18) |  1.05 ( 0.99,  1.11)
#> -------------------------------------------------------------------------------------------------------
#> Observations |                 2617 |                 1965 |                1903 |                 2595

Created on 2024-12-18 with reprex v2.1.1

@strengejacke strengejacke marked this pull request as ready for review December 18, 2024 10:41
Copy link
Member

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass, but we need @bwiernik's review anyway

NEWS.md Outdated Show resolved Hide resolved
R/rescale_weights.R Outdated Show resolved Hide resolved
R/rescale_weights.R Show resolved Hide resolved
R/rescale_weights.R Show resolved Hide resolved
R/rescale_weights.R Outdated Show resolved Hide resolved
# check for existing variable names
if ((method == "carle" && any(c("rescaled_weights_a", "rescaled_weights_b") %in% colnames(data))) ||
(method == "kish" && "rescaled_weights" %in% colnames(data))) {
insight::format_warning("The variable name for the rescaled weights already exists in the data. Returned columns will be renamed into unique names.") # nolint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an error? No particular pref, but it seems like it could be a footgun

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no particular preference either. Since we "decide" on the name of the new column, I thought it would be fair to just warn, and give the rescaled weights column a different name than the default one, if it already exists in the data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants