Merge pull request #3 from ngreifer/main

Updates
IQSS · Nov 4, 2024 · f5e1a8c
2 parents f8a4a96 + 6c1ddc6
commit f5e1a8c

Showing 2 changed files with 8 additions and 3 deletions.
diff --git a/_freeze/example/execute-results/html.json b/_freeze/example/execute-results/html.json
@@ -1,8 +1,11 @@
 {
-  "hash": "328e6606120e7ddb69d054eaafbd270c",
+  "hash": "2d3a2efc8eba838f475c579a52817e77",
   "result": {
-    "markdown": "# Example Data {#sec-example}\n\nBelow, we'll demonstrate how to perform matching and weighting in R. We'll use the famous right-heart catheterization (RHC) dataset analyzed in @connorsEffectivenessRightHeart1996a, which examines the effect of RHC on death by 60 days. @connorsEffectivenessRightHeart1996a used 1:1 matching with a caliper to estimate the effect, which corresponds to an ATO (though they provided no justification for this choice of estimand). It turns out this matters quite a bit; the ATT, ATC, and ATE differ from each other and lead to different conclusions about the risk of RHC.\n\nThe choice of estimand depends on the policy implied by the analysis. Are we interested in examining whether RHC is harmful and should be withheld from patients receiving it? If so, we are interested in the ATT of RHC. Are we interested in examining whether RHC would benefit patients not receiving it? If so, we are interested in the ATC of RHC. Are we interested in the average effect of RHC for the whole study population? If so, we are interested in the ATE of RHC.\n\nWe'll assume that if we are making a causal inference about the effect of RHC, we have collected a sufficient set of variables to remove confounding. This may be a long list, but to keep the example short, we'll use a list of 13 covariates thought to be related to receipt of RHC and death at 60 days, all measured prior to receipt of RHC.\n\nLet's take a look at our dataset:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(rhc)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n      aps1           meanbp1           pafi1           crea1         \n Min.   :  3.00   Min.   :  0.00   Min.   : 11.6   Min.   
: 0.09999  \n 1st Qu.: 41.00   1st Qu.: 50.00   1st Qu.:133.3   1st Qu.: 1.00000  \n Median : 54.00   Median : 63.00   Median :202.5   Median : 1.50000  \n Mean   : 54.67   Mean   : 78.52   Mean   :222.3   Mean   : 2.13302  \n 3rd Qu.: 67.00   3rd Qu.:115.00   3rd Qu.:316.6   3rd Qu.: 2.39990  \n Max.   :147.00   Max.   :259.00   Max.   :937.5   Max.   :25.09766  \n     hema1           paco21          surv2md1          resp1         card     \n Min.   : 2.00   Min.   :  1.00   Min.   :0.0000   Min.   :  0.00   No :3804  \n 1st Qu.:26.10   1st Qu.: 31.00   1st Qu.:0.4709   1st Qu.: 14.00   Yes:1931  \n Median :30.00   Median : 37.00   Median :0.6280   Median : 30.00             \n Mean   :31.87   Mean   : 38.75   Mean   :0.5925   Mean   : 28.09             \n 3rd Qu.:36.30   3rd Qu.: 42.00   3rd Qu.:0.7430   3rd Qu.: 38.00             \n Max.   :66.19   Max.   :156.00   Max.   :0.9620   Max.   :100.00             \n      edu             age            race          sex            RHC        \n Min.   : 0.00   Min.   : 18.04   white:4460   Female:2543   Min.   :0.0000  \n 1st Qu.:10.00   1st Qu.: 50.15   black: 920   Male  :3192   1st Qu.:0.0000  \n Median :12.00   Median : 64.05   other: 355                 Median :0.0000  \n Mean   :11.68   Mean   : 61.38                              Mean   :0.3808  \n 3rd Qu.:13.00   3rd Qu.: 73.93                              3rd Qu.:1.0000  \n Max.   :30.00   Max.   :101.85                              Max.   :1.0000  \n     death      \n Min.   :0.000  \n 1st Qu.:0.000  \n Median :1.000  \n Mean   :0.649  \n 3rd Qu.:1.000  \n Max.   
:1.000  \n```\n:::\n:::\n\n\nOur treatment variable is `RHC` (1 for receipt, 0 for non-receipt), our outcome is `death` (1 for died at 60 days, 0 otherwise), and the other variables are covariates thought to remove confounding, which include a mix of continuous and categorical variables.\n\nLet's examine balance on the variables between the treatment groups using `cobalt`, which provides the function `bal.tab()` for creating a balance table containing balance statistics for each variables.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"cobalt\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\n cobalt (Version 4.5.1.9001, Build Date: 2023-08-01)\n```\n:::\n:::\n\n\nWe'll request the standardized mean difference by including `\"m\"` in the `stats` argument and setting `binary = \"std\"` (by default binary variables are not standardized) and we'll request KS statistics by including `\"ks\"` in `stats`. Supplying the treatment and covariates in the first argument using a formula and supplying the data set gives us the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbal.tab(RHC ~ aps1 + meanbp1 + pafi1 + crea1 + hema1 +\n          paco21 + surv2md1 + resp1 + card + edu +\n          age + race + sex, data = rhc,\n        stats = c(\"m\", \"ks\"), binary = \"std\")\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nNote: `s.d.denom` not specified; assuming pooled.\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nBalance Measures\n              Type Diff.Un  KS.Un\naps1       Contin.  0.5014 0.2127\nmeanbp1    Contin. -0.4551 0.2117\npafi1      Contin. -0.4332 0.1816\ncrea1      Contin.  0.2696 0.2011\nhema1      Contin. -0.2693 0.1479\npaco21     Contin. -0.2486 0.1081\nsurv2md1   Contin. -0.1985 0.0957\nresp1      Contin. -0.1655 0.0910\ncard_Yes    Binary  0.2950 0.1395\nedu        Contin.  0.0914 0.0511\nage        Contin. 
-0.0614 0.0703\nrace_white  Binary  0.0152 0.0063\nrace_black  Binary -0.0310 0.0114\nrace_other  Binary  0.0208 0.0050\nsex_Male    Binary  0.0931 0.0462\n\nSample sizes\n    Control Treated\nAll    3551    2184\n```\n:::\n:::\n\n\nWe can see significant imbalances in many of the covariates, with high SMDs (greater than .1) and KS statistics (greater than .1, but there is no accepted threshold for these). We can also see the sample sizes for each treatment group. Note that because they are somewhat close in size (the control group is not even twice the size of the treatment group), this will limit the available matching options available and might affect our ability to achieve balance using methods that require a large pool of controls relative to the treated group.\n\nOther balance statistics can be requested, too, using the `stats` argument. It is straightforward to assess balance on particular transformations of covariates using the `addl` argument, e.g., `addl = ~age:educ` to assess balance on the interaction (i.e., product) of `age` and `educ`. We can also supply `int = TRUE` and `poly = 3`, for example, to assess balance on all pairwise interactions of covariates and all squares and cubes of the continuous covariates. This can make for large tables, but there are ways to keep them short and summarize them. 
For example, we can hide the balance table and request the number of covariates that fail to satisfy balance criteria and the covariates with the worst imbalance using code below:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbal.tab(RHC ~ aps1 + meanbp1 + pafi1 + crea1 + hema1 +\n          paco21 + surv2md1 + resp1 + card + edu +\n          age + race + sex, data = rhc,\n        int = TRUE, poly = 3,\n        stats = c(\"m\", \"ks\"), binary = \"std\",\n        thresholds = c(m = .1, ks = .1),\n        disp.bal.tab = FALSE)\n```\n\n::: {.cell-output .cell-output-stderr}\n```\nNote: `s.d.denom` not specified; assuming pooled.\n```\n:::\n\n::: {.cell-output .cell-output-stdout}\n```\nBalance tally for mean differences\n                   count\nBalanced, <0.1        61\nNot Balanced, >0.1   105\n\nVariable with the greatest mean difference\n        Variable Diff.Un     M.Threshold.Un\n meanbp1 * pafi1 -0.5965 Not Balanced, >0.1\n\nBalance tally for KS statistics\n                   count\nBalanced, <0.1        80\nNot Balanced, >0.1    86\n\nVariable with the greatest KS statistic\n        Variable  KS.Un    KS.Threshold.Un\n meanbp1 * pafi1 0.2562 Not Balanced, >0.1\n\nSample sizes\n    Control Treated\nAll    3551    2184\n```\n:::\n:::\n\n\nWe can see that many covariates and their transformations (interactions, squares, and cubes) are not balanced based on our criteria for SMDs or KS statistics. We'll use matching and weighting in the next sections to attempt to achieve balance on the covariates.\n",
-    "supporting": [],
+    "engine": "knitr",
+    "markdown": "# Example Data {#sec-example}\n\nBelow, we'll demonstrate how to perform matching and weighting in R. We'll use the famous right-heart catheterization (RHC) dataset analyzed in @connorsEffectivenessRightHeart1996a, which examines the effect of RHC on death by 60 days. This dataset can be downloaded [here](https://hbiostat.org/data/) or using `Hmisc::getHdata(\"rhc\")`[^example-1]. @connorsEffectivenessRightHeart1996a used 1:1 matching with a caliper to estimate the effect, which corresponds to an ATO (though they provided no justification for this choice of estimand). It turns out this matters quite a bit; the ATT, ATC, and ATE differ from each other and lead to different conclusions about the risk of RHC.\n\n[^example-1]: The version we use here has slight modifications and can be downloaded [here](https://github.com/IQSS/dss-ps/blob/main/rhc.rds) or brought into R using `rhc <- readRDS(url(\"https://github.com/IQSS/dss-ps/raw/refs/heads/main/rhc.rds\"))`\n\nThe choice of estimand depends on the policy implied by the analysis. Are we interested in examining whether RHC is harmful and should be withheld from patients receiving it? If so, we are interested in the ATT of RHC. Are we interested in examining whether RHC would benefit patients not receiving it? If so, we are interested in the ATC of RHC. Are we interested in the average effect of RHC for the whole study population? If so, we are interested in the ATE of RHC.\n\nWe'll assume that if we are making a causal inference about the effect of RHC, we have collected a sufficient set of variables to remove confounding. 
This may be a long list, but to keep the example short, we'll use a list of 13 covariates thought to be related to receipt of RHC and death at 60 days, all measured prior to receipt of RHC.\n\nLet's take a look at our dataset:\n\n\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nsummary(rhc)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n      aps1           meanbp1           pafi1           crea1         \n Min.   :  3.00   Min.   :  0.00   Min.   : 11.6   Min.   : 0.09999  \n 1st Qu.: 41.00   1st Qu.: 50.00   1st Qu.:133.3   1st Qu.: 1.00000  \n Median : 54.00   Median : 63.00   Median :202.5   Median : 1.50000  \n Mean   : 54.67   Mean   : 78.52   Mean   :222.3   Mean   : 2.13302  \n 3rd Qu.: 67.00   3rd Qu.:115.00   3rd Qu.:316.6   3rd Qu.: 2.39990  \n Max.   :147.00   Max.   :259.00   Max.   :937.5   Max.   :25.09766  \n     hema1           paco21          surv2md1          resp1         card     \n Min.   : 2.00   Min.   :  1.00   Min.   :0.0000   Min.   :  0.00   No :3804  \n 1st Qu.:26.10   1st Qu.: 31.00   1st Qu.:0.4709   1st Qu.: 14.00   Yes:1931  \n Median :30.00   Median : 37.00   Median :0.6280   Median : 30.00             \n Mean   :31.87   Mean   : 38.75   Mean   :0.5925   Mean   : 28.09             \n 3rd Qu.:36.30   3rd Qu.: 42.00   3rd Qu.:0.7430   3rd Qu.: 38.00             \n Max.   :66.19   Max.   :156.00   Max.   :0.9620   Max.   :100.00             \n      edu             age            race          sex            RHC        \n Min.   : 0.00   Min.   : 18.04   white:4460   Female:2543   Min.   :0.0000  \n 1st Qu.:10.00   1st Qu.: 50.15   black: 920   Male  :3192   1st Qu.:0.0000  \n Median :12.00   Median : 64.05   other: 355                 Median :0.0000  \n Mean   :11.68   Mean   : 61.38                              Mean   :0.3808  \n 3rd Qu.:13.00   3rd Qu.: 73.93                              3rd Qu.:1.0000  \n Max.   :30.00   Max.   :101.85                              Max.   :1.0000  \n     death      \n Min.   
:0.000  \n 1st Qu.:0.000  \n Median :1.000  \n Mean   :0.649  \n 3rd Qu.:1.000  \n Max.   :1.000  \n```\n\n\n:::\n:::\n\n\n\n\nOur treatment variable is `RHC` (1 for receipt, 0 for non-receipt), our outcome is `death` (1 for died at 60 days, 0 otherwise), and the other variables are covariates thought to remove confounding, which include a mix of continuous and categorical variables.\n\nLet's examine balance on the variables between the treatment groups using `cobalt`, which provides the function `bal.tab()` for creating a balance table containing balance statistics for each variables.\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"cobalt\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\n cobalt (Version 4.5.5, Build Date: 2024-04-02)\n```\n\n\n:::\n:::\n\n\n\n\nWe'll request the standardized mean difference by including `\"m\"` in the `stats` argument and setting `binary = \"std\"` (by default binary variables are not standardized) and we'll request KS statistics by including `\"ks\"` in `stats`. Supplying the treatment and covariates in the first argument using a formula and supplying the data set gives us the following:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbal.tab(RHC ~ aps1 + meanbp1 + pafi1 + crea1 + hema1 +\n          paco21 + surv2md1 + resp1 + card + edu +\n          age + race + sex, data = rhc,\n        stats = c(\"m\", \"ks\"), binary = \"std\")\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nNote: `s.d.denom` not specified; assuming \"pooled\".\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBalance Measures\n              Type Diff.Un  KS.Un\naps1       Contin.  0.5014 0.2127\nmeanbp1    Contin. -0.4551 0.2117\npafi1      Contin. -0.4332 0.1816\ncrea1      Contin.  0.2696 0.2011\nhema1      Contin. -0.2693 0.1479\npaco21     Contin. -0.2486 0.1081\nsurv2md1   Contin. -0.1985 0.0957\nresp1      Contin. -0.1655 0.0910\ncard_Yes    Binary  0.2950 0.1395\nedu        Contin.  0.0914 0.0511\nage        Contin. 
-0.0614 0.0703\nrace_white  Binary  0.0152 0.0063\nrace_black  Binary -0.0310 0.0114\nrace_other  Binary  0.0208 0.0050\nsex_Male    Binary  0.0931 0.0462\n\nSample sizes\n    Control Treated\nAll    3551    2184\n```\n\n\n:::\n:::\n\n\n\n\nWe can see significant imbalances in many of the covariates, with high SMDs (greater than .1) and KS statistics (greater than .1, but there is no accepted threshold for these). We can also see the sample sizes for each treatment group. Note that because they are somewhat close in size (the control group is not even twice the size of the treatment group), this will limit the available matching options available and might affect our ability to achieve balance using methods that require a large pool of controls relative to the treated group.\n\nOther balance statistics can be requested, too, using the `stats` argument. It is straightforward to assess balance on particular transformations of covariates using the `addl` argument, e.g., `addl = ~age:educ` to assess balance on the interaction (i.e., product) of `age` and `educ`. We can also supply `int = TRUE` and `poly = 3`, for example, to assess balance on all pairwise interactions of covariates and all squares and cubes of the continuous covariates. This can make for large tables, but there are ways to keep them short and summarize them. 
For example, we can hide the balance table and request the number of covariates that fail to satisfy balance criteria and the covariates with the worst imbalance using code below:\n\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbal.tab(RHC ~ aps1 + meanbp1 + pafi1 + crea1 + hema1 +\n          paco21 + surv2md1 + resp1 + card + edu +\n          age + race + sex, data = rhc,\n        int = TRUE, poly = 3,\n        stats = c(\"m\", \"ks\"), binary = \"std\",\n        thresholds = c(m = .1, ks = .1),\n        disp.bal.tab = FALSE)\n```\n\n::: {.cell-output .cell-output-stderr}\n\n```\nNote: `s.d.denom` not specified; assuming \"pooled\".\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBalance tally for mean differences\n                   count\nBalanced, <0.1        61\nNot Balanced, >0.1   105\n\nVariable with the greatest mean difference\n        Variable Diff.Un     M.Threshold.Un\n meanbp1 * pafi1 -0.5965 Not Balanced, >0.1\n\nBalance tally for KS statistics\n                   count\nBalanced, <0.1        80\nNot Balanced, >0.1    86\n\nVariable with the greatest KS statistic\n        Variable  KS.Un    KS.Threshold.Un\n meanbp1 * pafi1 0.2562 Not Balanced, >0.1\n\nSample sizes\n    Control Treated\nAll    3551    2184\n```\n\n\n:::\n:::\n\n\n\n\nWe can see that many covariates and their transformations (interactions, squares, and cubes) are not balanced based on our criteria for SMDs or KS statistics. We'll use matching and weighting in the next sections to attempt to achieve balance on the covariates.\n",
+    "supporting": [
+      "example_files"
+    ],
     "filters": [
       "rmarkdown/pagebreak.lua"
     ],

diff --git a/example.qmd b/example.qmd
@@ -27,6 +27,8 @@ if (!file.exists("rhc.rds")) {
   rhc <- rhc[c(covs, treat, outcome)]
   saveRDS(rhc, "rhc.rds")
 }
+
+rhc <- readRDS("rhc.rds")
 ```
 
 ```{r}