Merge pull request #4 from ihmeuw-msca/addfunctions

Addfunctions
ihmeuw-msca · Sep 16, 2024 · b406e79
2 parents 4e45ec1 + a444c1f
commit b406e79

Showing 7 changed files with 201 additions and 263 deletions.
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -4,6 +4,7 @@ on:
   push:
     tags:
       - "v[0-9]+.[0-9]+.[0-9]+"
+  workflow_dispatch:
 
 permissions:
   contents: write

diff --git a/docs/user_guide/percentage_change.rst b/docs/user_guide/bivariate.rst
@@ -28,7 +28,18 @@ in various state counties. The data may look something like the following,
 
 and our goal is to find the percentage change in the prevalence of cancer with its appropriate SE.
 
-The first step is to import the required function from the distrx package.
+Since we have counts and distrx expects means and standard errors (SEs), we must first convert the
+data. Count data are common at IHME, so distrx provides a function that returns the sample mean and
+SE given an incidence count and a sample size. We can import it and compute the inputs like so.
+
+.. code-block:: python
+
+    from distrx import process_counts
+    mu_x, sigma_x = process_counts(df["cases_1"], df["sample_1"])
+    mu_y, sigma_y = process_counts(df["cases_2"], df["sample_2"])
+
+
+Then, we can import the required function from the distrx package.
 
 .. code-block:: python
 
@@ -39,13 +50,12 @@ transform you would like to apply to your data. In this case, it is the followin
 
 .. code-block:: python
 
-    mu_tx, sigma_tx = transform_bivariate(c_x=df["cases_1"],
-                                          n_x=df["sample_1"],
-                                          c_y=df["cases_2"],
-                                          n_y=df["sample_2"],
+    mu_tx, sigma_tx = transform_bivariate(mu_x=mu_x,
+                                          sigma_x=sigma_x,
+                                          mu_y=mu_y,
+                                          sigma_y=sigma_y,
                                           transform="percentage_change")
 
 ``mu_tx`` and ``sigma_tx`` are simply the percentage change for each county and their corresponding
-standard errors, respectively. ``sigma_tx`` has already been scaled the appropriate sample size so
-we **should not** scale it additionally with some function of othe sample size to obtain a
-confidence interval.
+standard errors, respectively. If a CI is desired, use ``mu_tx +/- Q * sigma_tx``, where ``Q`` is
+the desired quantile (e.g. roughly 1.96 for a 95% CI under a normal approximation).
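
For readers of this hunk, here is a minimal sketch of the two steps the new text describes: converting counts to a mean and SE, then propagating the SEs through the percentage change with a first-order Taylor expansion (delta method). The binomial SE formula, the assumption that the two samples are independent, the made-up counts, and the helper names `counts_to_mean_se` and `pct_change_delta` are all assumptions for illustration. They are not taken from distrx and may differ from what `process_counts` and `transform_bivariate` actually do.

```python
import math


def counts_to_mean_se(cases, sample_size):
    """Hypothetical helper: sample proportion and its binomial SE."""
    p = cases / sample_size
    se = math.sqrt(p * (1.0 - p) / sample_size)
    return p, se


def pct_change_delta(mu_x, sigma_x, mu_y, sigma_y):
    """Hypothetical helper: delta-method SE for y / x - 1.

    Assumes x and y are independent, so there is no covariance term.
    """
    mu_tx = mu_y / mu_x - 1.0
    # The gradient of y / x - 1 with respect to (x, y) is (-y / x^2, 1 / x).
    var_tx = (mu_y * sigma_x / mu_x**2) ** 2 + (sigma_y / mu_x) ** 2
    return mu_tx, math.sqrt(var_tx)


# Made-up counts for a single county, for illustration only.
mu_x, sigma_x = counts_to_mean_se(cases=50, sample_size=1000)
mu_y, sigma_y = counts_to_mean_se(cases=60, sample_size=1000)

mu_tx, sigma_tx = pct_change_delta(mu_x, sigma_x, mu_y, sigma_y)

Q = 1.96  # approximate standard normal quantile for a 95% CI
lower, upper = mu_tx - Q * sigma_tx, mu_tx + Q * sigma_tx
print(mu_tx, sigma_tx, (lower, upper))
```

Because the SE already accounts for the sample size, the interval uses `sigma_tx` directly with no further scaling.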
diff --git a/docs/user_guide/index.rst b/docs/user_guide/index.rst
@@ -5,8 +5,8 @@ User guide
    :hidden:
    :numbered:
 
-   simple_transformations
-   percentage_change
+   univariate
+   bivariate
 
 .. note::
 

diff --git a/docs/user_guide/simple_transformations.rst b/docs/user_guide/univariate.rst
@@ -16,11 +16,11 @@ order Taylor expansion of the transformation function.
 Example: Log Transform
 ----------------------
 
-Suppose that we have some means and standard errors (SEs) of systolic blood pressure (SBP) from
+Suppose that we have some means and standard deviations (SDs) of systolic blood pressure (SBP) from
 several different samples. The data may look something like the following,
 
 .. csv-table::
-   :header: mean, se, n
+   :header: mean, SD, n
    :widths: 10, 10, 10
    :align: center
 
@@ -30,9 +30,18 @@ several different samples. The data may look something like the following,
    124, 15, 226
    134, 7, 509
 
-and our goal is to obtain the appropriate SEs for the data after applying the log transform.
+and our goal is to obtain the appropriate standard errors (SEs) for the mean after applying the log
+transform.
 
-The first step is to import the required function from the distrx package.
+Since we are interested in the transformed SEs and *not* the transformed SDs, we must provide the
+SEs to distrx. **If you already have SEs and are performing the same task, you should skip this
+step!**
+
+.. code-block:: python
+
+    df["SE"] = df["SD"] / df["n"] ** 0.5
+
+Now, import the appropriate function from distrx.
 
 .. code-block:: python
 
@@ -44,10 +53,9 @@ transform you would like to apply to your data. In this case, it is the followin
 .. code-block:: python
 
     mu_tx, sigma_tx = transform_univariate(mu=df["means"],
-                                           sigma=df["se"],
-                                           n=df["n"],
+                                           sigma=df["SE"],
                                            transform="log")
 
 ``mu_tx`` and ``sigma_tx`` are simply the means with the transformation function applied and their
-corresponding standard errors, respectively. ``sigma_tx`` has already been scaled by :math:`\sqrt{n}`
-so the we **should not** scale it by square root of the sample size to obtain a confidence interval.
+appropriately transformed standard errors, respectively. If a CI for the transformed mean is
+desired, use ``mu_tx +/- Q * sigma_tx``, where ``Q`` is the desired quantile (e.g. 1.96 for 95%).
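
As a cross-check on this hunk, here is a minimal sketch of the SD-to-SE conversion followed by a first-order delta-method log transform, using the row `124, 15, 226` from the table in the diff. The standard error of a mean is the SD divided by the square root of the sample size. The helper names `sd_to_se` and `log_delta` are hypothetical, and the code is not taken from distrx; `transform_univariate` may be implemented differently.

```python
import math


def sd_to_se(sd, n):
    """Hypothetical helper: SE of the mean from a sample SD and sample size."""
    return sd / math.sqrt(n)


def log_delta(mu, se):
    """Hypothetical helper: first-order delta method for log(mu).

    The derivative of log(mu) is 1 / mu, so the SE scales by 1 / mu.
    """
    return math.log(mu), se / mu


# One row of the example table: mean, SD, n.
mean, sd, n = 124.0, 15.0, 226

se = sd_to_se(sd, n)
mu_tx, sigma_tx = log_delta(mean, se)

Q = 1.96  # approximate standard normal quantile for a 95% CI
ci = (mu_tx - Q * sigma_tx, mu_tx + Q * sigma_tx)
print(se, mu_tx, sigma_tx, ci)
```

Dividing the SD by `n` rather than its square root would understate the SE by a factor of `sqrt(n)`, which is about 15 for this row.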
diff --git a/simulations.ipynb b/simulations.ipynb