Merge pull request #3 from ihmeuw-msca/reorganize

Restructure distrx
ihmeuw-msca · Jul 19, 2024 · e8e7c3d
2 parents 0467f80 + 76bd5e1
commit e8e7c3d

Showing 5 changed files with 530 additions and 282 deletions.
diff --git a/docs/user_guide/percentage_change.rst b/docs/user_guide/percentage_change.rst
@@ -1,7 +1,51 @@
-Percentage Change
-=============
+=========================
+Bivariate Transformations
+=========================
 
-Currently percentage change is implemented in two ways. You can either provide raw data in the form
-of observations from 2 separate, equally sized samples (as you would have from an experiment) or
-raw counts with the separate, not necessarily equal sample sizes (as you would have from incidence
-counts at two separate times)
+There are currently 2 bivariate transformations implemented in distrx:
+    * percentage change
+    * ratio
+
+These transformations are implemented using the first order delta method. See INSERT CONCEPTS for
+derivation if desired. Note that all functions are in terms of sample statistics (e.g. mean), not
+raw counts, even though some functions do take counts as input.
+
+Example: Percentage Change
+--------------------------
+
+Suppose we have samples in 2 different years measuring the incidence of cancer cases in each year
+in various state counties. The data may look something like the following,
+
+.. csv-table::
+   :header: county, cases_1, sample_1, cases_2, sample_2
+   :widths: 10, 10, 10, 10, 10
+   :align: center
+
+   "King", 252, 400, 258, 250
+   "Snohomish", 12, 300, 90, 500
+   "Pierce", 505, 1000, 219, 1000
+   "Kitsap", 88, 124, 67, 204
+
+and our goal is to find the percentage change in the prevalence of cancer with its appropriate SE.
+
+The first step is to import the required function from the distrx package.
+
+.. code-block:: python
+
+    from distrx import transform_bivariate
+
+The transformation is selected by passing its name as a string to the ``transform`` parameter. For
+percentage change, the call is the following.
+
+.. code-block:: python
+
+    mu_tx, sigma_tx = transform_bivariate(c_x=df["cases_1"],
+                                          n_x=df["sample_1"],
+                                          c_y=df["cases_2"],
+                                          n_y=df["sample_2"],
+                                          transform="percentage_change")
+
+``mu_tx`` and ``sigma_tx`` are the percentage change for each county and the corresponding
+standard errors, respectively. ``sigma_tx`` has already been scaled by the appropriate sample size,
+so we **should not** scale it again by any function of the sample size when constructing a
+confidence interval.
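To sanity-check the percentage change example above, the following is a minimal NumPy sketch of the first-order delta method for the ratio of two independent sample proportions. It does not call distrx and is not claimed to match its implementation: `delta_percentage_change` is a hypothetical helper, the binomial variance `p * (1 - p) / n` and independence of the two samples are assumptions, and the result is returned as a fraction (3.5 means +350%).

```python
import numpy as np


def delta_percentage_change(c_x, n_x, c_y, n_y):
    """Hypothetical helper (not the distrx API): delta-method SE of p_y / p_x - 1."""
    c_x, n_x, c_y, n_y = (np.asarray(a, dtype=float) for a in (c_x, n_x, c_y, n_y))
    p_x, p_y = c_x / n_x, c_y / n_y
    # Assumed binomial sampling variance of each proportion.
    var_x = p_x * (1 - p_x) / n_x
    var_y = p_y * (1 - p_y) / n_y
    # Gradient of g(p_x, p_y) = p_y / p_x - 1 is (-p_y / p_x**2, 1 / p_x);
    # with independent samples the variance is the sum of squared gradient terms.
    var = var_y / p_x**2 + (p_y**2) * var_x / p_x**4
    return p_y / p_x - 1, np.sqrt(var)


# Snohomish and Pierce rows from the table above.
pct, se = delta_percentage_change(c_x=[12, 505], n_x=[300, 1000],
                                  c_y=[90, 219], n_y=[500, 1000])
```

Because the standard error already incorporates the sample sizes through `var_x` and `var_y`, a normal-approximation interval would be `pct ± 1.96 * se` with no further division by a function of `n`.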
diff --git a/docs/user_guide/simple_transformations.rst b/docs/user_guide/simple_transformations.rst
@@ -1,11 +1,53 @@
-Simple Transformations
-=====
+==========================
+Univariate Transformations
+==========================
 
-There are currently 4 simple transformations implemented in distrx:
+There are currently 4 univariate transformations implemented in distrx:
     * log
     * logit
     * exp
     * expit
 
 These transformations are implemented using the first order delta method, which works in these
-cases as all of the transformations listed are continuous and differentiable.
+cases as all of the transformations listed are continuous and differentiable. To briefly summarize,
+the delta method approximates the standard error of the transformed mean by multiplying the original
+standard error by the absolute value of the transformation's first derivative evaluated at the mean.
+
+Example: Log Transform
+----------------------
+
+Suppose that we have some means and standard errors (SEs) of systolic blood pressure (SBP) from
+several different samples. The data may look something like the following,
+
+.. csv-table::
+   :header: mean, se, n
+   :widths: 10, 10, 10
+   :align: center
+
+   122, 10, 106
+   140, 14, 235
+   113, 8, 462
+   124, 15, 226
+   134, 7, 509
+
+and our goal is to obtain the appropriate SEs for the data after applying the log transform.
+
+The first step is to import the required function from the distrx package.
+
+.. code-block:: python
+
+    from distrx import transform_univariate
+
+The transformation is selected by passing its name as a string to the ``transform`` parameter. For
+the log transform, the call is the following.
+
+.. code-block:: python
+
+    mu_tx, sigma_tx = transform_univariate(mu=df["mean"],
+                                           sigma=df["se"],
+                                           n=df["n"],
+                                           transform="log")
+
+``mu_tx`` and ``sigma_tx`` are the transformed means and their corresponding standard errors,
+respectively. ``sigma_tx`` has already been scaled by :math:`\sqrt{n}`, so we **should not** scale
+it again by the square root of the sample size when constructing a confidence interval.
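To sanity-check the log example above, the following is a minimal NumPy sketch of the first-order delta method for the log transform. It does not call distrx and is not claimed to match its implementation: `delta_log` is a hypothetical helper, and it assumes the `se` column is already the standard error of the mean, so `n` is not used.

```python
import numpy as np


def delta_log(mean, se):
    """Hypothetical helper (not the distrx API): delta-method SE of log(mean)."""
    mean = np.asarray(mean, dtype=float)
    se = np.asarray(se, dtype=float)
    if np.any(mean <= 0):
        raise ValueError("log transform requires strictly positive means")
    # d/dm log(m) = 1 / m, so SE[log(m)] is approximately se / m.
    return np.log(mean), se / mean


# Means and SEs from the SBP table above.
mu_log, se_log = delta_log([122, 140, 113, 124, 134], [10, 14, 8, 15, 7])
```

Under this assumption, a normal-approximation interval on the log scale would be `mu_log ± 1.96 * se_log`, which can be mapped back to the original scale with `np.exp`.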
diff --git a/simulations.ipynb b/simulations.ipynb