StatisticalRethinking v4

Note

After many years I have decided to step away from my work with Stan and Julia. My plan is to be around until the end of 2024 for support if someone decides to step in and take over further development and maintenance work.

At the end of 2024 I'll archive the different packages and projects included in the GitHub organisations StanJulia, StatisticalRethinkingJulia and RegressionAndOtherStoriesJulia if no one is interested in (and has the time for!) taking on this work.

I have thoroughly enjoyed working on both Julia and Stan and seeing both projects mature during the last 15 or so years. And I will always be grateful to the many folks who have helped me on numerous occasions. Both the Julia and the Stan communities are awesome to work with! Thanks a lot!

Purpose of this package

The StatisticalRethinking.jl package contains functions comparable to the functions in the R package "rethinking" associated with the book Statistical Rethinking by Richard McElreath.

These functions are used in Jupyter and Pluto notebook projects specifically intended for hands-on use while studying the book or taking the course.

Currently there are 3 of these notebook projects:

  1. Max Lapan's rethinking-2ed-julia which uses Turing.jl and Jupyter notebooks.

  2. The SR2TuringPluto.jl project, also Turing.jl based but using Pluto.jl instead of Jupyter. It is based on Max Lapan's work above.

  3. The SR2StanPluto.jl project, which uses Stan as implemented in StanSample.jl and StanQuap.jl. See StanJulia.

A fourth option for studying the Turing.jl versions of the models in the Statistical Rethinking book is TuringModels.jl, which takes the form of a package plus Franklin web pages.

Why a StatisticalRethinking v4?

Over time more options have become available to express the material covered in Statistical Rethinking, e.g. the use of KeyedArrays (provided by AxisKeys.jl) for the representation of mcmc chains.

Other examples are the recently developed ParetoSmooth.jl, which could replace ParetoSmoothedImportanceSampling.jl in the PSIS related examples, and the preliminary work by Max Lapan (@shmuma) on Dagitty.jl, a potential replacement for StructuralCausalModels.jl.

While StatisticalRethinking v3 focused on making StatisticalRethinking.jl independent of the mcmc package, StatisticalRethinking v4 aims to decouple it from a specific graphics package, which enables new choices for graphics, e.g. using Makie.jl and AlgebraOfGraphics.jl.

StatisticalRethinking.jl v4 also fits better with the new setup of Pluto notebooks, which keep track of the package versions they use in the notebooks themselves.

Workflow of StatisticalRethinkingJulia (v4):

  1. Data preparation, typically using CSV.jl, DataFrames.jl and some statistical methods from StatsBase.jl and Statistics.jl. In some cases simulations are used which need Distributions.jl and a few special methods (available in StatisticalRethinking.jl).

  2. Define the mcmc model, e.g. using StanSample.jl or Turing.jl, and obtain draws from the model.

  3. Capture the draws for further processing. In Turing that is usually done using MCMCChains.jl; in StanSample.jl v4 the draws mostly come in the form of a DataFrame, a StanTable or a KeyedArray (provided by AxisKeys.jl).

  4. For further processing, the projects nearly always convert chains to a DataFrame.

  5. Inspect the chains using statistical and visual methods. In many cases this will need one or more statistical packages and one of the graphical options.

Currently visual options are StatsPlots/Plots based, e.g. in MCMCChains.jl and StatisticalRethinkingPlots.jl.

  6. The above 5 steps could all be done by just using StanSample.jl or Turing.jl.

The book Statistical Rethinking has a different objective and studies how models compare, how models can help (or mislead) and why multilevel modeling might help in some cases.

  7. For this, additional packages are available, explained and demonstrated, e.g. StructuralCausalModels.jl, ParetoSmoothedImportanceSampling.jl and quite a few more.
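
As a rough illustration of the "convert to a DataFrame and inspect" steps in the workflow, the sketch below summarises draws held in a DataFrame. It is a minimal sketch, not code from this package: the draws are simulated with `randn` as a stand-in for output from StanSample.jl or Turing.jl, and the parameter names `a` and `sigma` are made up for the example. Only DataFrames.jl and the standard libraries are assumed.

```julia
using DataFrames, Statistics, Random

Random.seed!(1)

# Simulated stand-in for posterior draws (assumption: a real project would
# obtain these from StanSample.jl or Turing.jl and convert them to a DataFrame).
draws = DataFrame(
    a = 1.0 .+ 0.1 .* randn(4000),
    sigma = abs.(0.5 .+ 0.05 .* randn(4000)),
)

# One row per parameter: mean, standard deviation and an 89% percentile interval.
stats = DataFrame(
    parameter = names(draws),
    mean = [mean(col) for col in eachcol(draws)],
    std = [std(col) for col in eachcol(draws)],
    lower = [quantile(col, 0.055) for col in eachcol(draws)],
    upper = [quantile(col, 0.945) for col in eachcol(draws)],
)

println(stats)
```

The 89% interval follows the convention used in the book; any other probability mass can be substituted by changing the two quantile levels.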

Using StatisticalRethinking v4

To work through the Statistical Rethinking book using Julia and Turing or Stan, download one of the projects mentioned above and start Pluto (or Jupyter).

An early, experimental version of StructuralCausalModels.jl is also included as a dependency in the StatisticalRethinking.jl package.

In the meantime I will definitely keep my eyes on Dagitty.jl, Omega.jl and CausalInference.jl. In particular, Dagitty.jl has objectives very similar to those of StructuralCausalModels.jl and over time might replace it in the StatisticalRethinkingJulia ecosystem. For now, StructuralCausalModels does provide ways to convert DAGs to Dagitty and ggm formats.

Similarly, a dependency ParetoSmoothedImportanceSampling.jl is used which provides PSIS and WAIC statistics for model comparison.

Versions

As noted in issue #145, some very old Jupyter notebook files are still present in the repository, which makes an initial download, e.g. when dev-ing the package, rather long. This is not a problem when you just add the package.

I am planning to address that in v5.

Version 4

  • Drop the heavy use of @reexport.
  • Enable a future switch to Makie.jl and AlgebraOfGraphics.jl by moving all graphics to StatisticalRethinkingPlots and StatisticalRethinkingMakie (in the future).
  • Many more improvements by Max Lapan (@shmuma).

Versions 3.2.1 - 3.3.6

  • Improvements by Max Lapan.
  • Added trankplot.jl.
  • Add compare() and plot_models() abstractions.

Version 3.2.0

  • Option to retrieve sampling results as a NamedTuple.
  • Added new method to plotbounds() to handle NamedTuples.
  • Added plotlines().

Versions v3.1.1 - 3.1.8

  • Updates from CompatHelper.
  • Switch to GitHub Actions (CI, Documenter).
  • Updates from Rik Huijzer (link function).
  • Redo quap() based on StanOptimize.
  • Start updating notebooks in ch 2-8 using new quap().
  • Redoing and updating the models in the models subdirectory.

Version 3.1.0

Align the Stan-based quap with the Turing quap. quap() now returns a NamedTuple that includes a field distr which represents the quadratic Normal (MvNormal) approximation.
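
To give an idea of how such a `distr` field can be used, here is a minimal sketch that draws samples from an MvNormal stored in a NamedTuple. The NamedTuple is built by hand as a hypothetical stand-in for a quap() result: only the field name `distr` comes from the description above, and the mean vector and covariance matrix are made-up numbers. quap() itself is not called.

```julia
using Distributions, Statistics, Random

Random.seed!(1)

# Hypothetical stand-in for a quap() result (assumption: only `distr` is known
# from the text; the parameter values below are invented for illustration).
fit = (distr = MvNormal([1.0, 0.5], [0.04 0.01; 0.01 0.09]),)

# Each column of `samples` is one joint draw of the two parameters.
samples = rand(fit.distr, 10_000)

# Marginal means of the draws, which should be close to the means given above.
println(mean(samples; dims = 2))
```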

Version 3.0.0

StatisticalRethinking.jl v3 is independent of the underlying mcmc package. All scripts previously in StatisticalRethinking.jl v2 holding the snippets have been replaced by Pluto notebooks in the above mentioned mcmc specific project repositories.

Initially SR2TuringPluto.jl will lag SR2StanPluto.jl somewhat but later this year both will cover the same chapters.

It is the intention to develop tests for StatisticalRethinking.jl v3 that work across the different mcmc implementations. This will limit dependencies to the test/Project.toml.

Version 2.2.9

Currently the latest release available in the StatisticalRethinking.jl v2 format.

Installation

To install the package (from the REPL):

] add StatisticalRethinking

but in most cases this package will be a dependency of another package or project, e.g. SR2StanPluto.jl or SR2TuringPluto.jl.
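
Outside the Pkg REPL mode, e.g. in a script, the same installation can be expressed through the standard Pkg API. This is a sketch of the equivalent call and requires network access to the package registry:

```julia
using Pkg

# Equivalent to `] add StatisticalRethinking` in the Pkg REPL mode.
Pkg.add("StatisticalRethinking")
```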

Documentation

  • STABLE: documentation of the most recently tagged version.
  • DEVEL: documentation of the in-development version.

Acknowledgements

Of course, without the excellent textbook by Richard McElreath, this package would not have been possible. The author has also been supportive of this work and gave permission to use the datasets.

Questions and issues

Questions and contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or have a question.
