Introduction to Reproducibility in Cancer Informatics

This course was created from this GitHub template.

You can see the rendered course material here: https://jhudatascience.org/Reproducibility_in_Cancer_Informatics

If you would like to contribute to this course material, take a look at the getting started GitHub wiki pages.

About this course

This course introduces the concepts of reproducibility and replicability in the context of cancer informatics. It is the first course in a two part course on reproducibility (See the advanced course here). It uses hands-on exercises to demonstrate in practical terms how to increase the reproducibility of data analyses. The course also introduces tools relevant to reproducibility including analysis notebooks, package management, git and GitHub.

Learning Objectives

This course will teach learners to:

Understand the fundamental concepts of reproducibility vs replicability
Write a basic analysis that could be passed to another person/computer and be re-ran to obtain the same output
Write analysis notebook that adheres to best practices for reproducibility and readability
Create a reproducible project on GitHub that could be re-run by someone else

Encountering problems?

If you are encountering any problems with this course, please file a GitHub issue or contact us at {Some email or web address with a contact form}.

All materials in this course are licensed CC-BY and can be repurposed freely with attribution.

About the chapter example files

Each chapter has its own zip file to be downloaded as a learner is going through a particular chapter.

There are the Python and R versions. For example, for chapter 4 there are:

python-examples/python-heatmap-chapt-4
r-examples/r-heatmap-chapt-4

These files are zipped up by a GitHub action so they are ready for easy downloading by the learner. The user will download these according to the chapter and follow along with the chapter to make them more reproducible and eventually hopefully have something that looks like the "final" reproducible example versions.

This is the URL pattern they can find the chapter files at:

For Python:

https://github.com/jhudsl/Reproducibility_in_Cancer_Informatics/raw/main/chapter-zips/python-heatmap-chapt-4.zip

For R:

https://github.com/jhudsl/Reproducibility_in_Cancer_Informatics/raw/main/chapter-zips/r-heatmap-chapt-4.zip

Obtaining the "final" versions of the example reproducible analyses

Both the "final" versions of the example analyses have their own repositories that are submodules of this one (located in their respective directories with the less reproducible versions of them in the r-examples and python-examples directories). Final here is in quotes because we may continue to make improvements to this notebook too -- this course tries to emphasize that work on data analyses should be iterative and we never have to say we're done with an analysis if we find other ways it can be improved!

Running the R docker image:

With your current directory being the top of this repository, you can do:

cd r-examples/reproducible-r-example
docker build -f docker/Dockerfile . -t jhudsl/reproducible-r
docker run -it -v $PWD:/home/rstudio -e PASSWORD=password -p 8787:8787 jhudsl/reproducible-r

Then, in the browser of your choice, navigate to localhost:8787 ; using rstudio as your username and password as your password (or whatever you choose for your password in the command above). This docker image has the renv included in it.

Running the Python docker image:

With your current directory being the top of this repository, you can do:

cd python-examples/reproducible-python-example
docker build -f docker/Dockerfile . -t jhudsl/reproducible-python
docker run --rm -v $(pwd):/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 jhudsl/reproducible-python

Then, in the browser of your choice, navigate to the port that the output tells you. This docker image will automatically have your conda environment set up and working.

Name		Name	Last commit message	Last commit date
Latest commit History 664 Commits
.github		.github
_bookdown_files/02-chapter_of_course_files/figure-html		_bookdown_files/02-chapter_of_course_files/figure-html
assets		assets
chapter-zips		chapter-zips
docker		docker
docs		docs
python-examples		python-examples
r-examples		r-examples
resources		resources
scripts		scripts
style-sets		style-sets
.gitignore		.gitignore
.gitmodules		.gitmodules
01-intro.Rmd		01-intro.Rmd
02-defining-reproducibility.Rmd		02-defining-reproducibility.Rmd
03-project-organization.Rmd		03-project-organization.Rmd
04-open-source-with-github.Rmd		04-open-source-with-github.Rmd
05-scientific-notebooks.Rmd		05-scientific-notebooks.Rmd
06-package-management.Rmd		06-package-management.Rmd
07-durable-code.Rmd		07-durable-code.Rmd
08-readmes.Rmd		08-readmes.Rmd
09-code-review.Rmd		09-code-review.Rmd
About.Rmd		About.Rmd
GA_Script.Rhtml		GA_Script.Rhtml
Introduction to Reproducibility.rds		Introduction to Reproducibility.rds
LICENSE		LICENSE
README.md		README.md
References.Rmd		References.Rmd
_bookdown.yml		_bookdown.yml
_output.yml		_output.yml
book.bib		book.bib
code_of_conduct.md		code_of_conduct.md
config_automation.yml		config_automation.yml
environment.yml		environment.yml
index.Rmd		index.Rmd
make_screenshots.R		make_screenshots.R
packages.bib		packages.bib

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction to Reproducibility in Cancer Informatics

About this course

Learning Objectives

Encountering problems?

About the chapter example files

Obtaining the "final" versions of the example reproducible analyses

Running the R docker image:

Running the Python docker image:

About

Releases 1

Packages

Contributors 3

Languages

License

jhudsl/Reproducibility_in_Cancer_Informatics

Folders and files

Latest commit

History

Repository files navigation

Introduction to Reproducibility in Cancer Informatics

About this course

Learning Objectives

Encountering problems?

About the chapter example files

Obtaining the "final" versions of the example reproducible analyses

Running the R docker image:

Running the Python docker image:

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 3

Languages

Packages