From 9f070a48298d99d7cac8ff54eb0f024d1f34f324 Mon Sep 17 00:00:00 2001 From: Louise Deconinck Date: Wed, 11 Sep 2024 11:50:29 +0200 Subject: [PATCH] Slides --- book/in_memory/reticulate.qmd | 24 +--- book/in_memory/rpy2.qmd | 17 +++ slides/slides.qmd | 211 ++++++++++++++++++++++++---------- 3 files changed, 169 insertions(+), 83 deletions(-) diff --git a/book/in_memory/reticulate.qmd b/book/in_memory/reticulate.qmd index c973ab7..8300f23 100644 --- a/book/in_memory/reticulate.qmd +++ b/book/in_memory/reticulate.qmd @@ -55,27 +55,11 @@ result_r <- py_to_r(result) r_to_py(result_r) ``` - -# Interactive sessions -One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice. - -Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks. - -```{python show_magic, eval=FALSE} -%load_ext rpy2.ipython # line magic that loads the rpy2 ipython extension. - # this extension allows the use of the following cell magic - -%%R -i input -o output # this line allows to specify inputs - # (which will be converted to R objects) and outputs - # (which will be converted back to Python objects) - # this line is put at the start of a cell - # the rest of the cell will be run as R code - -``` - +# Interactivity +You can easily include Python chunks in Rmarkdown notebooks using the Python engine in `knitr`. # Usecase -We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding rownames and columnnames and then grouping them, but that is easier to do just in R. 
+We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding rownames and columnnames and then grouping them, but that is easier to do natively in R. A more interesting thing you can do using `reticulate` is interacting with anndata-based Python packages, such as `scanpy`! @@ -104,7 +88,7 @@ adata We can't easily show the result of the plot in this Quarto notebook, but we can save it and show it: -```{r scanpy_plot} +```{r scanpy_plot, warning=TRUE} path <- "umap.png" sc$pl$umap(adata, color="leiden_res1", save=path) ``` diff --git a/book/in_memory/rpy2.qmd b/book/in_memory/rpy2.qmd index aea7c01..8740319 100644 --- a/book/in_memory/rpy2.qmd +++ b/book/in_memory/rpy2.qmd @@ -86,6 +86,23 @@ with anndata2ri.converter.context(): ad2 = anndata2ri.rpy2py(sce) ``` +## Interactive sessions +One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice. + +Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks. + +```{python show_magic, eval=FALSE} +%load_ext rpy2.ipython # line magic that loads the rpy2 ipython extension. 
+ # this extension allows the use of the following cell magic + +%%R -i input -o output # this line allows to specify inputs + # (which will be converted to R objects) and outputs + # (which will be converted back to Python objects) + # this line is put at the start of a cell + # the rest of the cell will be run as R code + +``` + ## Usecase: ran in Python We will perform the Compute DE step not in R, but in Python diff --git a/slides/slides.qmd b/slides/slides.qmd index 681cbca..73b6edb 100644 --- a/slides/slides.qmd +++ b/slides/slides.qmd @@ -22,10 +22,6 @@ exectute: echo: true --- -# test - -{{< include ../book/in_memory/pitfalls.qmd#rpy2_mapping echo=true >}} - # Introduction 1. How do you interact with a package in another language? @@ -43,97 +39,186 @@ We will be focusing on R & Python 1. Package-based interoperability 2. Best practices -## Package-based interoperability +# Package-based interoperability or: the question of reimplementation. -Consider the pros: -- Discoverability -- Can your package be useful in other domains? -- Very user friendly +- Consider the pros: -Consider the cons: -- Think twice: is it worth it? -- It's a lot of work -- How will you keep it up to date? -- How will you ensure parity? + 1. Discoverability + 2. Can your package be useful in other domains? + 3. Very user friendly -## Best practices -1. Work with the standards -2. Work with matrices, arrays and dataframes -3. Provide vignettes on interoperability +- Consider the cons: -# File format based interoperability + 1. Think twice: is it worth it? + 2. **It's a lot of work** + 3. How will you keep it up to date? + 4. How will you ensure parity? -# In-memory interoperability -Calling Python in an R environment and vice versa. -- No need to write out datasets. -- Best suited to calling functions +# Package-based interoperability -rpy2 and reticulate +Please learn both R & Python -## Overview +# Best practices +1. Work with the standards +2. 
Work with matrices, arrays and dataframes +3. Provide vignettes on interoperability -advantages & disadvantaes +# In-memory interoperability +![](../book/in_memory/images/imm_overview.png) -rpy2 -1. overview -2. usage -3. pitfalls +# Overview -reticulate: -1. overview -2. usage -3. pitfalls +1. Advantages & disadvantages +2. Pitfalls when using Python & R +2. Rpy2 +3. Reticulate -## in-memory interoperability advantages +# in-memory interoperability advantages - no need to write & read results - useful when you need a limited amount of functions in another language -## in-memory interoperability drawbacks -- no access to classes -- you need to extract necessary matrices & arrays for the method -- ensure that the method accepts this -- you need to be familiar with using & managing both environments +# in-memory interoperability drawbacks +- not always access to all classes - data duplication - you need to manage the environments -## rpy2 -Accessing R from Python +# Pitfalls when using Python and R +**Column major vs row major matrices** +In R: every dense matrix is stored as column major + +![](../book/in_memory/images/inmemorymatrix.png) + +# Pitfalls when using Python and R +**Indexing** + +![](../book/in_memory/images/indexing.png) + +# Pitfalls when using Python and R +**dots and underscores** + +- mapping in rpy2 + +```python +from rpy2.robjects.packages import importr + +d = {'package.dependencies': 'package_dot_dependencies', + 'package_dependencies': 'package_uscore_dependencies'} +tools = importr('tools', robject_translations = d) +``` + +# Pitfalls when using Python and R +**Integers** + +```r +library(reticulate) +bi <- reticulate::import_builtins() + +bi$list(bi$range(0, 5)) +# TypeError: 'float' object cannot be interpreted as an integer +``` + +```r +library(reticulate) +bi <- reticulate::import_builtins() + +bi$list(bi$range(0L, 5L)) +# [1] 0 1 2 3 4 +``` + +# Rpy2: basics +- Accessing R from Python + - `rpy2.rinterface`, the low-level interface + - 
`rpy2.robjects`, the high-level interface + +```python +import rpy2 +import rpy2.robjects as robjects + +vector = robjects.IntVector([1,2,3]) +rsum = robjects.r['sum'] + +rsum(vector) +``` + +# Rpy2: basics + +```python +str_vector = robjects.StrVector(['abc', 'def', 'ghi']) +flt_vector = robjects.FloatVector([0.3, 0.8, 0.7]) +int_vector = robjects.IntVector([1, 2, 3]) +mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5) +``` + +# Rpy2: numpy + +```python +import numpy as np + +from rpy2.robjects import numpy2ri +from rpy2.robjects import default_converter + +rd_m = np.random.random((10, 7)) + +with (default_converter + numpy2ri.converter).context(): + mtx2 = robjects.r.matrix(rd_m, nrow = 10) +``` + +# Rpy2: pandas +```python +import pandas as pd + +from rpy2.robjects import pandas2ri + +pd_df = pd.DataFrame({'int_values': [1,2,3], + 'str_values': ['abc', 'def', 'ghi']}) + +with (default_converter + pandas2ri.converter).context(): + pd_df_r = robjects.DataFrame(pd_df) +``` -Example: code block +# Rpy2: sparse matrices -`rpy2.rinterface`, the low-level interface -`rpy2.robjects`, the high-level interface +```python +import scipy as sp -Example for calling R functions +from anndata2ri import scipy2ri -Example for conversion of arrays +sparse_matrix = sp.sparse.csc_matrix(rd_m) -## rpy2 -Conversion: -numpy & pandas +with (default_converter + scipy2ri.converter).context(): + sp_r = scipy2ri.py2rpy(sparse_matrix) +``` -Example: code block +# Rpy2: anndata -sparse matrices: anndata2ri +```python +import anndata as ad +import scanpy.datasets as scd -## rpy2 +import anndata2ri -Jupyter(like) notebooks: -make use of the Magic command interface +adata_paul = scd.paul15() -`%load_ext rmagic` -`%R -i input -o output` +with anndata2ri.converter.context(): + sce = anndata2ri.py2rpy(adata_paul) + ad2 = anndata2ri.rpy2py(sce) +``` -example +# Rpy2: interactivity -## rpy2 +```python +%load_ext rpy2.ipython # line magic that loads the rpy2 ipython extension. 
+ # this extension allows the use of the following cell magic + +%%R -i input -o output # this line lets you specify inputs + # (which will be converted to R objects) and outputs + # (which will be converted back to Python objects) + # this line is put at the start of a cell + # the rest of the cell will be run as R code +``` -## reticulate +# Reticulate # Workflows