Commit

Merge branch 'main' of https://github.com/saeyslab/polygloty
berombau committed Sep 11, 2024
2 parents cc11e2e + e9e9ae5 commit a13a67b
Showing 6 changed files with 174 additions and 87 deletions.
4 changes: 2 additions & 2 deletions _quarto.yml
@@ -53,9 +53,9 @@ book:
chapters:
- text: Pitfalls
href: book/in_memory/pitfalls.qmd
- text: rpy2
- text: Rpy2
href: book/in_memory/rpy2.qmd
- text: reticulate
- text: Reticulate
href: book/in_memory/reticulate.qmd
- text: "Disk-based interoperability"
part: book/disk_based/index.qmd
2 changes: 1 addition & 1 deletion book/in_memory/index.qmd
@@ -9,5 +9,5 @@ One language will act as the main language, and you will interact with the othe

![A schematic overview](images/imm_overview.png){#fig-im-overview}

- When evaluating R code within a Python program, we will make use of rpy2 to accomplish this. When evaluating Python code within an R program, we will make use of reticulate.
+ When evaluating R code within a Python program, we will make use of `Rpy2` to accomplish this. When evaluating Python code within an R program, we will make use of `Reticulate`.

4 changes: 2 additions & 2 deletions book/in_memory/pitfalls.qmd
@@ -25,7 +25,7 @@ Take care to remember that arrays and matrices in Python are indexed starting fr
## Dots in variable names
In R it is very common to use dots in symbols and variable names. This is invalid in Python: dots are used for function calls.

- When using rpy2, these dots are usually translated to underscores `_`. If this automatic translation can result in errors, this does not happen automatically. In this case, you can specify mappings for these symbols.
+ When using rpy2, these dots are usually translated to underscores `_`. If this translation can result in errors, this does not happen automatically. In this case, you can specify mappings for these symbols.

```{python rpy2_mapping}
from rpy2.robjects.packages import importr
@@ -48,7 +48,7 @@ is.integer(float_ex)
is.integer(int_ex)
```

- This can often lead to errors when using `reticulate`! If you're calling a Python function and provide it with just a number in R, it probably won't be recognised as an integer, leading to weird errors:
+ This can often lead to errors when using `reticulate`! If you're calling a Python function and provide it with just a number in R, it probably won't be recognised as an integer, leading to errors:

```{r float_integer_error, error=TRUE}
library(reticulate)
24 changes: 4 additions & 20 deletions book/in_memory/reticulate.qmd
@@ -55,27 +55,11 @@ result_r <- py_to_r(result)
r_to_py(result_r)
```


# Interactive sessions
One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice.

Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks.

```{python show_magic, eval=FALSE}
%load_ext rpy2.ipython # line magic that loads the rpy2 ipython extension.
# this extension allows the use of the following cell magic
%%R -i input -o output # this line allows to specify inputs
# (which will be converted to R objects) and outputs
# (which will be converted back to Python objects)
# this line is put at the start of a cell
# the rest of the cell will be run as R code
```

# Interactivity
You can easily include Python chunks in Rmarkdown notebooks using the Python engine in `knitr`.

# Usecase
- We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding rownames and columnnames and then grouping them, but that is easier to do just in R.
+ We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding rownames and columnnames and then grouping them, but that is easier to do natively in R.

A more interesting thing you can do using `reticulate` is interacting with anndata-based Python packages, such as `scanpy`!

@@ -104,7 +88,7 @@ adata

We can't easily show the result of the plot in this Quarto notebook, but we can save it and show it:

```{r scanpy_plot}
```{r scanpy_plot, warning=TRUE, output=TRUE}
path <- "umap.png"
sc$pl$umap(adata, color="leiden_res1", save=path)
```
19 changes: 18 additions & 1 deletion book/in_memory/rpy2.qmd
@@ -1,5 +1,5 @@
---
- title: In-memory interoperability
+ title: Rpy2
engine: knitr
---

@@ -86,6 +86,23 @@ with anndata2ri.converter.context():
ad2 = anndata2ri.rpy2py(sce)
```

## Interactive sessions
One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice.

Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magics together with rpy2, you can run R code in a cell of your notebook.

```{python show_magic, eval=FALSE}
# Line magic that loads the rpy2 IPython extension,
# which enables the %%R cell magic used below.
%load_ext rpy2.ipython

# In a separate cell, with %%R as its first line:
%%R -i input -o output
# -i lists Python objects to convert to R objects (inputs);
# -o lists R objects to convert back to Python objects (outputs).
# The rest of the cell is run as R code.
```

## Use case: run in Python

We will perform the Compute DE step not in R, but in Python
208 changes: 147 additions & 61 deletions slides/slides.qmd
@@ -22,10 +22,6 @@ exectute:
echo: true
---

# test

{{< include ../book/in_memory/pitfalls.qmd#rpy2_mapping echo=true >}}

# Introduction

1. How do you interact with a package in another language?
@@ -56,96 +52,186 @@ While interoperability is currently possible developers continue to improve the
1. Package-based interoperability
2. Best practices

## Package-based interoperability
# Package-based interoperability
or: the question of reimplementation.

Consider the pros:
- Discoverability
- Can your package be useful in other domains?
- Very user friendly
- Consider the pros:

1. Discoverability
2. Can your package be useful in other domains?
3. Very user friendly

- Consider the cons:

1. Think twice: is it worth it?
2. **It's a lot of work**
3. How will you keep it up to date?
4. How will you ensure parity?

Consider the cons:
- Think twice: is it worth it?
- It's a lot of work
- How will you keep it up to date?
- How will you ensure parity?
# Package-based interoperability

## Best practices
Please learn both R & Python

# Best practices
1. Work with the standards
2. Work with matrices, arrays and dataframes
3. Provide vignettes on interoperability

# In-memory interoperability
Calling Python in an R environment and vice versa.
- No need to write out datasets.
- Best suited to calling functions
![](../book/in_memory/images/imm_overview.png)

rpy2 and reticulate
# Overview

## Overview
1. Advantages & disadvantages
2. Pitfalls when using Python & R
3. Rpy2
4. Reticulate

advantages & disadvantages

rpy2
1. overview
2. usage
3. pitfalls

reticulate:
1. overview
2. usage
3. pitfalls

## in-memory interoperability advantages
# In-memory interoperability advantages
- no need to write & read results
- useful when you need a limited amount of functions in another language

## in-memory interoperability drawbacks
- no access to classes
- you need to extract necessary matrices & arrays for the method
- ensure that the method accepts this
- you need to be familiar with using & managing both environments
# In-memory interoperability drawbacks
- not always access to all classes
- data duplication
- you need to manage the environments

## rpy2
Accessing R from Python
# Pitfalls when using Python and R
**Column-major vs row-major matrices**

In R, every dense matrix is stored in column-major order.

![](../book/in_memory/images/inmemorymatrix.png)
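
To make the layout difference concrete, here is a minimal pure-Python sketch (not part of the original slides) of how the same flat buffer is read under the two conventions. The helper names `flat_index` and `as_rows` are hypothetical and exist only for this illustration.

```python
def flat_index(row, col, n_rows, n_cols, order):
    """Offset of element (row, col), both 0-based, in a flat buffer."""
    if order == "C":  # row-major: rows are contiguous (NumPy's default)
        return row * n_cols + col
    if order == "F":  # column-major: columns are contiguous (R's layout)
        return col * n_rows + row
    raise ValueError(f"unknown order: {order!r}")


def as_rows(flat, n_rows, n_cols, order):
    """Read a flat buffer as a list of rows under the given memory order."""
    return [
        [flat[flat_index(r, c, n_rows, n_cols, order)] for c in range(n_cols)]
        for r in range(n_rows)
    ]


values = [1, 2, 3, 4, 5, 6]  # one flat buffer, interpreted in two ways

row_major = as_rows(values, 2, 3, "C")  # fills row by row
col_major = as_rows(values, 2, 3, "F")  # fills column by column

print(row_major)
print(col_major)
```

The column-major reading fills the matrix the way `matrix(1:6, nrow = 2)` does in R, which is why a buffer handed from one language to the other may need to be transposed or copied.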

# Pitfalls when using Python and R
**Indexing**

![](../book/in_memory/images/indexing.png)
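
As a rough illustration of the indexing difference, the sketch below converts an R-style range (1-based, end-inclusive) into a Python slice (0-based, end-exclusive). The helper `r_range_to_slice` is a hypothetical name introduced for this example, not a function from rpy2 or reticulate.

```python
def r_range_to_slice(start, stop):
    """Translate R's x[start:stop] (1-based, inclusive) into a Python slice."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    # Shift the start down by one; the inclusive R end equals the exclusive Python end.
    return slice(start - 1, stop)


letters = ["a", "b", "c", "d", "e"]

# R: letters[1:3] selects the first three elements.
first_three = letters[r_range_to_slice(1, 3)]

# R: letters[5] selects the last element; in Python that is index 4.
last_one = letters[r_range_to_slice(5, 5)]

print(first_three)
print(last_one)
```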

# Pitfalls when using Python and R
**dots and underscores**

- mapping in rpy2

```python
from rpy2.robjects.packages import importr

d = {'package.dependencies': 'package_dot_dependencies',
     'package_dependencies': 'package_uscore_dependencies'}
tools = importr('tools', robject_translations = d)
```
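
The reason a mapping is needed can be sketched in plain Python. This is an illustrative model only, assuming a naive "replace dots with underscores" rule; rpy2's actual translation logic may differ, and `translate_names` is a hypothetical helper.

```python
def translate_names(r_names, overrides=None):
    """Map R symbol names to Python-safe names, refusing ambiguous results.

    overrides: optional dict of explicit R name -> Python name mappings.
    Returns a dict of Python name -> R name.
    """
    overrides = overrides or {}
    translated = {}
    for r_name in r_names:
        # Use the explicit mapping if given, otherwise swap dots for underscores.
        py_name = overrides.get(r_name, r_name.replace(".", "_"))
        if py_name in translated:
            raise ValueError(
                f"{r_name!r} and {translated[py_name]!r} both map to {py_name!r}"
            )
        translated[py_name] = r_name
    return translated


r_symbols = ["package.dependencies", "package_dependencies"]

# Without explicit mappings the two R symbols collide on one Python name.
try:
    translate_names(r_symbols)
    conflict = None
except ValueError as err:
    conflict = str(err)

# With explicit mappings (same shape as the dict above) both stay reachable.
d = {"package.dependencies": "package_dot_dependencies",
     "package_dependencies": "package_uscore_dependencies"}
resolved = translate_names(r_symbols, overrides=d)

print(conflict)
print(resolved)
```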

# Pitfalls when using Python and R
**Integers**

```r
library(reticulate)
bi <- reticulate::import_builtins()

bi$list(bi$range(0, 5))
# TypeError: 'float' object cannot be interpreted as an integer
```

```r
library(reticulate)
bi <- reticulate::import_builtins()

bi$list(bi$range(0L, 5L))
# [1] 0 1 2 3 4
```
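
The same error can be reproduced in Python alone. The sketch below assumes that bare R numbers such as `0` and `5` reach Python as floats, while `0L` and `5L` reach it as ints; `try_range` is a hypothetical helper that only captures the outcome.

```python
def try_range(start, stop):
    """Return list(range(start, stop)), or the error text if range() rejects the inputs."""
    try:
        return list(range(start, stop))
    except TypeError as err:
        return f"TypeError: {err}"


# Assumption: bi$range(0, 5) in R hands Python the floats 0.0 and 5.0.
as_floats = try_range(0.0, 5.0)

# Assumption: bi$range(0L, 5L) in R hands Python the ints 0 and 5.
as_ints = try_range(0, 5)

print(as_floats)
print(as_ints)
```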

# Rpy2: basics
- Accessing R from Python
- `rpy2.rinterface`, the low-level interface
- `rpy2.robjects`, the high-level interface

```python
import rpy2
import rpy2.robjects as robjects

vector = robjects.IntVector([1,2,3])
rsum = robjects.r['sum']

rsum(vector)
```

# Rpy2: basics

```python
str_vector = robjects.StrVector(['abc', 'def', 'ghi'])
flt_vector = robjects.FloatVector([0.3, 0.8, 0.7])
int_vector = robjects.IntVector([1, 2, 3])
mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)
```

# Rpy2: numpy

```python
import numpy as np

from rpy2.robjects import numpy2ri
from rpy2.robjects import default_converter

rd_m = np.random.random((10, 7))

with (default_converter + numpy2ri.converter).context():
    mtx2 = robjects.r.matrix(rd_m, nrow = 10)
```

# Rpy2: pandas
```python
import pandas as pd

from rpy2.robjects import pandas2ri

pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})

with (default_converter + pandas2ri.converter).context():
    pd_df_r = robjects.DataFrame(pd_df)
```

Example: code block
# Rpy2: sparse matrices

`rpy2.rinterface`, the low-level interface
`rpy2.robjects`, the high-level interface
```python
import scipy as sp

Example for calling R functions
from anndata2ri import scipy2ri

Example for conversion of arrays
sparse_matrix = sp.sparse.csc_matrix(rd_m)

## rpy2
Conversion:
numpy & pandas
with (default_converter + scipy2ri.converter).context():
    sp_r = scipy2ri.py2rpy(sparse_matrix)
```

Example: code block
# Rpy2: anndata

sparse matrices: anndata2ri
```python
import anndata as ad
import scanpy.datasets as scd

## rpy2
import anndata2ri

Jupyter(like) notebooks:
make use of the Magic command interface
adata_paul = scd.paul15()

`%load_ext rmagic`
`%R -i input -o output`
with anndata2ri.converter.context():
    sce = anndata2ri.py2rpy(adata_paul)
    ad2 = anndata2ri.rpy2py(sce)
```

example
# Rpy2: interactivity

## rpy2
```python
# Line magic that loads the rpy2 IPython extension,
# which enables the %%R cell magic used below.
%load_ext rpy2.ipython

1. let your method be run with matrices and arrays as input
2. anndata2ri
?
%%R -i input -o output
# The %%R line above is put at the start of a cell; the rest of the cell is run as R code.
# -i lists Python objects to convert to R objects (inputs);
# -o lists R objects to convert back to Python objects (outputs).
```

## reticulate
# Reticulate

# Disk-based interoperability
