Merge branch 'main' of https://github.com/saeyslab/polygloty

saeyslab · Sep 11, 2024 · a13a67b
2 parents cc11e2e + e9e9ae5
Showing 6 changed files with 174 additions and 87 deletions.
diff --git a/_quarto.yml b/_quarto.yml
@@ -53,9 +53,9 @@ book:
       chapters:
         - text: Pitfalls
           href: book/in_memory/pitfalls.qmd
-        - text: rpy2
+        - text: Rpy2
           href: book/in_memory/rpy2.qmd
-        - text: reticulate
+        - text: Reticulate
           href: book/in_memory/reticulate.qmd
     - text: "Disk-based interoperability"
       part: book/disk_based/index.qmd

diff --git a/book/in_memory/index.qmd b/book/in_memory/index.qmd
@@ -9,5 +9,5 @@ One language will act as the main language, and you will intereact with the othe
 
 ![A schematic overview](images/imm_overview.png){#fig-im-overview}
 
-When evaluating R code within a Python program, we will make use of rpy2 to accomplish this. When evaluating Python code within an R program, we will make use of reticulate.
+When evaluating R code within a Python program, we will make use of `Rpy2` to accomplish this. When evaluating Python code within an R program, we will make use of `Reticulate`.
 
diff --git a/book/in_memory/pitfalls.qmd b/book/in_memory/pitfalls.qmd
@@ -25,7 +25,7 @@ Take care to remember that arrays and matrices in Python are indexed starting fr
 ## Dots in variable names
 In R it is very common to use dots in symbols and variable names. This is invalid in Python: dots are used for function calls.
 
-When using rpy2, these dots are usually translated to underscores `_`. If this automatic translation can result in errors, this does not happen automatically. In this case, you can specify mappings for these symbols.
+When using rpy2, these dots are usually translated to underscores `_`. If this translation would cause a conflict, for example because the underscored name already exists, it is not applied automatically. In that case, you can specify the mapping for those symbols yourself.
 
 ```{python rpy2_mapping}
 from rpy2.robjects.packages import importr
@@ -48,7 +48,7 @@ is.integer(float_ex)
 is.integer(int_ex)
 ```
 
-This can often lead to errors when using `reticulate`! If you're calling a Python function and provide it with just a number in R, it probably won't be recognised as an integer, leading to weird errors:
+This can often lead to errors when using `reticulate`! If you're calling a Python function and provide it with just a number in R, it probably won't be recognised as an integer, leading to errors:
 
 ```{r float_integer_error, error=TRUE}
 library(reticulate)
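
Note on the integer pitfall in `pitfalls.qmd`: the error can be reproduced in plain Python, without reticulate. This is a minimal sketch, assuming that a bare R number such as `5` reaches Python as a float and that `5L` reaches it as an int; the helper `try_range` is a hypothetical name used only for this illustration.

```python
# Illustration only: plain Python, no reticulate involved.
# `try_range` is a hypothetical helper, not part of any library.

def try_range(start, stop):
    """Return list(range(start, stop)), or the TypeError message if range() rejects the arguments."""
    try:
        return list(range(start, stop))
    except TypeError as err:
        return str(err)


# Assumed equivalent of calling bi$range(0, 5) from R: the arguments arrive as floats.
float_result = try_range(0.0, 5.0)

# Assumed equivalent of calling bi$range(0L, 5L) from R: the arguments arrive as ints.
int_result = try_range(0, 5)

print(float_result)  # the TypeError message raised by range()
print(int_result)    # [0, 1, 2, 3, 4]
```

Python's `range` does not coerce floats to integers, which is why the `L` suffix is needed on the R side.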

diff --git a/book/in_memory/reticulate.qmd b/book/in_memory/reticulate.qmd
@@ -55,27 +55,11 @@ result_r <- py_to_r(result)
 r_to_py(result_r)
 ```
 
-
-# Interactive sessions
-One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice.
-
-Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks.
-
-```{python show_magic, eval=FALSE}
-%load_ext rpy2.ipython  # line magic that loads the rpy2 ipython extension.
-                        # this extension allows the use of the following cell magic
-
-%%R -i input -o output  # this line allows to specify inputs 
-                        # (which will be converted to R objects) and outputs 
-                        # (which will be converted back to Python objects) 
-                        # this line is put at the start of a cell
-                        # the rest of the cell will be run as R code
-
-```
-
+# Interactivity
+You can easily include Python chunks in Rmarkdown notebooks using the Python engine in `knitr`.
 
 # Usecase
-We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding rownames and columnnames and then grouping them, but that is easier to do just in R.
+We will not showcase the usefulness of reticulate by using the DE analysis: it would involve loading in `pandas` to create a Python dataframe, adding row names and column names and then grouping them, but that is easier to do natively in R.
 
 A more interesting thing you can do using `reticulate` is interacting with anndata-based Python packages, such as `scanpy`! 
 
@@ -104,7 +88,7 @@ adata
 
 We can't easily show the result of the plot in this Quarto notebook, but we can save it and show it:
 
-```{r scanpy_plot}
+```{r scanpy_plot, warning=TRUE, output=TRUE}
 path <- "umap.png"
 sc$pl$umap(adata, color="leiden_res1", save=path)
 ```

diff --git a/book/in_memory/rpy2.qmd b/book/in_memory/rpy2.qmd
@@ -1,5 +1,5 @@
 ---
-title: In-memory interoperability
+title: Rpy2
 engine: knitr
 ---
 
@@ -86,6 +86,23 @@ with anndata2ri.converter.context():
     ad2 = anndata2ri.rpy2py(sce)
 ```
 
+## Interactive sessions
+One of the most useful ways to take advantage of in-memory interoperability is to use it in interactive sessions, where you're exploring the data and want to try out some functions non-native to your language of choice.
+
+Jupyter notebooks (and some other notebooks) make this possible from the Python side: using IPython line and cell magic and rpy2, you can easily run an R jupyter cell in your notebooks.
+
+```{python show_magic, eval=FALSE}
+%load_ext rpy2.ipython  # line magic that loads the rpy2 ipython extension.
+                        # this extension allows the use of the following cell magic
+
+%%R -i input -o output  # this line lets you specify inputs
+                        # (which will be converted to R objects) and outputs
+                        # (which will be converted back to Python objects);
+                        # it must be placed at the start of a cell, and
+                        # the rest of the cell will be run as R code
+
+```
+
 ## Usecase: ran in Python
 
 We will perform the Compute DE step not in R, but in Python

diff --git a/slides/slides.qmd b/slides/slides.qmd
@@ -22,10 +22,6 @@ exectute:
     echo: true
 ---
 
-# test
-
-{{< include ../book/in_memory/pitfalls.qmd#rpy2_mapping echo=true >}}
-
 # Introduction
 
 1. How do you interact with a package in another language?
@@ -56,96 +52,186 @@ While interoperability is currently possible developers continue to improve the
 1. Package-based interoperability
 2. Best practices
 
-## Package-based interoperability
+# Package-based interoperability
 or: the question of reimplementation.
 
-Consider the pros:
-- Discoverability
-- Can your package be useful in other domains?
-- Very user friendly
+- Consider the pros:
+
+  1. Discoverability
+  2. Can your package be useful in other domains?
+  3. Very user friendly
+
+- Consider the cons:
+
+  1. Think twice: is it worth it?
+  2. **It's a lot of work**
+  3. How will you keep it up to date?
+  4. How will you ensure parity?
 
-Consider the cons:
-- Think twice: is it worth it?
-- It's a lot of work
-- How will you keep it up to date?
-- How will you ensure parity?
+# Package-based interoperability
 
-## Best practices
+Please learn both R & Python
 
+# Best practices
 1. Work with the standards
 2. Work with matrices, arrays and dataframes
 3. Provide vignettes on interoperability
 
 # In-memory interoperability
-Calling Python in an R environment and vice versa.
-- No need to write out datasets.
-- Best suited to calling functions
+![](../book/in_memory/images/imm_overview.png)
 
-rpy2 and reticulate
+# Overview
 
-## Overview
+1. Advantages & disadvantages
+2. Pitfalls when using Python & R
+3. Rpy2
+4. Reticulate
 
-advantages & disadvantages
-
-rpy2
-1. overview
-2. usage
-3. pitfalls
-
-reticulate:
-1. overview
-2. usage
-3. pitfalls
-
-## in-memory interoperability advantages
+# in-memory interoperability advantages
 - no need to write & read results
 - useful when you need a limited amount of functions in another language
 
-## in-memory interoperability drawbacks
-- no access to classes
-- you need to extract necessary matrices & arrays for the method
-- ensure that the method accepts this
-- you need to be familiar with using & managing both environments
+# in-memory interoperability drawbacks
+- not all classes are accessible
 - data duplication
 - you need to manage the environments
 
-## rpy2
-Accessing R from Python
+# Pitfalls when using Python and R
+**Column major vs row major matrices**
+In R: every dense matrix is stored as column major
+
+![](../book/in_memory/images/inmemorymatrix.png)
+
+# Pitfalls when using Python and R
+**Indexing**
+
+![](../book/in_memory/images/indexing.png)
+
+# Pitfalls when using Python and R
+**dots and underscores**
+
+- mapping in rpy2
+
+```python
+from rpy2.robjects.packages import importr
+
+d = {'package.dependencies': 'package_dot_dependencies',
+     'package_dependencies': 'package_uscore_dependencies'}
+tools = importr('tools', robject_translations = d)
+```
+
+# Pitfalls when using Python and R
+**Integers**
+
+```r 
+library(reticulate)
+bi <- reticulate::import_builtins()
+
+bi$list(bi$range(0, 5))
+# TypeError: 'float' object cannot be interpreted as an integer
+```
+
+```r 
+library(reticulate)
+bi <- reticulate::import_builtins()
+
+bi$list(bi$range(0L, 5L))
+# [1] 0 1 2 3 4
+```
+
+# Rpy2: basics
+- Accessing R from Python
+  - `rpy2.rinterface`, the low-level interface
+  - `rpy2.robjects`, the high-level interface
+
+```python
+import rpy2
+import rpy2.robjects as robjects
+
+vector = robjects.IntVector([1,2,3])
+rsum = robjects.r['sum']
+
+rsum(vector)
+```
+
+# Rpy2: basics
+
+```python
+str_vector = robjects.StrVector(['abc', 'def', 'ghi'])
+flt_vector = robjects.FloatVector([0.3, 0.8, 0.7])
+int_vector = robjects.IntVector([1, 2, 3])
+mtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)
+```
+
+# Rpy2: numpy
+
+```python
+import numpy as np
+
+from rpy2.robjects import numpy2ri
+from rpy2.robjects import default_converter
+
+rd_m = np.random.random((10, 7))
+
+with (default_converter + numpy2ri.converter).context():
+    mtx2 = robjects.r.matrix(rd_m, nrow = 10)
+```
+
+# Rpy2: pandas
+```python
+import pandas as pd
+
+from rpy2.robjects import pandas2ri
+
+pd_df = pd.DataFrame({'int_values': [1,2,3],
+                      'str_values': ['abc', 'def', 'ghi']})
+
+with (default_converter + pandas2ri.converter).context():
+    pd_df_r = robjects.DataFrame(pd_df)
+```
 
-Example: code block
+# Rpy2: sparse matrices
 
-`rpy2.rinterface`, the low-level interface
-`rpy2.robjects`, the high-level interface
+```python
+import scipy as sp
 
-Example for calling R functions
+from anndata2ri import scipy2ri
 
-Example for conversion of arrays
+sparse_matrix = sp.sparse.csc_matrix(rd_m)
 
-## rpy2
-Conversion:
-numpy & pandas
+with (default_converter + scipy2ri.converter).context():
+    sp_r = scipy2ri.py2rpy(sparse_matrix)
+```
 
-Example: code block
+# Rpy2: anndata
 
-sparse matrices: anndata2ri
+```python
+import anndata as ad
+import scanpy.datasets as scd
 
-## rpy2
+import anndata2ri
 
-Jupyter(like) notebooks:
-make use of the Magic command interface
+adata_paul = scd.paul15()
 
-`%load_ext rmagic`
-`%R -i input -o output`
+with anndata2ri.converter.context():
+    sce = anndata2ri.py2rpy(adata_paul)
+    ad2 = anndata2ri.rpy2py(sce)
+```
 
-example
+# Rpy2: interactivity
 
-## rpy2
+```python
+%load_ext rpy2.ipython  # line magic that loads the rpy2 ipython extension.
+                        # this extension allows the use of the following cell magic
 
-1. let your method be run with matrices and arrays as input
-2. anndata2ri
-?
+%%R -i input -o output  # this line lets you specify inputs
+                        # (which will be converted to R objects) and outputs
+                        # (which will be converted back to Python objects);
+                        # it must be placed at the start of a cell, and
+                        # the rest of the cell will be run as R code
+```
 
-## reticulate
+# Reticulate
 
 # Disk-based interoperability
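
Note on the "Column major vs row major matrices" slide: the difference can be made concrete with a small pure-Python sketch. It assumes a flat buffer holding the values 1 to 6 and reads it as a 2 x 3 matrix under both layouts. The names `flat_offset` and `as_rows` are hypothetical helpers for this illustration; they are not part of rpy2, reticulate, or NumPy.

```python
# Illustration only: how one flat buffer maps to a matrix under each storage order.
# `flat_offset` and `as_rows` are hypothetical helpers, not library functions.

def flat_offset(row, col, nrow, ncol, order):
    """Return the position of element (row, col) in the flat buffer (0-based)."""
    if order == "C":  # row major: consecutive elements of a row are adjacent
        return row * ncol + col
    if order == "F":  # column major: consecutive elements of a column are adjacent
        return col * nrow + row
    raise ValueError("order must be 'C' or 'F'")


def as_rows(buffer, nrow, ncol, order):
    """Read the flat buffer as a list of rows under the given storage order."""
    return [
        [buffer[flat_offset(r, c, nrow, ncol, order)] for c in range(ncol)]
        for r in range(nrow)
    ]


values = [1, 2, 3, 4, 5, 6]

row_major = as_rows(values, 2, 3, "C")  # row-major reading
col_major = as_rows(values, 2, 3, "F")  # column-major reading, as in R's matrix(1:6, nrow = 2)

print(row_major)  # [[1, 2, 3], [4, 5, 6]]
print(col_major)  # [[1, 3, 5], [2, 4, 6]]
```

The same six values yield different matrices depending on the layout, so a buffer passed between languages without an explicit order can end up silently transposed or scrambled.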