finetuning, pitfall figure and code example, umap plot

saeyslab · Sep 10, 2024 · 4f41ce6
1 parent c44dee7
commit 4f41ce6

Showing 6 changed files with 6,782 additions and 11,571 deletions.
diff --git a/book/in_memory/images/indexing.png b/book/in_memory/images/indexing.png
diff --git a/book/in_memory/pitfalls.qmd b/book/in_memory/pitfalls.qmd
@@ -20,10 +20,12 @@ If you notice something amiss with your matrices, check whether you need to tran
 ## Indexing: 0-based or 1-based
 Take care to remember that arrays and matrices in Python are indexed starting from 0 (as in, index 0 refers to the first element), while R uses 1-based indexing, where index 1 refers to the first element.
 
+![0-based vs 1-based indexing](images/indexing.png){#fig-indexing}
+
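A minimal plain-Python sketch of the 0-based convention described above. The list `letters` and the variable `r_position` are made-up illustrations, not taken from the book. In R, the equivalent vector `c("a", "b", "c")` would return `"a"` for index 1.

```python
# Hypothetical example data, not taken from the book.
letters = ["a", "b", "c"]

first = letters[0]                # index 0 is the first element in Python
second = letters[1]               # index 1 is the second element (the first in R)
last = letters[len(letters) - 1]  # the last valid index is len - 1, not len

# Converting a 1-based (R-style) position to a 0-based (Python-style) index
r_position = 2
python_index = r_position - 1
print(first, second, last, letters[python_index])
```

Subtracting 1 when carrying an R position over to Python (and adding 1 in the other direction) is usually all that is needed, but it is easy to forget when indices are passed between the two languages.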
 ## Dots in variable names
 In R it is very common to use dots in symbols and variable names. This is invalid in Python: dots are used for attribute access, such as calling a method on an object.
 
-When using rpy2, these dots are usually translated to underscores `_`. If this does not happen automatically, you can specify mappings for these symbols.
+When using rpy2, these dots are usually translated to underscores `_`. If the automatic translation would lead to errors, it is not applied automatically. In that case, you can specify mappings for these symbols.
 
 ```{python rpy2_mapping}
 from rpy2.robjects.packages import importr
@@ -36,7 +38,7 @@ tools = importr('tools', robject_translations = d)
 ## Integers and floating point numbers
 Unless you explicitly specify otherwise, any number is represented as a floating point number in R. By adding an `L` at the end of the number, you specify that it is an integer.
 
-Python can be more strict about using integers or floating point numbers than R.
+Python is usually more strict about using integers or floating point numbers than R.
 
 ```{r int_example}
 float_ex <- 12
@@ -45,3 +47,22 @@ int_ex <- 12L
 is.integer(float_ex)
 is.integer(int_ex)
 ```
+
+This can often lead to errors when using `reticulate`! If you're calling a Python function and provide it with just a number in R, it probably won't be recognised as an integer, leading to weird errors:
+
+```{r float_integer_error}
+library(reticulate)
+bi <- reticulate::import_builtins()
+
+bi$list(bi$range(0, 5))
+```
+
+As you can see, you get errors: `TypeError: 'float' object cannot be interpreted as an integer`.
+
+This is easily fixed by specifying integers:
+
+```{r float_integer_right}
+bi$list(bi$range(0L, 5L))
+```
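The same behaviour can be reproduced in plain Python, which may help when reading the traceback. This is a sketch under the assumption that an R numeric such as `5` arrives in Python as the float `5.0`, while `5L` arrives as the int `5`. The helper `try_range` is hypothetical and only there to capture the error message.

```python
def try_range(start, stop):
    """Return list(range(start, stop)), or the error message if range() rejects the arguments."""
    try:
        return list(range(start, stop))
    except TypeError as err:
        return str(err)

# Roughly what Python receives from bi$range(0, 5): two floats
as_float = try_range(0.0, 5.0)

# Roughly what Python receives from bi$range(0L, 5L): two ints
as_int = try_range(0, 5)

print(as_float)
print(as_int)
```

Python's `range()` does not silently truncate floats, even when they hold whole numbers, which is why the `L` suffix is needed on the R side.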
+
+
diff --git a/book/in_memory/reticulate.qmd b/book/in_memory/reticulate.qmd
@@ -65,9 +65,11 @@ Jupyter notebooks (and some other notebooks) make this possible from the Python
 %load_ext rpy2.ipython  # line magic that loads the rpy2 ipython extension.
                         # this extension allows the use of the following cell magic
 
-%%R -i input -o output  # this line allows to specify inputs (which will be converted to R objects) and outputs (which will be converted back to Python objects) 
+%%R -i input -o output  # this line allows to specify inputs 
+                        # (which will be converted to R objects) and outputs 
+                        # (which will be converted back to Python objects) 
                         # this line is put at the start of a cell
-                        # the rest of the cell will be able to be ran as R code
+                        # the rest of the cell will be run as R code
 
 ```
 
@@ -100,90 +102,11 @@ sc$tl$umap(adata)
 adata
 ```
 
-```{r scanpy_plot}
-sc$pl$umap(adata)
-```
-
-
-## Usecase: theoretical reticulate example
-
-```{r read_in}
-library(anndata)
-
-adata_path <- "../usecase/data/sc_counts_subset.h5ad"
-adata <- anndata::read_h5ad(adata_path)
-```
-
-Subset to a single small molecule and control for computational efficiency:
-
-```{r select_sm_celltype}
-library(dplyr)
-
-sm_name <- "Belinostat"
-control_name <- "Dimethyl Sulfoxide"
-
-# subset obs
-adata <- adata[adata$obs$sm_name %in% c(control_name, sm_name), adata$var$highly_variable]
-```
-
-```{r import_pandas}
-library(reticulate)
-pd <- import("pandas", convert = FALSE)
-
-counts <- as.matrix(adata$X)
-```
-
-Combine data in a single data frame and compute pseudobulk
-
-This is a literal translation of the Python code. It is however absolute madness to construct a pandas dataframe in R instead of using just an R dataframe. We basically just needed a matrix with rownames and columnames.
-
-We provide the rest of the code as a theoretical example, but please reflect if you want to try something similar.
-
-We will however showcase a useful application of reticulate.
-
-```{r compute_pseudobulk, eval = FALSE}
-
-combined <- pd$DataFrame(
-  counts,
-  index = adata$obs["plate_well_celltype_reannotated"],
-  columns = adata$var_names
-)
-cr <- py_to_r(combined)
-
-# we lost the rownames
-rownames(cr) <- adata$obs_names
-cr["celltype"] <- adata$obs["plate_well_celltype_reannotated"]
-
-pb_X <- group_by(cr, celltype) %>% summarise(across(where(is.numeric), sum))
-```
-
-```{r pb_obs_r, eval = FALSE}
-pb_obs <- adata$obs[c("sm_name", "cell_type", "plate_name", "well", "plate_well_celltype_reannotated")]
-pb_obs <- pb_obs[!duplicated(pb_obs), ]
-```
-
-```{python pb_obs_py, eval = FALSE}
-pb_obs = adata.obs[["sm_name", "cell_type", "plate_name", "well"]].copy()
-pb_obs.index = adata.obs["plate_well_celltype_reannotated"]
-pb_obs = pb_obs.drop_duplicates()
-```
-
-```{r pb_anndata, eval = FALSE}
-select_X <- pb_X[pb_X$celltype %in% pb_obs$plate_well_celltype_reannotated, ]
-select_X <- select_X %>% select(-"celltype")
-
-pb_adata <- anndata::AnnData(
-  X = select_X,
-  obs = pb_obs,
-  var = adata$var
-)
-```
-
-<!-- note: don't remove the `eval=FALSE` for this one, as not to overwrite the usecase data -->
-```{r store_pseudobulk, eval=FALSE}
-write_h5ad(pb_adata, "../usecase/data/pseudobulk.h5ad")
+We can't easily show the result of the plot in this Quarto notebook, but we can save it and show it:
 
+```{r scanpy_plot}
+path <- "umap.png"
+sc$pl$umap(adata, color="leiden_res1", save=path)
 ```
 
-## Usecase: useful reticulate example
-
+![UMAP plot of the adata object](../../figures/umapumap.png){#fig-umap.png}
diff --git a/figures/umapumap.png b/figures/umapumap.png