finetuning, pitfall figure and code example, umap plot
LouiseDck committed Sep 10, 2024
1 parent c44dee7 commit 4f41ce6
Showing 6 changed files with 6,782 additions and 11,571 deletions.
Binary file added book/in_memory/images/indexing.png
25 changes: 23 additions & 2 deletions book/in_memory/pitfalls.qmd
@@ -20,10 +20,12 @@ If you notice something amiss with your matrices, check whether you need to tran
## Indexing: 0-based or 1-based
Remember that Python indexes arrays and matrices starting from 0 (index 0 refers to the first element), while R uses 1-based indexing, where index 1 refers to the first element.

![0-based vs 1-based indexing](images/indexing.png){#fig-indexing}
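As a quick illustration of the off-by-one shift, here is a minimal Python sketch that converts R-style positions to Python ones. The helpers `r_index_to_py` and `r_range_to_slice` are hypothetical, written only for this example. Note that R's `start:end` includes `end`, while a Python slice excludes its stop, so only the start needs shifting.

```python
def r_index_to_py(i: int) -> int:
    """Convert a 1-based R position to the equivalent 0-based Python index.

    R gives 0 and negative indices special meanings (empty result, exclusion)
    that have no direct Python equivalent, so they are rejected here.
    """
    if i < 1:
        raise ValueError("R positions start at 1")
    return i - 1


def r_range_to_slice(start: int, end: int) -> slice:
    """Translate R's inclusive, 1-based `start:end` into a Python slice.

    Python slices are 0-based and exclude their stop, so the stop value
    stays the same and only the start is shifted down by one.
    """
    return slice(r_index_to_py(start), end)


letters = ["a", "b", "c", "d"]

print(letters[r_index_to_py(1)])        # R: letters[1]   -> a
print(letters[r_range_to_slice(2, 3)])  # R: letters[2:3] -> ['b', 'c']
```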

## Dots in variable names
In R it is very common to use dots in symbols and variable names. This is invalid in Python, where dots are used for attribute access (as in `module.function` or `object.method()`).

When using rpy2, these dots are usually translated to underscores `_`. If this does not happen automatically, you can specify mappings for these symbols.
When using rpy2, these dots are usually translated to underscores `_`. If that translation would cause errors, for example because two R names would end up with the same Python name, it is not applied automatically. In that case, you can specify mappings for these symbols yourself.
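The renaming behaviour described above can be sketched in plain Python. This is a hypothetical illustration of the idea, not rpy2's actual implementation: `translate_r_names` and its arguments are made up for this example. Dots become underscores, explicit mappings take precedence, and a clash between two R names is reported instead of silently shadowing one of them.

```python
def translate_r_names(r_names, translations=None):
    """Map R symbol names to Python-friendly names (hypothetical sketch).

    Explicit `translations` take precedence; every other name has its dots
    replaced by underscores. If two R names would end up with the same
    Python name, raise an error instead of silently dropping one of them.
    """
    translations = translations or {}
    py_to_r = {}
    for r_name in r_names:
        py_name = translations.get(r_name, r_name.replace(".", "_"))
        if py_name in py_to_r:
            raise ValueError(
                f"{r_name!r} and {py_to_r[py_name]!r} both map to {py_name!r}; "
                "provide an explicit translation"
            )
        py_to_r[py_name] = r_name
    # Return the mapping in the R -> Python direction.
    return {r_name: py_name for py_name, r_name in py_to_r.items()}


print(translate_r_names(["read.csv", "head"]))
# {'read.csv': 'read_csv', 'head': 'head'}

try:
    translate_r_names(["my.func", "my_func"])  # both would become "my_func"
except ValueError as err:
    print(err)

# An explicit mapping resolves the clash, similar in spirit to rpy2's robject_translations.
print(translate_r_names(["my.func", "my_func"], {"my.func": "my_dot_func"}))
# {'my.func': 'my_dot_func', 'my_func': 'my_func'}
```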

```{python rpy2_mapping}
from rpy2.robjects.packages import importr
@@ -36,7 +38,7 @@ tools = importr('tools', robject_translations = d)
## Integers and floating point numbers
Unless you explicitly specify otherwise, R represents any number as a floating point number. Adding an `L` to the end of a number makes it an integer.

Python can be more strict about using integers or floating point numbers than R.
Python is usually stricter than R about the distinction between integers and floating point numbers.

```{r int_example}
float_ex <- 12
@@ -45,3 +47,22 @@ int_ex <- 12L
is.integer(float_ex)
is.integer(int_ex)
```
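For comparison, Python makes the same distinction through the literal itself: `12` is an `int` and `12.0` is a `float`. This sketch reuses the variable names from the R example above purely for illustration. The two values compare equal, but only one of them is an integer, and operations that need a whole number, such as list indexing, reject the float.

```python
float_ex = 12.0  # like R's 12: a floating point number
int_ex = 12      # like R's 12L: an integer

print(isinstance(float_ex, int))  # False
print(isinstance(int_ex, int))    # True

# Equal values, different types: this is why the mismatch is easy to miss.
print(float_ex == int_ex)         # True

numbers = list(range(20))
print(numbers[int_ex])            # 12
try:
    numbers[float_ex]             # list indices must be ints, not floats
except TypeError as err:
    print(err)                    # list indices must be integers or slices, not float
```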

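R's habit of storing plain numbers as floating point numbers often leads to errors when using `reticulate`. If you call a Python function and pass it a plain R number such as `5`, reticulate converts it to a Python `float` rather than an `int`, so functions that require an integer fail with confusing errors: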

```{r float_integer_error, error = TRUE}
library(reticulate)
bi <- reticulate::import_builtins()
bi$list(bi$range(0, 5))
```

As you can see, the call fails with `TypeError: 'float' object cannot be interpreted as an integer`.
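You can reproduce this failure in plain Python, without R. The sketch assumes reticulate's usual conversions (an R double such as `5` becomes a Python `float`, an R integer such as `5L` becomes an `int`). The tuples below simply stand in for the converted arguments.

```python
# Simulated arguments as they would arrive from R (assumption: R doubles
# become Python floats, R integers become Python ints).
from_r_doubles = (0.0, 5.0)   # R: bi$range(0, 5)
from_r_integers = (0, 5)      # R: bi$range(0L, 5L)

try:
    list(range(*from_r_doubles))
except TypeError as err:
    print(err)  # 'float' object cannot be interpreted as an integer

print(list(range(*from_r_integers)))  # [0, 1, 2, 3, 4]
```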

This is easily fixed by passing integers explicitly, using the `L` suffix:

```{r float_integer_right}
bi$list(bi$range(0L, 5L))
```


95 changes: 9 additions & 86 deletions book/in_memory/reticulate.qmd
@@ -65,9 +65,11 @@ Jupyter notebooks (and some other notebooks) make this possible from the Python
%load_ext rpy2.ipython # line magic that loads the rpy2 ipython extension.
# this extension allows the use of the following cell magic
%%R -i input -o output # this line allows to specify inputs (which will be converted to R objects) and outputs (which will be converted back to Python objects)
%%R -i input -o output # this line lets you specify inputs
# (which will be converted to R objects) and outputs
# (which will be converted back to Python objects)
# this line is put at the start of a cell
# the rest of the cell will be able to be ran as R code
# the rest of the cell will be run as R code
```

@@ -100,90 +102,11 @@ sc$tl$umap(adata)
adata
```

```{r scanpy_plot}
sc$pl$umap(adata)
```


## Use case: theoretical reticulate example

```{r read_in}
library(anndata)
adata_path <- "../usecase/data/sc_counts_subset.h5ad"
adata <- anndata::read_h5ad(adata_path)
```

For computational efficiency, subset the data to a single small molecule and the control:

```{r select_sm_celltype}
library(dplyr)
sm_name <- "Belinostat"
control_name <- "Dimethyl Sulfoxide"
# keep only cells treated with the selected small molecule or the control, and only highly variable genes
adata <- adata[adata$obs$sm_name %in% c(control_name, sm_name), adata$var$highly_variable]
```

```{r import_pandas}
library(reticulate)
pd <- import("pandas", convert = FALSE)
counts <- as.matrix(adata$X)
```

Combine the data in a single data frame and compute the pseudobulk:

This is a literal translation of the Python code. It is, however, absolute madness to construct a pandas DataFrame in R instead of just using an R data frame: all we really need is a matrix with row names and column names.

We provide the rest of the code as a theoretical example, but think twice before trying something similar.

We will, however, showcase a useful application of reticulate.

```{r compute_pseudobulk, eval = FALSE}
combined <- pd$DataFrame(
counts,
index = adata$obs["plate_well_celltype_reannotated"],
columns = adata$var_names
)
cr <- py_to_r(combined)
# we lost the rownames
rownames(cr) <- adata$obs_names
cr["celltype"] <- adata$obs["plate_well_celltype_reannotated"]
pb_X <- group_by(cr, celltype) %>% summarise(across(where(is.numeric), sum))
```
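Stripped of pandas and dplyr, the chunk above computes a pseudobulk: for each group label, the counts of all cells carrying that label are summed gene by gene. Below is a dependency-free Python sketch of that aggregation. The function name, labels and counts are made up for illustration and are not taken from the use case data.

```python
def pseudobulk(counts, labels):
    """Sum per-cell count vectors within each label.

    counts: list of per-cell rows (one count per gene)
    labels: one group label per cell, e.g. a plate/well/cell type key
    Returns a dict mapping each label to its summed count row.
    """
    if len(counts) != len(labels):
        raise ValueError("need exactly one label per cell")
    sums = {}
    for row, label in zip(counts, labels):
        if label not in sums:
            sums[label] = [0] * len(row)
        sums[label] = [total + value for total, value in zip(sums[label], row)]
    return sums


# Toy data: 4 cells x 3 genes, two hypothetical groups.
counts = [
    [1, 0, 3],
    [2, 1, 0],
    [0, 4, 1],
    [5, 0, 0],
]
labels = ["well_A", "well_A", "well_B", "well_B"]

print(pseudobulk(counts, labels))
# {'well_A': [3, 1, 3], 'well_B': [5, 4, 1]}
```

With pandas, the idiomatic equivalent is grouping the count DataFrame by the label column and calling `.sum()`.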

```{r pb_obs_r, eval = FALSE}
pb_obs <- adata$obs[c("sm_name", "cell_type", "plate_name", "well", "plate_well_celltype_reannotated")]
pb_obs <- pb_obs[!duplicated(pb_obs), ]
```

```{python pb_obs_py, eval = FALSE}
pb_obs = adata.obs[["sm_name", "cell_type", "plate_name", "well"]].copy()
pb_obs.index = adata.obs["plate_well_celltype_reannotated"]
pb_obs = pb_obs.drop_duplicates()
```

```{r pb_anndata, eval = FALSE}
select_X <- pb_X[pb_X$celltype %in% pb_obs$plate_well_celltype_reannotated, ]
select_X <- select_X %>% select(-"celltype")
pb_adata <- anndata::AnnData(
X = select_X,
obs = pb_obs,
var = adata$var
)
```

<!-- note: don't remove the `eval=FALSE` for this one, as not to overwrite the usecase data -->
```{r store_pseudobulk, eval=FALSE}
write_h5ad(pb_adata, "../usecase/data/pseudobulk.h5ad")
We can't easily show the plot directly in this Quarto notebook, but we can save it to a file and display that file instead:

```{r scanpy_plot}
path <- "umap.png"
sc$pl$umap(adata, color="leiden_res1", save=path)
```
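The saved file ends up as `figures/umapumap.png` rather than `umap.png`. As far as we understand scanpy's behaviour, a string passed to `save` is appended to the plot's name and written into scanpy's figure directory (`./figures/` by default), with the extension taken from `save` when it ends in `.png`, `.pdf` or `.svg`. The sketch below approximates that naming rule. `expected_scanpy_path` is a hypothetical helper, not part of scanpy, and its defaults are assumptions.

```python
from pathlib import PurePosixPath


def expected_scanpy_path(plot_name, save, figdir="figures", default_ext="pdf"):
    """Approximate where scanpy writes a plot saved with save=<string>.

    Assumption: the extension comes from `save` if it ends in .png/.pdf/.svg
    (and is stripped from it), otherwise `default_ext` is used; the rest
    of `save` is appended to `plot_name`.
    """
    ext = default_ext
    for candidate in ("png", "pdf", "svg"):
        suffix = "." + candidate
        if save.endswith(suffix):
            ext = candidate
            save = save[: -len(suffix)]
            break
    return str(PurePosixPath(figdir) / f"{plot_name}{save}.{ext}")


print(expected_scanpy_path("umap", "umap.png"))  # figures/umapumap.png
```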

## Use case: useful reticulate example

![UMAP plot of the adata object](../../figures/umapumap.png){#fig-umap}
Binary file added figures/umapumap.png