
Commit

Deploying to gh-pages from @ d880a17 🚀
berombau committed Sep 8, 2024
1 parent 57b09b8 commit 74aa0a4
Showing 4 changed files with 28 additions and 29 deletions.
15 changes: 7 additions & 8 deletions book/in_memory_interoperability.html
@@ -233,7 +233,7 @@ <h1 class="title"><span class="chapter-number">3</span>&nbsp; <span class="chapt
</header>


<p>One aproach to interoperability is to work on in-memory representations of one object, and convert these in memory between different programming languages. This does not require you to write out your datasets and read them in in the different programming enivronment, but it does require you to set up an environment in both languages, which can be cumbersome. One language will act as the main language, and you will intereact with the other language using an FFI (foreign function interface). When evaluating R code within a Python program, we will make use of rpy2 to accomplish this. When evaluating Python code within an R program, we will make use of reticulate.</p>
<p>One approach to interoperability is to work on in-memory representations of one object, and convert these in memory between different programming languages. This does not require you to write out your datasets and read them in in the different programming environment, but it does require you to set up an environment in both languages, which can be cumbersome. One language will act as the main language, and you will interact with the other language using an FFI (foreign function interface). When evaluating R code within a Python program, we will make use of rpy2 to accomplish this. When evaluating Python code within an R program, we will make use of reticulate.</p>
<section id="rpy2-basic-functionality" class="level2" data-number="3.1">
<h2 data-number="3.1" class="anchored" data-anchor-id="rpy2-basic-functionality"><span class="header-section-number">3.1</span> Rpy2: basic functionality</h2>
<p>Rpy2 is a foreign function interface to R. It can be used in the following way:</p>
@@ -318,18 +318,17 @@ <h2 data-number="3.1" class="anchored" data-anchor-id="rpy2-basic-functionality"
<div class="cell-output cell-output-stdout">
<pre><code>
0%| | 0.00/9.82M [00:00&lt;?, ?B/s]
0%| | 8.00k/9.82M [00:00&lt;02:10, 79.0kB/s]
0%| | 8.00k/9.82M [00:00&lt;02:10, 78.8kB/s]
0%| | 32.0k/9.82M [00:00&lt;01:01, 167kB/s]
1%| | 96.0k/9.82M [00:00&lt;00:27, 367kB/s]
2%|1 | 200k/9.82M [00:00&lt;00:16, 607kB/s]
2%|1 | 200k/9.82M [00:00&lt;00:16, 609kB/s]
4%|4 | 408k/9.82M [00:00&lt;00:09, 1.09MB/s]
8%|8 | 840k/9.82M [00:00&lt;00:04, 2.10MB/s]
17%|#6 | 1.66M/9.82M [00:00&lt;00:02, 4.04MB/s]
26%|##5 | 2.54M/9.82M [00:00&lt;00:01, 5.02MB/s]
56%|#####5 | 5.45M/9.82M [00:01&lt;00:00, 11.7MB/s]
73%|#######2 | 7.16M/9.82M [00:01&lt;00:00, 13.1MB/s]
91%|#########1| 8.98M/9.82M [00:01&lt;00:00, 14.4MB/s]
100%|##########| 9.82M/9.82M [00:01&lt;00:00, 8.26MB/s]</code></pre>
34%|###3 | 3.33M/9.82M [00:00&lt;00:00, 7.88MB/s]
53%|#####3 | 5.21M/9.82M [00:00&lt;00:00, 10.4MB/s]
83%|########3 | 8.16M/9.82M [00:01&lt;00:00, 15.4MB/s]
100%|##########| 9.82M/9.82M [00:01&lt;00:00, 8.71MB/s]</code></pre>
</div>
<div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> anndata2ri.converter.context():</span>
12 changes: 6 additions & 6 deletions book/introduction.html
@@ -188,9 +188,9 @@ <h2 id="toc-title">Table of contents</h2>

<ul>
<li><a href="#code-porting" id="toc-code-porting" class="nav-link active" data-scroll-target="#code-porting"><span class="header-section-number">1.1</span> Code porting</a></li>
<li><a href="#in-memory-interoperability" id="toc-in-memory-interoperability" class="nav-link" data-scroll-target="#in-memory-interoperability"><span class="header-section-number">1.2</span> In-memory Interoperability</a></li>
<li><a href="#disk-based-interoperability" id="toc-disk-based-interoperability" class="nav-link" data-scroll-target="#disk-based-interoperability"><span class="header-section-number">1.3</span> Disk-based Interoperability</a></li>
<li><a href="#workflow-frameworks" id="toc-workflow-frameworks" class="nav-link" data-scroll-target="#workflow-frameworks"><span class="header-section-number">1.4</span> Workflow Frameworks</a></li>
<li><a href="#in-memory-interoperability" id="toc-in-memory-interoperability" class="nav-link" data-scroll-target="#in-memory-interoperability"><span class="header-section-number">1.2</span> In-memory interoperability</a></li>
<li><a href="#disk-based-interoperability" id="toc-disk-based-interoperability" class="nav-link" data-scroll-target="#disk-based-interoperability"><span class="header-section-number">1.3</span> Disk-based interoperability</a></li>
<li><a href="#workflow-frameworks" id="toc-workflow-frameworks" class="nav-link" data-scroll-target="#workflow-frameworks"><span class="header-section-number">1.4</span> Workflow frameworks</a></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/saeyslab/polygloty/edit/main/book/introduction.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/saeyslab/polygloty/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li><li><a href="https://github.com/saeyslab/polygloty/blob/main/book/introduction.qmd" class="toc-action"><i class="bi empty"></i>View source</a></li></ul></div></nav>
</div>
@@ -225,15 +225,15 @@ <h2 data-number="1.1" class="anchored" data-anchor-id="code-porting"><span class
<p>Furthermore, work is not done after the initial port – in order for the researcher’s work to be useful to others, the ported code must be maintained and kept up-to-date with the original implementation. For this reason, we don’t consider reimplementation a viable option for most use-cases and will not discuss it further in this book.</p>
</section>
<section id="in-memory-interoperability" class="level2" data-number="1.2">
<h2 data-number="1.2" class="anchored" data-anchor-id="in-memory-interoperability"><span class="header-section-number">1.2</span> In-memory Interoperability</h2>
<h2 data-number="1.2" class="anchored" data-anchor-id="in-memory-interoperability"><span class="header-section-number">1.2</span> In-memory interoperability</h2>
<p>Tools like rpy2 and reticulate allow for direct communication between languages within a single analysis session. This approach provides flexibility and avoids intermediate file I/O, but can introduce complexity in managing dependencies and environments.</p>
</section>
<section id="disk-based-interoperability" class="level2" data-number="1.3">
<h2 data-number="1.3" class="anchored" data-anchor-id="disk-based-interoperability"><span class="header-section-number">1.3</span> Disk-based Interoperability</h2>
<h2 data-number="1.3" class="anchored" data-anchor-id="disk-based-interoperability"><span class="header-section-number">1.3</span> Disk-based interoperability</h2>
<p>Storing intermediate results to disk in standardized, language-agnostic file formats (e.g., HDF5, Parquet) allows for sequential execution of scripts written in different languages. This approach is relatively simple but can lead to increased storage requirements and I/O overhead.</p>
</section>
<section id="workflow-frameworks" class="level2" data-number="1.4">
<h2 data-number="1.4" class="anchored" data-anchor-id="workflow-frameworks"><span class="header-section-number">1.4</span> Workflow Frameworks</h2>
<h2 data-number="1.4" class="anchored" data-anchor-id="workflow-frameworks"><span class="header-section-number">1.4</span> Workflow frameworks</h2>
<p>Workflow management systems (e.g., Nextflow, Snakemake) provide a structured approach to orchestrate complex, multi-language pipelines, enhancing reproducibility and automation. However, they may require a learning curve and additional configuration.</p>
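To make the orchestration idea concrete, a Snakefile-style sketch (hypothetical rule, script, and file names) chaining an R step and a Python step through files on disk might look like:

```
# Hypothetical Snakemake sketch: each rule runs in its own language and the
# steps communicate only through the declared input/output files.
rule all:
    input: "results/plot.png"

rule analyse_in_r:
    input: "data/counts.parquet"
    output: "results/stats.csv"
    shell: "Rscript scripts/analyse.R {input} {output}"

rule plot_in_python:
    input: "results/stats.csv"
    output: "results/plot.png"
    shell: "python scripts/plot.py {input} {output}"
```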


14 changes: 7 additions & 7 deletions search.json
@@ -23,8 +23,8 @@
"objectID": "book/introduction.html#in-memory-interoperability",
"href": "book/introduction.html#in-memory-interoperability",
"title": "1  Introduction",
"section": "1.2 In-memory Interoperability",
"text": "1.2 In-memory Interoperability\nTools like rpy2 and reticulate allow for direct communication between languages within a single analysis session. This approach provides flexibility and avoids intermediate file I/O, but can introduce complexity in managing dependencies and environments.",
"section": "1.2 In-memory interoperability",
"text": "1.2 In-memory interoperability\nTools like rpy2 and reticulate allow for direct communication between languages within a single analysis session. This approach provides flexibility and avoids intermediate file I/O, but can introduce complexity in managing dependencies and environments.",
"crumbs": [
"<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
]
@@ -33,8 +33,8 @@
"objectID": "book/introduction.html#disk-based-interoperability",
"href": "book/introduction.html#disk-based-interoperability",
"title": "1  Introduction",
"section": "1.3 Disk-based Interoperability",
"text": "1.3 Disk-based Interoperability\nStoring intermediate results to disk in standardized, language-agnostic file formats (e.g., HDF5, Parquet) allows for sequential execution of scripts written in different languages. This approach is relatively simple but can lead to increased storage requirements and I/O overhead.",
"section": "1.3 Disk-based interoperability",
"text": "1.3 Disk-based interoperability\nStoring intermediate results to disk in standardized, language-agnostic file formats (e.g., HDF5, Parquet) allows for sequential execution of scripts written in different languages. This approach is relatively simple but can lead to increased storage requirements and I/O overhead.",
"crumbs": [
"<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
]
@@ -43,8 +43,8 @@
"objectID": "book/introduction.html#workflow-frameworks",
"href": "book/introduction.html#workflow-frameworks",
"title": "1  Introduction",
"section": "1.4 Workflow Frameworks",
"text": "1.4 Workflow Frameworks\nWorkflow management systems (e.g., Nextflow, Snakemake) provide a structured approach to orchestrate complex, multi-language pipelines, enhancing reproducibility and automation. However, they may require a learning curve and additional configuration.\n\n\n\n\nHeumos, Lukas, Anna C. Schaar, Christopher Lance, Anastasia Litinetskaya, Felix Drost, Luke Zappia, Malte D. Lücken, et al. 2023. “Best Practices for Single-Cell Analysis Across Modalities.” Nature Reviews Genetics 24 (8): 550–72. https://doi.org/10.1038/s41576-023-00586-w.\n\n\nZappia, Luke, and Fabian J. Theis. 2021. “Over 1000 Tools Reveal Trends in the Single-Cell RNA-Seq Analysis Landscape.” Genome Biology 22 (1). https://doi.org/10.1186/s13059-021-02519-4.",
"section": "1.4 Workflow frameworks",
"text": "1.4 Workflow frameworks\nWorkflow management systems (e.g., Nextflow, Snakemake) provide a structured approach to orchestrate complex, multi-language pipelines, enhancing reproducibility and automation. However, they may require a learning curve and additional configuration.\n\n\n\n\nHeumos, Lukas, Anna C. Schaar, Christopher Lance, Anastasia Litinetskaya, Felix Drost, Luke Zappia, Malte D. Lücken, et al. 2023. “Best Practices for Single-Cell Analysis Across Modalities.” Nature Reviews Genetics 24 (8): 550–72. https://doi.org/10.1038/s41576-023-00586-w.\n\n\nZappia, Luke, and Fabian J. Theis. 2021. “Over 1000 Tools Reveal Trends in the Single-Cell RNA-Seq Analysis Landscape.” Genome Biology 22 (1). https://doi.org/10.1186/s13059-021-02519-4.",
"crumbs": [
"<span class='chapter-number'>1</span>  <span class='chapter-title'>Introduction</span>"
]
@@ -104,7 +104,7 @@
"href": "book/in_memory_interoperability.html",
"title": "3  In-memory interoperability",
"section": "",
"text": "3.1 Rpy2: basic functionality\nRpy2 is a foreign function interface to R. It can be used in the following way:\nimport rpy2\nimport rpy2.robjects as robjects\n\n/home/runner/work/polygloty/polygloty/renv/python/virtualenvs/renv-python-3.12/lib/python3.12/site-packages/rpy2/rinterface_lib/embedded.py:276: UserWarning: R was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.\n warnings.warn(msg)\nR was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.\n\nvector = robjects.IntVector([1,2,3])\nrsum = robjects.r['sum']\n\nrsum(vector)\n\n\n IntVector with 1 elements.\n \n\n\n\n6\nLuckily, we’re not restricted to just calling R functions and creating R objects. The real power of this in-memory interoperability lies in the conversion of Python objects to R objects to call R functions on, and then to the conversion of the results back to Python objects.\nRpy2 requires specific conversion rules for different Python objects. It is straightforward to create R vectors from corresponding Python lists:\nstr_vector = robjects.StrVector(['abc', 'def', 'ghi'])\nflt_vector = robjects.FloatVector([0.3, 0.8, 0.7])\nint_vector = robjects.IntVector([1, 2, 3])\nmtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)\nHowever, for single cell biology, the objects that are most interesting to convert are (count) matrices, arrays and dataframes. 
In order to do this, you need to import the corresponding rpy2 modules and specify the conversion context.\nimport numpy as np\n\nfrom rpy2.robjects import numpy2ri\nfrom rpy2.robjects import default_converter\n\nrd_m = np.random.random((10, 7))\n\nwith (default_converter + numpy2ri.converter).context():\n mtx2 = robjects.r.matrix(rd_m, nrow = 10)\nimport pandas as pd\n\nfrom rpy2.robjects import pandas2ri\n\npd_df = pd.DataFrame({'int_values': [1,2,3],\n 'str_values': ['abc', 'def', 'ghi']})\n\nwith (default_converter + pandas2ri.converter).context():\n pd_df_r = robjects.DataFrame(pd_df)\nOne big limitation of rpy2 is the inability to convert sparse matrices: there is no built-in conversion module for scipy. The anndata2ri package provides, apart from functionality to convert SingleCellExperiment objects to an anndata objects, functions to convert sparse matrices.\nTODO: how to subscript sparse matrix? Is it possible?\nimport scipy as sp\n\nfrom anndata2ri import scipy2ri\n\nsparse_matrix = sp.sparse.csc_matrix(rd_m)\n\nwith (default_converter + scipy2ri.converter).context():\n sp_r = scipy2ri.py2rpy(sparse_matrix)\nWe will showcase how to use anndata2ri to convert an anndata object to a SingleCellExperiment object and vice versa as well:\nimport anndata as ad\nimport scanpy.datasets as scd\n\nimport anndata2ri\n\nadata_paul = scd.paul15()\n\n\n 0%| | 0.00/9.82M [00:00&lt;?, ?B/s]\n 0%| | 8.00k/9.82M [00:00&lt;02:10, 79.0kB/s]\n 0%| | 32.0k/9.82M [00:00&lt;01:01, 167kB/s] \n 1%| | 96.0k/9.82M [00:00&lt;00:27, 367kB/s]\n 2%|1 | 200k/9.82M [00:00&lt;00:16, 607kB/s] \n 4%|4 | 408k/9.82M [00:00&lt;00:09, 1.09MB/s]\n 8%|8 | 840k/9.82M [00:00&lt;00:04, 2.10MB/s]\n 17%|#6 | 1.66M/9.82M [00:00&lt;00:02, 4.04MB/s]\n 26%|##5 | 2.54M/9.82M [00:00&lt;00:01, 5.02MB/s]\n 56%|#####5 | 5.45M/9.82M [00:01&lt;00:00, 11.7MB/s]\n 73%|#######2 | 7.16M/9.82M [00:01&lt;00:00, 13.1MB/s]\n 91%|#########1| 8.98M/9.82M [00:01&lt;00:00, 14.4MB/s]\n100%|##########| 9.82M/9.82M 
[00:01&lt;00:00, 8.26MB/s]\n\n\nwith anndata2ri.converter.context():\n sce = anndata2ri.py2rpy(adata_paul)\n ad2 = anndata2ri.rpy2py(sce)",
"text": "3.1 Rpy2: basic functionality\nRpy2 is a foreign function interface to R. It can be used in the following way:\nimport rpy2\nimport rpy2.robjects as robjects\n\n/home/runner/work/polygloty/polygloty/renv/python/virtualenvs/renv-python-3.12/lib/python3.12/site-packages/rpy2/rinterface_lib/embedded.py:276: UserWarning: R was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.\n warnings.warn(msg)\nR was initialized outside of rpy2 (R_NilValue != NULL). Trying to use it nevertheless.\n\nvector = robjects.IntVector([1,2,3])\nrsum = robjects.r['sum']\n\nrsum(vector)\n\n\n IntVector with 1 elements.\n \n\n\n\n6\nLuckily, we’re not restricted to just calling R functions and creating R objects. The real power of this in-memory interoperability lies in the conversion of Python objects to R objects to call R functions on, and then to the conversion of the results back to Python objects.\nRpy2 requires specific conversion rules for different Python objects. It is straightforward to create R vectors from corresponding Python lists:\nstr_vector = robjects.StrVector(['abc', 'def', 'ghi'])\nflt_vector = robjects.FloatVector([0.3, 0.8, 0.7])\nint_vector = robjects.IntVector([1, 2, 3])\nmtx = robjects.r.matrix(robjects.IntVector(range(10)), nrow=5)\nHowever, for single cell biology, the objects that are most interesting to convert are (count) matrices, arrays and dataframes. 
In order to do this, you need to import the corresponding rpy2 modules and specify the conversion context.\nimport numpy as np\n\nfrom rpy2.robjects import numpy2ri\nfrom rpy2.robjects import default_converter\n\nrd_m = np.random.random((10, 7))\n\nwith (default_converter + numpy2ri.converter).context():\n mtx2 = robjects.r.matrix(rd_m, nrow = 10)\nimport pandas as pd\n\nfrom rpy2.robjects import pandas2ri\n\npd_df = pd.DataFrame({'int_values': [1,2,3],\n 'str_values': ['abc', 'def', 'ghi']})\n\nwith (default_converter + pandas2ri.converter).context():\n pd_df_r = robjects.DataFrame(pd_df)\nOne big limitation of rpy2 is the inability to convert sparse matrices: there is no built-in conversion module for scipy. The anndata2ri package provides, apart from functionality to convert SingleCellExperiment objects to an anndata objects, functions to convert sparse matrices.\nTODO: how to subscript sparse matrix? Is it possible?\nimport scipy as sp\n\nfrom anndata2ri import scipy2ri\n\nsparse_matrix = sp.sparse.csc_matrix(rd_m)\n\nwith (default_converter + scipy2ri.converter).context():\n sp_r = scipy2ri.py2rpy(sparse_matrix)\nWe will showcase how to use anndata2ri to convert an anndata object to a SingleCellExperiment object and vice versa as well:\nimport anndata as ad\nimport scanpy.datasets as scd\n\nimport anndata2ri\n\nadata_paul = scd.paul15()\n\n\n 0%| | 0.00/9.82M [00:00&lt;?, ?B/s]\n 0%| | 8.00k/9.82M [00:00&lt;02:10, 78.8kB/s]\n 0%| | 32.0k/9.82M [00:00&lt;01:01, 167kB/s] \n 1%| | 96.0k/9.82M [00:00&lt;00:27, 367kB/s]\n 2%|1 | 200k/9.82M [00:00&lt;00:16, 609kB/s] \n 4%|4 | 408k/9.82M [00:00&lt;00:09, 1.09MB/s]\n 8%|8 | 840k/9.82M [00:00&lt;00:04, 2.10MB/s]\n 17%|#6 | 1.66M/9.82M [00:00&lt;00:02, 4.04MB/s]\n 34%|###3 | 3.33M/9.82M [00:00&lt;00:00, 7.88MB/s]\n 53%|#####3 | 5.21M/9.82M [00:00&lt;00:00, 10.4MB/s]\n 83%|########3 | 8.16M/9.82M [00:01&lt;00:00, 15.4MB/s]\n100%|##########| 9.82M/9.82M [00:01&lt;00:00, 8.71MB/s]\n\n\nwith 
anndata2ri.converter.context():\n sce = anndata2ri.py2rpy(adata_paul)\n ad2 = anndata2ri.rpy2py(sce)",
"crumbs": [
"<span class='chapter-number'>3</span>  <span class='chapter-title'>In-memory interoperability</span>"
]