add_gpu_usage

imsb-uke · Jan 25, 2024 · 86c6357
1 parent b3d39c0
commit 86c6357

Showing 14 changed files with 12,395 additions and 11,866 deletions.
diff --git a/README.md b/README.md
@@ -12,11 +12,11 @@ conda >= v22 through [Anaconda](https://docs.anaconda.com/free/anaconda/install/
 
 
 ## Installing
-```python
+```shell
 ## Installation
 
 # Create and activate virtual environment. This is recommended to avoid conflict in dependencies.
-conda create -y -n dissect python=3.8
+conda create -y -n dissect python=3.9
 conda activate dissect
 
 # Clone DISSECT
@@ -37,11 +37,30 @@ cd DISSECT/tutorials
 ## Launch jupyter lab
 jupyter notebook
 
+```
+## GPU usage
+By default, the tensorflow-gpu package that is installed together with DISSECT works as long as an appropriate CUDA driver is installed. DISSECT uses tensorflow-gpu version 2.7.0 with CUDA 11.2 and cuDNN 8.1. The devices available to TensorFlow can be checked as follows.
+
+```python
+import tensorflow as tf
+gpus = tf.config.list_physical_devices("GPU")
+print(gpus)
+
+```
+This prints a list of the available GPU devices. In the output below, one GPU is available.
+```
+[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] 
+```
+If multiple GPUs are available, a particular GPU can be selected with:
+
+```python
+gpu_number = 0 # Using only the first GPU
+tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
 ```
 
 ## Tutorials
 Interactive tutorials including required data are available as part of this repository at [Tutorials](https://github.com/imsb-uke/DISSECT/tree/main/tutorials).
-1. expanded_tutorial.ipynb: Step by step deconvolution
-2. minimal_tutorial.ipynb: Complete deconvolution from a single configuration file using minimal steps of code
+1. expanded_tutorial.ipynb: Step by step deconvolution for bulk
+2. expanded_tutorial_spatial.ipynb: Step by step deconvolution of spatial transcriptomics data (10x Visium)
 
 To get answers quickly for a problem or feature request, please open an issue.
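
The GPU selection snippet added to the README indexes `gpus[gpu_number]` directly, which fails on a CPU-only machine where the list is empty. A minimal sketch of a guarded variant follows; `select_gpu` is a hypothetical helper, not part of DISSECT, and the TensorFlow import is wrapped so the snippet also runs where TensorFlow is not installed.

```python
def select_gpu(gpus, gpu_number=0):
    """Return the requested device, or None when no GPU is present.

    `gpus` is the list returned by tf.config.list_physical_devices("GPU").
    This helper is hypothetical and only illustrates the index check.
    """
    if not gpus:
        return None  # CPU-only machine: nothing to select
    if not 0 <= gpu_number < len(gpus):
        raise IndexError(
            f"gpu_number {gpu_number} is out of range for {len(gpus)} GPU(s)"
        )
    return gpus[gpu_number]


try:
    import tensorflow as tf
except ImportError:  # TensorFlow not installed: skip the device setup
    tf = None

if tf is not None:
    gpus = tf.config.list_physical_devices("GPU")
    chosen = select_gpu(gpus, gpu_number=0)
    if chosen is not None:
        # Same call as in the README; it must run before the GPUs are initialized.
        tf.config.experimental.set_visible_devices(chosen, "GPU")
```

With this guard, a machine without a GPU falls through to the CPU instead of raising an `IndexError`.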
diff --git a/dissect/PropsSimulator/simulator.c b/dissect/PropsSimulator/simulator.c
diff --git a/dissect/PropsSimulator/simulator.py b/dissect/PropsSimulator/simulator.py
@@ -106,6 +106,8 @@ def generate_props(self):
 
         fig = plt.figure()
         ax = plt.boxplot(self.props_complete, labels=self.celltypes)  #
+        if self.n_celltypes>10:
+            plt.xticks(rotation=45, ha="right")
         plt.ylabel("Proportion")
         plt.title("Proportions of cell-types in generated samples")
         plt.savefig(
@@ -319,7 +321,7 @@ def __init__(self):
         pass
 
     def initialize(self, config):
-        self.config["simulation_params"] = config
+        self.config = config
         self.sc_adata = sc.read(config["simulation_params"]["scdata"])
         if "sparse" in str(type(self.sc_adata.X)):
             self.sc_adata.X = np.array(self.sc_adata.X.todense())
@@ -385,18 +387,22 @@ def generate_props(self):
 
         fig = plt.figure()
         ax = plt.boxplot(self.props_sparse, labels=self.celltypes)  #
+        if self.n_celltypes>10:
+            plt.xticks(rotation=45, ha="right")
         plt.ylabel("Proportion")
         plt.title("Proportions of cell-types in generated samples")
         plt.savefig(
-            os.path.join(self.config.simulation_folder, "boxplot_props_sparse.pdf")
-        )
-        fig = plt.figure()
-        ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
-        plt.ylabel("Count")
-        plt.title("Counts of cell-types in generated samples")
-        plt.savefig(
-            os.path.join(self.config.simulation_folder, "boxplot_ncells_sparse.pdf")
+            os.path.join(self.simulation_folder, "boxplot_props_sparse.pdf")
         )
+        # fig = plt.figure()
+        # ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
+        # if self.n_celltypes>10:
+        #     plt.xticks(rotation=45, ha="right")
+        # plt.ylabel("Count")
+        # plt.title("Counts of cell-types in generated samples")
+        # plt.savefig(
+        #     os.path.join(self.simulation_folder, "boxplot_ncells_sparse.pdf")
+        # )
 
         self.props = self.props_sparse
         self.cells = self.cells_sparse
@@ -477,7 +483,7 @@ def simulate(self, save=True):
                 )
             )
 
-            if self.config["generate_component_figures"]:
+            if self.config["simulation_params"]["generate_component_figures"]:
                 idxs = {}
                 celltypes_col = []
                 for j in range(self.n_celltypes):
@@ -536,19 +542,6 @@ def simulate_per_batch(self, save=True):
                 )
             )
 
-            adata.write(os.path.join(self.simulation_folder, "simulated.h5ad"))
-            sc.set_figure_params(dpi=200)
-            tmp = adata.copy()
-            sc.pp.normalize_total(tmp, target_sum=1e6)
-            sc.pp.log1p(tmp)
-            sc.tl.pca(tmp)
-            sc.pl.pca(tmp, color=adata.obs.columns, show=False)
-            plt.savefig(
-                os.path.join(
-                    self.simulation_folder, "scatterplot_pca_simulated.pdf"
-                )
-            )
-
             if self.config["simulation_params"]["generate_component_figures"]:
                 idxs = {}
                 celltypes_col = []
@@ -613,7 +606,8 @@ def simulate(config):
         print("Number of batches in single-cell data is 1. If this is incorrect, please specify name of the batch column as in the single-cell data object (.obs)")
         sim.simulate(save=True)
     sim.config["simulation_params"]["simulation_folder"] = os.path.join(sim.config["experiment_folder"], "simulation")
-    sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
+    if config["simulation_params"]["type"]=="bulk":
+        sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
     sim.config["deconv_params"]["reference"] = os.path.join(sim.config["simulation_params"]["simulation_folder"], "simulated.h5ad")
     save_dict_to_file(sim.config)
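
The hunks above move the simulator to a nested configuration (`config["simulation_params"][...]`) and convert `concentration` to a list only when the simulation type is `bulk`. The sketch below replays that bookkeeping on a plain dictionary. `finalize_config` and the example values are hypothetical; the key names are taken from the diff.

```python
import os


def finalize_config(config):
    """Replay the post-simulation config updates from simulate(config) on a dict."""
    sim_params = config["simulation_params"]
    sim_params["simulation_folder"] = os.path.join(
        config["experiment_folder"], "simulation"
    )
    # Only bulk runs get the list conversion, as in the updated code.
    if sim_params["type"] == "bulk":
        sim_params["concentration"] = list(sim_params["concentration"])
    config["deconv_params"]["reference"] = os.path.join(
        sim_params["simulation_folder"], "simulated.h5ad"
    )
    return config


# Hypothetical example values, for illustration only.
bulk = finalize_config({
    "experiment_folder": "experiment",
    "simulation_params": {"type": "bulk", "concentration": (1.0, 1.0)},
    "deconv_params": {},
})
st = finalize_config({
    "experiment_folder": "experiment",
    "simulation_params": {"type": "st"},  # assumption: no concentration key here
    "deconv_params": {},
})
```

The `st` example assumes a spatial configuration may omit `concentration`. The diff does not say so, but it would explain why the conversion is now conditional.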
 
diff --git a/dissect/PropsSimulator/simulator.pyx b/dissect/PropsSimulator/simulator.pyx
@@ -108,16 +108,11 @@ class Simulate(object):
         ax = plt.boxplot(self.props_complete, labels=self.celltypes)  #
         plt.ylabel("Proportion")
         plt.title("Proportions of cell-types in generated samples")
+        if self.n_celltypes>10:
+            plt.xticks(rotation=45, ha="right")
         plt.savefig(
             os.path.join(self.simulation_folder, "boxplot_props_complete.pdf")
         )
-        fig = plt.figure()
-        ax = plt.boxplot(self.cells_complete, labels=self.celltypes)
-        plt.ylabel("Count")
-        plt.title("Counts of cell-types in generated samples")
-        plt.savefig(
-            os.path.join(self.simulation_folder, "boxplot_ncells_complete.pdf")
-        )
 
         self.props = np.concatenate([self.props_complete, self.props_sparse], axis=0)
         self.cells = np.concatenate([self.cells_complete, self.cells_sparse], axis=0)
@@ -304,7 +299,7 @@ class Simulate_st(object):
         pass
 
     def initialize(self, config):
-        self.config["simulation_params"] = config
+        self.config = config
         self.sc_adata = sc.read(config["simulation_params"]["scdata"])
         if "sparse" in str(type(self.sc_adata.X)):
             self.sc_adata.X = np.array(self.sc_adata.X.todense())
@@ -370,19 +365,13 @@ class Simulate_st(object):
 
         fig = plt.figure()
         ax = plt.boxplot(self.props_sparse, labels=self.celltypes)  #
+        if self.n_celltypes>10:
+            plt.xticks(rotation=45, ha="right")
         plt.ylabel("Proportion")
         plt.title("Proportions of cell-types in generated samples")
         plt.savefig(
-            os.path.join(self.config.simulation_folder, "boxplot_props_sparse.pdf")
+            os.path.join(self.simulation_folder, "boxplot_props_sparse.pdf")
         )
-        fig = plt.figure()
-        ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
-        plt.ylabel("Count")
-        plt.title("Counts of cell-types in generated samples")
-        plt.savefig(
-            os.path.join(self.config.simulation_folder, "boxplot_ncells_sparse.pdf")
-        )
-
         self.props = self.props_sparse
         self.cells = self.cells_sparse
 
@@ -462,7 +451,7 @@ class Simulate_st(object):
                 )
             )
 
-            if self.config["generate_component_figures"]:
+            if self.config["simulation_params"]["generate_component_figures"]:
                 idxs = {}
                 celltypes_col = []
                 for j in range(self.n_celltypes):
@@ -598,7 +587,8 @@ def simulate(config):
         print("Number of batches in single-cell data is 1. If this is incorrect, please specify name of the batch column as in the single-cell data object (.obs)")
         sim.simulate(save=True)
     sim.config["simulation_params"]["simulation_folder"] = os.path.join(sim.config["experiment_folder"], "simulation")
-    sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
+    if config["simulation_params"]["type"]=="bulk":
+        sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
     sim.config["deconv_params"]["reference"] = os.path.join(sim.config["simulation_params"]["simulation_folder"], "simulated.h5ad")
     save_dict_to_file(sim.config)
-
+
diff --git a/dissect/configs/config.py b/dissect/configs/config.py
@@ -4,7 +4,7 @@
     "simulation_params": { 
         "scdata": "/home/user/experiment/data.h5ad",  # Path to sc/snRNA-seq data, should be anndata
         "n_samples": None,  # Number of samples to generate. Default: 1000 times the number of celltypes,
-        "type": "bulk",
+        "type": "bulk", # bulk or st to simulate bulk and spatial transcriptomics respectively
         "celltype_col": "celltype",  # Name of the column corresponding to cell-type labels in adata.obs
         "batch_col": None,  # If more than one batches are present, name of the column corrsponding to batch labels in adata.obs
         "cells_per_sample": None,  # Number of cells to sample to generate one sample.
@@ -31,15 +31,15 @@
     "deconv_params": {
         "test_dataset": "../bulk.txt",
         "test_dataset_format": "txt",  # Either tab-delimited txt file with genes in rows or h5ad file compatible with Scanpy.
-        "test_dataset_type": "bulk",  # bulk, microarray or spatial
+        "test_dataset_type": "bulk",  # bulk or microarray. For spatial, set it to bulk as similar training procedure is used.
         "duplicated": "first",  # In case, there are duplicated genes in the test_dataset. To use the first occuring gene, write first. To sum the duplicated genes, write sum. To take average, write mean
         "normalize_simulated": "cpm",  # "cpm", # Only CPM and None is supported. Write CPM if not already TPM/CPM.
         "normalize_test": "cpm",  # Write CPM if not already TPM/CPM
         "var_cutoff": 0.1,  # variance cutoff for gene filtering
         "test_in_mix": None,  # Number of test samples to use in the generation of online mixtures. None uses all samples.
         "simulated": True,  # True if dataset is already simulated. False, if it is a single-cell dataset.
         "sig_matrix": False,
-        "mix": "srm",
+        "mix": "srm", # srm for bulk and spatial data, rrm for microarray data
         "save_config": True,
         "network_params": {
             "n_hidden_layers": 4,  # Number of hidden layers

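The updated comments in `config.py` state which values are expected: `type` is `bulk` or `st`, `test_dataset_type` is `bulk` or `microarray` (spatial data uses `bulk`), and `mix` is `srm` or `rrm`. A small sketch of a check against those values is shown here. `check_config` and `ALLOWED` are hypothetical names and not part of DISSECT.

```python
# Allowed values as described by the comments in config.py.
ALLOWED = {
    ("simulation_params", "type"): {"bulk", "st"},
    ("deconv_params", "test_dataset_type"): {"bulk", "microarray"},
    ("deconv_params", "mix"): {"srm", "rrm"},
}


def check_config(config):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        value = config[section][key]
        if value not in allowed:
            problems.append(f"{section}.{key}={value!r} not in {sorted(allowed)}")
    return problems


# Spatial setup following the comments: simulate "st", deconvolve as "bulk" with "srm".
spatial_config = {
    "simulation_params": {"type": "st"},
    "deconv_params": {"test_dataset_type": "bulk", "mix": "srm"},
}
print(check_config(spatial_config))
```

Setting `test_dataset_type` to `"spatial"`, which the old comment allowed, would be reported by this check.
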
diff --git a/dissect/prepare_data.py b/dissect/prepare_data.py
@@ -39,6 +39,8 @@ def dataset(config):
         X_real = pd.read_table(config["deconv_params"]["test_dataset"], index_col=0)
     elif config["deconv_params"]["test_dataset_format"] == "h5ad":
         X_real = sc.read(config["deconv_params"]["test_dataset"])
+        if "parse" in str(type(X_real.X)):
+            X_real.X = np.array(X_real.X.todense())
         X_real = pd.DataFrame(
             X_real.X, index=X_real.obs.index.tolist(), columns=X_real.var_names.tolist()
         ).T
@@ -127,9 +129,11 @@ def dataset(config):
     # Simulated if not simulated
     if config["deconv_params"]["simulated"]:
         X_sim = X_sc
+
     else:
         X_sim = simulate(X_sc, config["deconv_params"]["simulation_params"])
-
+    if "parse" in str(type(X_sim.X)):
+        X_sim = np.array(X_sim.X.todense())
     # Normalization
     if config["deconv_params"]["normalize_simulated"] == "cpm":
         sc.pp.normalize_total(X_sim, target_sum=1e6)
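
The two hunks above detect sparse matrices by searching the type name for the substring `"parse"` and then densify. The second hunk assigns the dense array to `X_sim` itself rather than to `X_sim.X`, so the later `sc.pp.normalize_total(X_sim, ...)` call would receive a plain array. The sketch below shows an alternative that uses `scipy.sparse.issparse` and densifies `.X` in place. `densify_X` is a hypothetical helper, and a `SimpleNamespace` stands in for an AnnData object so the example needs only NumPy and SciPy.

```python
from types import SimpleNamespace

import numpy as np
from scipy import sparse


def densify_X(adata_like):
    """Convert a sparse `.X` to a dense ndarray in place and return the same object."""
    if sparse.issparse(adata_like.X):
        adata_like.X = adata_like.X.toarray()
    return adata_like


# Stand-in for an AnnData object (assumption: only the .X attribute matters here).
obj = SimpleNamespace(X=sparse.csr_matrix(np.eye(2)))
obj = densify_X(obj)
print(type(obj.X).__name__)
```

Because the object is returned with `.X` replaced, downstream code that expects an object with `.X` continues to work.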

diff --git a/docs/index.html b/docs/index.html
@@ -50,7 +50,7 @@ <h2 id="installing">Installing</h2>
 <pre><code>## Installation
 
 # Create and activate virtual environment
-conda create -y -n dissect python=3.8
+conda create -y -n dissect python=3.9
 conda activate dissect
 
 # Clone DISSECT
@@ -68,12 +68,34 @@ <h2 id="installing">Installing</h2>
 # Go to tutorials directory within DISSECT
 cd DISSECT/tutorials
 </code></pre>
+<h2 id="gpu-usage">GPU usage</h2>
+
+<p>By default, the tensorflow-gpu package that is installed together with DISSECT works as long as an appropriate CUDA driver is installed. DISSECT uses tensorflow-gpu version 2.7.0 with CUDA 11.2 and cuDNN 8.1. The devices available to TensorFlow can be checked as follows.</p>
+
+<pre><code>
+import tensorflow as tf
+gpus = tf.config.list_physical_devices("GPU")
+print(gpus)
+</code></pre>
+<p>This prints a list of the available GPU devices. In the output below, one GPU is available.</p>
+<pre><code>
+[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')] 
+</code></pre>
+<p>If multiple GPUs are available, a particular GPU can be selected with:</p>
+
+<pre><code>
+gpu_number = 0 # Using only the first GPU
+tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
+</code></pre>
+
+
+
 <h2 id="tutorials">Tutorials</h2>
 <p>Interactive tutorials including required data are available as part of this repository at <a href="https://github.com/imsb-uke/DISSECT/tree/main/tutorials">Tutorials</a>.</p> Below are the static versions of these tutorials.
  <nav>
         <ul>
-            <li><a href="expanded_tutorial.html">1. Step by step tutorial</a></li>
-            <li><a href="minimal_tutorial.html">2. Minimal tutorial</a></li>
+            <li><a href="tutorial.html">1. Step by step tutorial for bulk</a></li>
+            <li><a href="tutorial_spatial.html">2. Step by step tutorial for spatial transcriptomics (10x Visium)</a></li>
         </ul>
     </nav>
 <p>To get answers quickly for a problem or feature request, please open an issue on <a href="https://github.com/imsb-uke/DISSECT">GitHub.</a></p>