add_gpu_usage
robinredX committed Jan 25, 2024
1 parent b3d39c0 commit 86c6357
Showing 14 changed files with 12,395 additions and 11,866 deletions.
27 changes: 23 additions & 4 deletions README.md
@@ -12,11 +12,11 @@ conda >= v22 through [Anaconda](https://docs.anaconda.com/free/anaconda/install/


## Installing
```python
```shell
## Installation

# Create and activate a virtual environment. This is recommended to avoid dependency conflicts.
conda create -y -n dissect python=3.8
conda create -y -n dissect python=3.9
conda activate dissect

# Clone DISSECT
@@ -37,11 +37,30 @@ cd DISSECT/tutorials
## Launch jupyter lab
jupyter notebook

```
## GPU usage
By default, the tensorflow-gpu package installed with DISSECT runs on the GPU, provided that a compatible CUDA driver is installed. DISSECT uses tensorflow-gpu 2.7.0 with CUDA 11.2 and cuDNN 8.1. The devices available to TensorFlow can be checked as follows.

```python
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
print(gpus)

```
This prints the list of GPU devices available to TensorFlow. The output below is from a machine with one GPU:
```
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
```
If multiple GPUs are available, TensorFlow can be restricted to a particular one:

```python
gpu_number = 0 # Using only the first GPU
tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
```
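The snippet above assumes that at least `gpu_number + 1` GPUs are present. A slightly more defensive helper is sketched below; `select_gpu` and its `tf_config` parameter are hypothetical names, not part of DISSECT. It calls `tf.config.set_visible_devices`, the non-experimental name of the same TensorFlow 2.x function, and it has to run before TensorFlow initializes the GPUs (for example, at the top of the notebook), because changing the visible devices afterwards raises a `RuntimeError`.

```python
def select_gpu(gpu_number=0, tf_config=None):
    """Restrict TensorFlow to one GPU and return it, or None if no GPU exists.

    Hypothetical helper; `tf_config` defaults to `tf.config` and is only a
    parameter so the logic can be exercised without TensorFlow installed.
    """
    if tf_config is None:
        import tensorflow as tf  # imported lazily
        tf_config = tf.config
    gpus = tf_config.list_physical_devices("GPU")
    if not gpus:
        # No GPU visible: TensorFlow falls back to the CPU.
        return None
    if not 0 <= gpu_number < len(gpus):
        raise ValueError(
            f"gpu_number={gpu_number} is out of range; found {len(gpus)} GPU(s)"
        )
    # Must be called before the GPUs are initialized, otherwise TensorFlow
    # raises a RuntimeError.
    tf_config.set_visible_devices(gpus[gpu_number], "GPU")
    return gpus[gpu_number]
```

In a notebook, `select_gpu(0)` would be called right after `import tensorflow as tf` and before any model is built.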

## Tutorials
Interactive tutorials including required data are available as part of this repository at [Tutorials](https://github.com/imsb-uke/DISSECT/tree/main/tutorials).
1. expanded_tutorial.ipynb: Step by step deconvolution
2. minimal_tutorial.ipynb: Complete deconvolution from a single configuration file using minimal steps of code
1. expanded_tutorial.ipynb: Step by step deconvolution for bulk
2. expanded_tutorial_spatial.ipynb: Step by step deconvolution of spatial transcriptomics data (10x Visium)

To get answers quickly for a problem or feature request, please open an issue.
5,749 changes: 2,716 additions & 3,033 deletions dissect/PropsSimulator/simulator.c

Large diffs are not rendered by default.

42 changes: 18 additions & 24 deletions dissect/PropsSimulator/simulator.py
@@ -106,6 +106,8 @@ def generate_props(self):

fig = plt.figure()
ax = plt.boxplot(self.props_complete, labels=self.celltypes) #
if self.n_celltypes>10:
plt.xticks(rotation=45, ha="right")
plt.ylabel("Proportion")
plt.title("Proportions of cell-types in generated samples")
plt.savefig(
@@ -319,7 +321,7 @@ def __init__(self):
pass

def initialize(self, config):
self.config["simulation_params"] = config
self.config = config
self.sc_adata = sc.read(config["simulation_params"]["scdata"])
if "sparse" in str(type(self.sc_adata.X)):
self.sc_adata.X = np.array(self.sc_adata.X.todense())
@@ -385,18 +387,22 @@ def generate_props(self):

fig = plt.figure()
ax = plt.boxplot(self.props_sparse, labels=self.celltypes) #
if self.n_celltypes>10:
plt.xticks(rotation=45, ha="right")
plt.ylabel("Proportion")
plt.title("Proportions of cell-types in generated samples")
plt.savefig(
os.path.join(self.config.simulation_folder, "boxplot_props_sparse.pdf")
)
fig = plt.figure()
ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
plt.ylabel("Count")
plt.title("Counts of cell-types in generated samples")
plt.savefig(
os.path.join(self.config.simulation_folder, "boxplot_ncells_sparse.pdf")
os.path.join(self.simulation_folder, "boxplot_props_sparse.pdf")
)
# fig = plt.figure()
# ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
# if self.n_celltypes>10:
# plt.xticks(rotation=45, ha="right")
# plt.ylabel("Count")
# plt.title("Counts of cell-types in generated samples")
# plt.savefig(
# os.path.join(self.simulation_folder, "boxplot_ncells_sparse.pdf")
# )

self.props = self.props_sparse
self.cells = self.cells_sparse
@@ -477,7 +483,7 @@ def simulate(self, save=True):
)
)

if self.config["generate_component_figures"]:
if self.config["simulation_params"]["generate_component_figures"]:
idxs = {}
celltypes_col = []
for j in range(self.n_celltypes):
@@ -536,19 +542,6 @@ def simulate_per_batch(self, save=True):
)
)

adata.write(os.path.join(self.simulation_folder, "simulated.h5ad"))
sc.set_figure_params(dpi=200)
tmp = adata.copy()
sc.pp.normalize_total(tmp, target_sum=1e6)
sc.pp.log1p(tmp)
sc.tl.pca(tmp)
sc.pl.pca(tmp, color=adata.obs.columns, show=False)
plt.savefig(
os.path.join(
self.simulation_folder, "scatterplot_pca_simulated.pdf"
)
)

if self.config["simulation_params"]["generate_component_figures"]:
idxs = {}
celltypes_col = []
@@ -613,7 +606,8 @@ def simulate(config):
print("Number of batches in single-cell data is 1. If this is incorrect, please specify name of the batch column as in the single-cell data object (.obs)")
sim.simulate(save=True)
sim.config["simulation_params"]["simulation_folder"] = os.path.join(sim.config["experiment_folder"], "simulation")
sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
if config["simulation_params"]["type"]=="bulk":
sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
sim.config["deconv_params"]["reference"] = os.path.join(sim.config["simulation_params"]["simulation_folder"], "simulated.h5ad")
save_dict_to_file(sim.config)

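The `n_celltypes>10` check added to the box plots rotates the x-axis labels when there are many cell types, so they do not overlap. A standalone sketch of the same idea follows; `boxplot_props` and `rotate_above` are hypothetical names. It sets the tick labels with `ax.set_xticklabels` rather than the `labels=` argument of `plt.boxplot`, which newer Matplotlib releases deprecate in favor of `tick_labels`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe on headless servers
import matplotlib.pyplot as plt
import numpy as np


def boxplot_props(props, celltypes, path, rotate_above=10):
    """Box plot of per-sample cell-type proportions (one box per column), saved to `path`."""
    fig, ax = plt.subplots()
    ax.boxplot(props)  # boxes are placed at x = 1..n_celltypes
    ax.set_xticks(np.arange(1, len(celltypes) + 1))
    if len(celltypes) > rotate_above:
        # Many cell types: rotate labels and right-align them to the ticks.
        ax.set_xticklabels(celltypes, rotation=45, ha="right")
    else:
        ax.set_xticklabels(celltypes)
    ax.set_ylabel("Proportion")
    ax.set_title("Proportions of cell-types in generated samples")
    fig.tight_layout()  # keep rotated labels inside the figure
    fig.savefig(path)
    plt.close(fig)
```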
30 changes: 10 additions & 20 deletions dissect/PropsSimulator/simulator.pyx
@@ -108,16 +108,11 @@ class Simulate(object):
ax = plt.boxplot(self.props_complete, labels=self.celltypes) #
plt.ylabel("Proportion")
plt.title("Proportions of cell-types in generated samples")
if self.n_celltypes>10:
plt.xticks(rotation=45, ha="right")
plt.savefig(
os.path.join(self.simulation_folder, "boxplot_props_complete.pdf")
)
fig = plt.figure()
ax = plt.boxplot(self.cells_complete, labels=self.celltypes)
plt.ylabel("Count")
plt.title("Counts of cell-types in generated samples")
plt.savefig(
os.path.join(self.simulation_folder, "boxplot_ncells_complete.pdf")
)

self.props = np.concatenate([self.props_complete, self.props_sparse], axis=0)
self.cells = np.concatenate([self.cells_complete, self.cells_sparse], axis=0)
@@ -304,7 +299,7 @@ class Simulate_st(object):
pass

def initialize(self, config):
self.config["simulation_params"] = config
self.config = config
self.sc_adata = sc.read(config["simulation_params"]["scdata"])
if "sparse" in str(type(self.sc_adata.X)):
self.sc_adata.X = np.array(self.sc_adata.X.todense())
@@ -370,19 +365,13 @@ class Simulate_st(object):

fig = plt.figure()
ax = plt.boxplot(self.props_sparse, labels=self.celltypes) #
if self.n_celltypes>10:
plt.xticks(rotation=45, ha="right")
plt.ylabel("Proportion")
plt.title("Proportions of cell-types in generated samples")
plt.savefig(
os.path.join(self.config.simulation_folder, "boxplot_props_sparse.pdf")
os.path.join(self.simulation_folder, "boxplot_props_sparse.pdf")
)
fig = plt.figure()
ax = plt.boxplot(self.cells_sparse, labels=self.celltypes)
plt.ylabel("Count")
plt.title("Counts of cell-types in generated samples")
plt.savefig(
os.path.join(self.config.simulation_folder, "boxplot_ncells_sparse.pdf")
)

self.props = self.props_sparse
self.cells = self.cells_sparse

@@ -462,7 +451,7 @@ class Simulate_st(object):
)
)

if self.config["generate_component_figures"]:
if self.config["simulation_params"]["generate_component_figures"]:
idxs = {}
celltypes_col = []
for j in range(self.n_celltypes):
@@ -598,7 +587,8 @@ def simulate(config):
print("Number of batches in single-cell data is 1. If this is incorrect, please specify name of the batch column as in the single-cell data object (.obs)")
sim.simulate(save=True)
sim.config["simulation_params"]["simulation_folder"] = os.path.join(sim.config["experiment_folder"], "simulation")
sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
if config["simulation_params"]["type"]=="bulk":
sim.config["simulation_params"]["concentration"] = list(sim.config["simulation_params"]["concentration"])
sim.config["deconv_params"]["reference"] = os.path.join(sim.config["simulation_params"]["simulation_folder"], "simulated.h5ad")
save_dict_to_file(sim.config)


6 changes: 3 additions & 3 deletions dissect/configs/config.py
@@ -4,7 +4,7 @@
"simulation_params": {
"scdata": "/home/user/experiment/data.h5ad", # Path to sc/snRNA-seq data, should be anndata
"n_samples": None, # Number of samples to generate. Default: 1000 times the number of celltypes,
"type": "bulk",
"type": "bulk", # bulk or st to simulate bulk and spatial transcriptomics respectively
"celltype_col": "celltype", # Name of the column corresponding to cell-type labels in adata.obs
"batch_col": None, # If more than one batch is present, name of the column corresponding to batch labels in adata.obs
"cells_per_sample": None, # Number of cells to sample to generate one sample.
@@ -31,15 +31,15 @@
"deconv_params": {
"test_dataset": "../bulk.txt",
"test_dataset_format": "txt", # Either tab-delimited txt file with genes in rows or h5ad file compatible with Scanpy.
"test_dataset_type": "bulk", # bulk, microarray or spatial
"test_dataset_type": "bulk", # bulk or microarray. For spatial data, set it to bulk, since a similar training procedure is used.
"duplicated": "first", # In case there are duplicated genes in the test_dataset: write first to use the first occurring gene, sum to sum the duplicated genes, or mean to take their average.
"normalize_simulated": "cpm", # "cpm", # Only CPM and None are supported. Write CPM if not already TPM/CPM.
"normalize_test": "cpm", # Write CPM if not already TPM/CPM
"var_cutoff": 0.1, # variance cutoff for gene filtering
"test_in_mix": None, # Number of test samples to use in the generation of online mixtures. None uses all samples.
"simulated": True, # True if dataset is already simulated. False, if it is a single-cell dataset.
"sig_matrix": False,
"mix": "srm",
"mix": "srm", # srm for bulk and spatial data, rrm for microarray data
"save_config": True,
"network_params": {
"n_hidden_layers": 4, # Number of hidden layers
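The comments above tie several options together: `type` is `bulk` or `st`, spatial test data uses `test_dataset_type` `bulk`, and `mix` is `srm` for bulk and spatial data and `rrm` for microarray data. A minimal sketch that applies these rules to a config dictionary follows; `configure_for` and its `data_kind` argument are hypothetical, and simulating the reference for microarray data with `type` `bulk` is an assumption.

```python
def configure_for(config, data_kind):
    """Set the data-type-dependent options of a DISSECT-style config dict in place.

    Hypothetical helper; data_kind is "bulk", "microarray" or "spatial".
    """
    if data_kind not in ("bulk", "microarray", "spatial"):
        raise ValueError(f"Unknown data kind: {data_kind!r}")
    # "st" simulates spatial transcriptomics spots; "bulk" simulates bulk samples
    # (assumed to also apply to microarray data).
    config["simulation_params"]["type"] = "st" if data_kind == "spatial" else "bulk"
    # Spatial test data reuses the bulk training procedure.
    config["deconv_params"]["test_dataset_type"] = (
        "microarray" if data_kind == "microarray" else "bulk"
    )
    # srm for bulk and spatial data, rrm for microarray data.
    config["deconv_params"]["mix"] = "rrm" if data_kind == "microarray" else "srm"
    return config
```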
6 changes: 5 additions & 1 deletion dissect/prepare_data.py
@@ -39,6 +39,8 @@ def dataset(config):
X_real = pd.read_table(config["deconv_params"]["test_dataset"], index_col=0)
elif config["deconv_params"]["test_dataset_format"] == "h5ad":
X_real = sc.read(config["deconv_params"]["test_dataset"])
if "sparse" in str(type(X_real.X)):
X_real.X = np.array(X_real.X.todense())
X_real = pd.DataFrame(
X_real.X, index=X_real.obs.index.tolist(), columns=X_real.var_names.tolist()
).T
@@ -127,9 +129,11 @@ def dataset(config):
# Simulated if not simulated
if config["deconv_params"]["simulated"]:
X_sim = X_sc

else:
X_sim = simulate(X_sc, config["deconv_params"]["simulation_params"])

if "sparse" in str(type(X_sim.X)):
X_sim.X = np.array(X_sim.X.todense())
# Normalization
if config["deconv_params"]["normalize_simulated"] == "cpm":
sc.pp.normalize_total(X_sim, target_sum=1e6)
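Both additions to `dataset()` densify a SciPy sparse matrix by matching a substring of its type name. A sketch of a more explicit check with `scipy.sparse.issparse` follows; this is a swapped-in technique rather than what the commit does, and `to_dense` is a hypothetical helper. Assigning the result back to `.X` (for example `X_sim.X = to_dense(X_sim.X)`) keeps the object an AnnData, which `sc.pp.normalize_total` expects.

```python
import numpy as np
from scipy import sparse


def to_dense(X):
    """Return X as a dense NumPy array, converting SciPy sparse matrices if needed."""
    if sparse.issparse(X):
        # .toarray() returns an ndarray; .todense() would return an np.matrix.
        return X.toarray()
    # Already dense (ndarray, list, ...): make sure it is an ndarray.
    return np.asarray(X)
```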
28 changes: 25 additions & 3 deletions docs/index.html
@@ -50,7 +50,7 @@ <h2 id="installing">Installing</h2>
<pre><code>## Installation

# Create and activate virtual environment
conda create -y -n dissect python=3.8
conda create -y -n dissect python=3.9
conda activate dissect

# Clone DISSECT
@@ -68,12 +68,34 @@ <h2 id="installing">Installing</h2>
# Go to tutorials directory within DISSECT
cd DISSECT/tutorials
</code></pre>
<h2 id="gpu-usage">GPU usage</h2>

<p>By default, the tensorflow-gpu package installed with DISSECT runs on the GPU, provided that a compatible CUDA driver is installed. DISSECT uses tensorflow-gpu 2.7.0 with CUDA 11.2 and cuDNN 8.1. The devices available to TensorFlow can be checked as follows.</p>

<pre><code>
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
print(gpus)
</code></pre>
<p>This prints the list of GPU devices available to TensorFlow. The output below is from a machine with one GPU:</p>
<pre><code>
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
</code></pre>
<p>If multiple GPUs are available, TensorFlow can be restricted to a particular one:</p>

<pre><code>
gpu_number = 0 # Using only the first GPU
tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
</code></pre>



<h2 id="tutorials">Tutorials</h2>
<p>Interactive tutorials including required data are available as part of this repository at <a href="https://github.com/imsb-uke/DISSECT/tree/main/tutorials">Tutorials</a>. Below are the static versions of these tutorials.</p>
<nav>
<ul>
<li><a href="expanded_tutorial.html">1. Step by step tutorial</a></li>
<li><a href="minimal_tutorial.html">2. Minimal tutorial</a></li>
<li><a href="tutorial.html">1. Step by step tutorial for bulk</a></li>
<li><a href="tutorial_spatial.html">2. Step by step tutorial for spatial transcriptomics (10x Visium)</a></li>
</ul>
</nav>
<p>To get answers quickly for a problem or feature request, please open an issue on <a href="https://github.com/imsb-uke/DISSECT">GitHub</a>.</p>