Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Relative Log Expression (RLE) Plots #348

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions pydeseq2/dds.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
from pydeseq2.preprocessing import deseq2_norm_transform
from pydeseq2.utils import build_design_matrix
from pydeseq2.utils import dispersion_trend
from pydeseq2.utils import make_rle_plot
from pydeseq2.utils import make_scatter
from pydeseq2.utils import mean_absolute_deviation
from pydeseq2.utils import n_or_more_replicates
Expand Down Expand Up @@ -1010,6 +1011,36 @@ def plot_dispersions(
**kwargs,
)

def plot_rle(
self,
normalize: bool = False,
save_path: Optional[str] = None,
**kwargs,
):
"""Plot ratio of log expressions for each sample.

Useful for visualizing sample to sample variation.

Parameters
----------
normalize : bool, optional
Whether to normalize the counts before plotting. (default: ``False``).

save_path : str or None
The path where to save the plot. If left None, the plot won't be saved
(default: ``None``).

**kwargs
Keyword arguments for the scatter plot.
"""
make_rle_plot(
count_matrix=self.X,
normalize=normalize,
sample_ids=self.obsm["design_matrix"].index,
save_path=save_path,
**kwargs,
BorisMuzellec marked this conversation as resolved.
Show resolved Hide resolved
)

def _fit_parametric_dispersion_trend(self, vst: bool = False):
r"""Fit the dispersion curve according to a parametric model.

Expand Down
61 changes: 61 additions & 0 deletions pydeseq2/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -1608,3 +1608,64 @@ def lowess(
delta = (1 - delta**2) ** 2

return yest


def make_rle_plot(
count_matrix: np.array,
sample_ids: np.array,
normalize: bool = False,
save_path: Optional[str] = None,
**kwargs,
) -> None:
"""
Create a ratio of log expression plot using matplotlib.

Parameters
----------
count_matrix : ndarray
An mxn matrix of count data, where m is the number of samples (rows),
and n is the number of genes (columns).

sample_ids : ndarray
An array of sample identifiers.

normalize : bool
Whether to normalize the count matrix before plotting. (default: ``False``).

save_path : str or None
The path where to save the plot. If left None, the plot won't be saved
(default: ``None``).

**kwargs :
Additional keyword arguments passed to matplotlib's boxplot function.
"""
if normalize:
print("Plotting normalized RLE plot...")
geometric_mean = np.exp(np.mean(np.log(count_matrix + 1), axis=0))
size_factors = np.median(count_matrix / geometric_mean, axis=1)
count_matrix = count_matrix / size_factors[:, np.newaxis]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is recomputing median-of-ratios size factors, right? (up to the +1 in the log)

In that case, it might be better to add a size_factors argument to make_rle_plot.

From there, in dds.plot_rle, if normalize=True, we would first check whether size factors (self.obsm["size_factors"]) were already computed. If so, we pass them directly. If not, we call self.fit_size_factors first.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @BorisMuzellec ,

For some reason when I use the size factors computed in dds.fit_size_factors() I do not get an RLE plot with the sample medians centered around 0.

image

But when I compute the sizefactors internally I do get an RLE plot with the sample medians centered around 0

image

Do you know what could be causing this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we would expect the sample medians to be zero given that RLE plot subtracts the gene medians. Also, I would be wary of making any conclusion from the test data.

That being said what you wrote is what is implemented in R's plotRLE method (https://rdrr.io/github/davismcc/scater/src/R/plotRLE.R). If it's standard, I'm happy to keep it as is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonathjd I think we may still have some consistency issues though: if we're using log(counts + 1) for size factors, shouldn't we also do the same everywhere (gene medians, and plotting)

Copy link
Author

@jonathjd jonathjd Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BorisMuzellec Happy holidays! Sorry for the delay, I was on vacation. It's probably a good idea to add the psuedocount to the gene_median calculations so we don't divide by a gene with 0 counts across all samples. I can add that now.

I added the psuedocount and ran some tests and the psuedocount alters the gene medians distance from the sample medians and skews the plot significantly.


plt.rcParams.update({"font.size": 10})

fig, ax = plt.subplots(figsize=(15, 8), dpi=600)

# Calculate median expression across samples
gene_medians = np.median(count_matrix, axis=0)
rle_values = np.log2(count_matrix / gene_medians)

kwargs.setdefault("alpha", 0.5)
boxprops = {"facecolor": "lightgray", "alpha": kwargs.pop("alpha")}

ax.boxplot(rle_values.T, patch_artist=True, boxprops=boxprops, **kwargs)

ax.axhline(0, color="red", linestyle="--", linewidth=1, alpha=0.5, zorder=3)
ax.set_xlabel("Sample")
ax.set_ylabel("Relative Log Expression")
ax.set_xticks(np.arange(len(sample_ids)))
ax.set_xticklabels(sample_ids, rotation=90)
plt.tight_layout()

if save_path:
plt.savefig(save_path, bbox_inches="tight")
else:
plt.show()
Loading