Visualising RNA feature distributions with R2Dtool
R2Dtool offers several flexible and modular plotting functions to visualise the distributions of isoform-resolved RNA features around genomic landmarks. Here, we provide more information on how these functions are implemented and how they can be used.
R2Dtool offers three main plotting functions:
- `plotMetaTranscript`: Visualises the distribution of RNA features across a normalised metatranscript model.
- `plotMetaJunction`: Shows the distribution of RNA features around splice junctions (only on spliced transcripts).
- `plotMetaCodon`: Displays the distribution of RNA features around start or stop codons (only on protein-coding transcripts).
Each of these functions is implemented as a separate R script, all of which operate on the output of `r2d annotate`.
Note: `annotate` must be called with the `-H` flag, since column headers are required to specify which columns can be used to filter for significant RNA features (see below).
For general usage, the plots can be generated using the `r2d plot...` commands from the Rust-based command-line interface, as described in the README. The source Rscript for each plot is available at `./scripts/R2_plot[type].R` and can be modified and used directly if greater customisation is required.
R2Dtool metaplots are designed to show the density of RNA features at metatranscript positions: that is, the proportion of 'positive' RNA features among all features that have been tested at a position. Positions are defined either on a scaled metatranscript model (metatranscript plots) or at a fixed distance from a transcriptomic landmark (metacodon or metajunction plots).
To define 'positive' features, the annotated BED file must contain a numeric column, for example site stoichiometry or site p-value, from which 'negative' and 'positive' sites can be identified. The input data provided to the plots should therefore contain information about both 'tested' and 'significant' RNA features. The syntax of the R2Dtool plots relies on both of these data types being present, as described below.
To normalise for differences in the number of sites tested at different transcriptomic positions, R2Dtool divides the number of 'positive' sites at a given position by the total number of sites that have been tested at that position. To perform this normalisation, R2Dtool requires a method to determine which RNA features are considered 'significant'. This is done using three parameters:
- `filter_field`: The name of the column in the input file used to filter significant sites.
- `cutoff`: A numeric value defining the threshold for significance.
- `cutoff_type`: Specifies whether values above (`upper`) or below (`lower`) the cutoff are considered significant.
These parameters allow R2Dtool to work with various types of RNA feature data. For example:
- For RNA modification data, you might use a stoichiometry measure as the `filter_field`, with a `cutoff` of 0.5 and a `cutoff_type` of `upper`, to consider sites with >50% modification as significant.
- For statistical measures, you could use a p-value as the `filter_field`, with a `cutoff` of 0.05 and a `cutoff_type` of `lower`, to consider sites with p < 0.05 as significant.
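The filtering rule described above can be illustrated with a minimal Python sketch. This is not the R code that R2Dtool ships; the function name `is_significant`, the example rows, and the column names are hypothetical. The strict comparisons are an assumption, chosen to match the ">50%" and "p < 0.05" wording of the examples.

```python
def is_significant(value: float, cutoff: float, cutoff_type: str) -> bool:
    """Return True if `value` passes the significance filter.

    Assumption: comparisons are strict, so a value equal to the cutoff
    is not counted as significant.
    """
    if cutoff_type == "upper":
        return value > cutoff
    if cutoff_type == "lower":
        return value < cutoff
    raise ValueError("cutoff_type must be 'upper' or 'lower'")


# Hypothetical rows standing in for an annotated sites file.
sites = [
    {"transcript": "tx1", "stoichiometry": 0.82, "p_value": 0.001},
    {"transcript": "tx1", "stoichiometry": 0.31, "p_value": 0.200},
    {"transcript": "tx2", "stoichiometry": 0.50, "p_value": 0.049},
]

filter_field = "stoichiometry"  # column used to call significance
flags = [is_significant(s[filter_field], 0.5, "upper") for s in sites]
print(flags)
```

Switching `filter_field` to `"p_value"` with a cutoff of 0.05 and `"lower"` would classify the same rows by statistical significance instead.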
In R2Dtool plots, "density" refers to the relative abundance of a feature on a transcript, normalized for the length of the region and the number of tested sites at a given distance from a feature of interest.
The y-axis in R2Dtool plots typically shows the "proportion of significant sites", which is calculated as:
proportion = (number of significant sites) / (total number of sites)
This proportion is calculated for each bin or position along the x-axis.
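As a rough illustration of the per-bin calculation, the following Python sketch groups sites into fixed-width bins and divides the significant count by the total count in each bin. The function name `proportion_per_bin`, the bin width, and the example values are assumptions made for illustration; the actual R scripts may bin differently.

```python
import math
from collections import defaultdict


def proportion_per_bin(positions, significant, bin_width):
    """Return {bin_start: significant_sites / total_sites} for each occupied bin."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pos, sig in zip(positions, significant):
        # floor() keeps negative positions in the correct bin
        bin_start = math.floor(pos / bin_width) * bin_width
        totals[bin_start] += 1
        if sig:
            hits[bin_start] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Hypothetical distances (nt) from a landmark and their significance calls.
positions = [-12, -7, -3, 4, 8, 15]
significant = [True, False, True, True, False, False]
proportions = proportion_per_bin(positions, significant, bin_width=10)
print(proportions)
```

Bins that contain no tested sites are omitted rather than reported as zero, since the proportion is undefined when nothing was tested.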
R2Dtool offers two methods for calculating and displaying confidence intervals:
- LOESS (Local Regression): This method uses local weighted regression to smooth the data and calculate confidence intervals. It's the default method and generally produces smoother curves.
- Binomial: This method calculates exact binomial confidence intervals for each point. It may be more appropriate for datasets with high variability or where precise point estimates are needed.
Users can specify the confidence interval method using the `-c` flag in the command-line interface.
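To make the binomial option concrete, here is a minimal Python sketch of an exact binomial interval for one bin. It assumes that "exact" refers to the Clopper-Pearson interval (the interval reported by R's `binom.test`); the source does not state which construction R2Dtool uses, and the function names are hypothetical. The bounds are found by bisection on the binomial tail probabilities, using only the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, iters=60):
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k, n, conf_level=0.95):
    """Clopper-Pearson interval for k significant sites out of n tested sites."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    alpha = 1 - conf_level
    # Lower bound: p at which P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= k) equals alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


print(exact_binomial_ci(5, 10))
```

The direct summation in `binom_cdf` is adequate for the modest per-bin counts of an illustration; for very large counts a numerical library would be the more robust choice.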
All plotting functions share these common parameters:
- `input_file`: Path to the annotated sites file generated by `r2d annotate`.
- `output_file`: Path where the plot will be saved (include file extension, e.g., .png or .svg).
- `filter_field`: The name of the column in the input file used to filter significant sites.
- `cutoff`: Numeric value defining the threshold for significance.
- `cutoff_type`: Specifies the comparison direction, either 'lower' or 'upper', to determine significance.
- `confidence_method`: Strategy for displaying confidence intervals: 'loess' (default) or 'binomial'.
- `save_table`: (Optional) Path to save the aggregated data as a tab-separated file, where source data is required.
`plotMetaTranscript` generates a plot showing the distribution of RNA features across a normalized transcript model.
r2d plotMetaTranscript -i <input_file> -o <output_file> -f <filter_field> -u <cutoff> -t <cutoff_type> [-c <confidence_method>] [-s <save_table>] [-l]
- `-l`: Display transcript region labels (5' UTR, CDS, 3' UTR) on the plot.
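A concrete invocation can be assembled from the usage line above. The Python sketch below only builds and prints the command; the file names and the `stoichiometry` column name are hypothetical placeholders, and the flags are taken as listed in the usage line.

```python
import shlex

# Hypothetical inputs: replace with your own paths and column name.
args = [
    "r2d", "plotMetaTranscript",
    "-i", "annotated_sites.bed",   # output of `r2d annotate -H`
    "-o", "metatranscript.png",    # plot destination
    "-f", "stoichiometry",         # filter_field (a column header in the input)
    "-u", "0.5",                   # cutoff
    "-t", "upper",                 # cutoff_type
    "-l",                          # show region labels
]
command = shlex.join(args)
print(command)
```

Passing `args` to `subprocess.run(args, check=True)` would execute the command, provided `r2d` is installed and on the `PATH`.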
The x-axis represents the relative position along the transcript, normalized to a scale of 0-3, where:
- 0-1: 5' UTR
- 1-2: CDS (Coding Sequence)
- 2-3: 3' UTR
The y-axis shows the proportion of significant sites at each position.
- Each RNA feature is assigned to a metatranscript region (5' UTR, CDS, or 3' UTR) based on the transcript model annotated for the isoform to which the feature is mapped.
- The relative position of the feature in the metatranscript region is calculated by comparing the position of the feature to the length of the metatranscript region.
- The script bins the normalized transcript positions into intervals.
- For each interval, it calculates the ratio of significant sites to total sites.
- A line is drawn through the per-interval proportions, with confidence intervals from either LOESS regression or the binomial method.
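The mapping onto the 0-3 scale can be sketched as follows. This Python example is an illustration under stated assumptions, not R2Dtool's code: it assumes a 0-based transcript coordinate and known region lengths for the isoform, and the function name `metatranscript_position` is hypothetical.

```python
def metatranscript_position(tx_pos, utr5_len, cds_len, utr3_len):
    """Map a 0-based transcript position to the 0-3 metatranscript scale.

    0-1 covers the 5' UTR, 1-2 the CDS, and 2-3 the 3' UTR. Within each
    region the offset is divided by the length of that region.
    """
    total = utr5_len + cds_len + utr3_len
    if not 0 <= tx_pos < total:
        raise ValueError("position lies outside the transcript")
    if tx_pos < utr5_len:
        return tx_pos / utr5_len
    tx_pos -= utr5_len
    if tx_pos < cds_len:
        return 1 + tx_pos / cds_len
    tx_pos -= cds_len
    return 2 + tx_pos / utr3_len


# Hypothetical isoform: 100 nt 5' UTR, 900 nt CDS, 500 nt 3' UTR.
for pos in (50, 550, 1250):
    print(pos, metatranscript_position(pos, 100, 900, 500))
```

Because each region is scaled by its own length, a site halfway through a short 5' UTR and a site halfway through a long one both land at 0.5, which is what allows transcripts of different sizes to be pooled.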
This plot allows you to visualize how RNA features are distributed across different regions of transcripts. For example, you might observe enrichment of modifications in the 3' UTR or depletion near the start codon. The isoform-aware nature of R2Dtool ensures that features are correctly placed based on the specific isoform they were mapped to, providing a more accurate representation than methods that use a single representative transcript per gene.
`plotMetaJunction` creates a plot showing the distribution of RNA features around splice junctions. By default, the sites are placed at x = 0 and the plot shows the proportion of significant sites as a function of where the junctions lie relative to them; the coordinates of the junctions are indicated on the plot.
r2d plotMetaJunction -i <input_file> -o <output_file> -f <filter_field> -u <cutoff> -t <cutoff_type> [-c <confidence_method>] [-s <save_table>] [-r]
The -r flag reverses the x-axis, so the positions of junctions are shown at x = 0, and the distance to the nearest m6A sites is indicated on the x-axis. An example is shown on the README page.
The x-axis represents the distance from the nearest splice junction, with negative values indicating upstream positions and positive values indicating downstream positions.
The y-axis shows the proportion of significant sites at each position relative to the splice junctions.
- The script calculates the distance of each site to its nearest upstream and downstream splice junctions, considering the specific isoform the feature is mapped to.
- It then bins these distances and calculates the ratio of significant sites in each bin.
- A line is drawn through the per-bin proportions, with confidence intervals from either LOESS regression or the binomial method.
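The distance step can be sketched in Python as below. The representation is an assumption: junctions are given as a sorted list of transcript coordinates for the isoform a site maps to, and the sign convention (negative when the site is upstream of the junction) is chosen for illustration. The function names are hypothetical.

```python
from bisect import bisect_right


def junction_distances(site_pos, junctions):
    """Distances from a site to its nearest upstream and downstream junctions.

    `junctions` must be sorted. Returns (upstream, downstream), using None
    where no junction exists on that side.
    """
    i = bisect_right(junctions, site_pos)
    upstream = site_pos - junctions[i - 1] if i > 0 else None
    downstream = junctions[i] - site_pos if i < len(junctions) else None
    return upstream, downstream


def signed_distance_to_nearest(site_pos, junctions):
    """Signed offset to the closest junction; negative if the site is upstream of it."""
    if not junctions:
        return None  # unspliced transcript: no junction to measure against
    return min((site_pos - j for j in junctions), key=abs)


# Hypothetical isoform with three junctions in transcript coordinates.
junctions = [120, 480, 900]
print(junction_distances(500, junctions))
print(signed_distance_to_nearest(100, junctions))
```

Returning `None` for transcripts without junctions mirrors the restriction of this plot to spliced transcripts.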
This plot helps visualize how RNA features are distributed around splice junctions. It can reveal patterns such as depletion or enrichment of features near splice sites, which might indicate roles in splicing regulation or be a consequence of the splicing process. The isoform-aware approach ensures that the distances are calculated based on the actual splice junctions present in the isoform where each feature was detected.
`plotMetaCodon` generates a plot showing the distribution of RNA features around start or stop codons.
r2d plotMetaCodon -i <input_file> -o <output_file> -f <filter_field> -u <cutoff> -t <cutoff_type> [-c <confidence_method>] [-s <save_table>] (-s | -e)
- `-s`: Plot distribution around the start codon
- `-e`: Plot distribution around the stop codon
The x-axis represents the distance from the start or stop codon, with the codon position at 0.
The y-axis shows the proportion of significant sites at each position relative to the codon.
- The script calculates the distance of each site to the start or stop codon of the specific isoform it's mapped to.
- It then bins these distances and calculates the ratio of significant sites in each bin.
- A line is drawn through the per-bin proportions, with confidence intervals from either LOESS regression or the binomial method.
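A minimal Python sketch of the codon-relative coordinate follows. It assumes that the start and stop codon positions of the isoform are available as transcript coordinates; the function names, the `window` parameter, and the example values are hypothetical and not taken from the R scripts.

```python
def codon_distance(tx_pos, cds_start, cds_end, landmark):
    """Signed distance from a site to the start or stop codon of its isoform.

    Negative values are upstream of the codon, positive values downstream.
    """
    if landmark == "start":
        return tx_pos - cds_start
    if landmark == "stop":
        return tx_pos - cds_end
    raise ValueError("landmark must be 'start' or 'stop'")


def within_window(distances, window):
    """Keep only distances that fall inside [-window, +window]."""
    return [d for d in distances if abs(d) <= window]


# Hypothetical isoform: start codon at 100, stop codon at 1000.
site_positions = [40, 130, 950, 1020, 1600]
to_stop = [codon_distance(p, 100, 1000, "stop") for p in site_positions]
print(within_window(to_stop, window=100))
```

Restricting to a window around the codon keeps the plot focused on positions where enough sites have been tested to estimate a proportion.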
This plot allows you to examine how RNA features are distributed relative to start or stop codons. It can reveal patterns such as enrichment of modifications near the stop codon, which might suggest roles in translation termination or mRNA stability. The isoform-aware approach ensures that the distances are calculated based on the actual start or stop codon positions in the isoform where each feature was detected.
While the plotting functions are implemented as R scripts, they can be easily called using the R2Dtool Rust-based command-line interface. The Rust code handles parameter parsing and calls the appropriate R script with the correct arguments.
The R scripts use the `ggplot2` library to generate high-quality, publication-ready plots. They also handle data aggregation and statistical calculations.
This design allows for easy integration of the plotting functions into larger bioinformatics pipelines while leveraging the powerful plotting capabilities of R and ggplot2.
- Isoform-Aware Analysis: Unlike methods that use a single representative transcript per gene, R2Dtool considers the specific isoform each feature is mapped to. This provides a more accurate representation of feature distributions, especially for genes with multiple isoforms.
- Flexibility: The filtering approach allows R2Dtool to work with various types of RNA feature data, whether it's based on stoichiometry, statistical significance, or other measures.
- Comprehensive Visualization: By providing multiple plot types (metatranscript, metajunction, metacodon), R2Dtool allows researchers to examine RNA feature distributions from different perspectives.
- Statistical Rigor: The inclusion of confidence intervals (either through LOESS or binomial methods) provides a measure of uncertainty in the observed patterns.
- Integration with Bioinformatics Pipelines: The command-line interface and option to save data tables make it easy to incorporate R2Dtool plots into larger analysis workflows.