Commit

Merge branch 'release/v2.1'
dbolotin committed Feb 6, 2017
2 parents 4cb38f2 + e6c7644 commit 7c4dcd4
Showing 47 changed files with 3,227 additions and 522 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ doc/_build
.floo
.flooignore
out
test_target
19 changes: 19 additions & 0 deletions CHANGELOG
@@ -1,4 +1,23 @@

MiXCR 2.1 ( 6 Feb 2017)
========================

-- Major review of all analysis steps for non-enriched libraries (RNA-Seq, etc.). Efficiency of
TCR/IG extraction is substantially improved (according to our benchmarks, it is the highest among
all tools for RNA-Seq repertoire extraction known to date; extraction works successfully even for
48+48 RNA-Seq data). No false-positive alignments or false overlaps were detected.
-- Additional round of V gene alignment in the paired-end read aligner (improves efficiency and
accuracy in some boundary cases; negligible impact on analysis speed).
-- New action `extendAlignments` to extend TCR alignments that have defined V and J genes but do
not fully cover the CDR3 sequence.
-- Scripting-friendly export format is now used by default. Use `-v` to revert to column names with
spaces.
-- Information on the number of deleted nucleotides / size of P-segment for `V`, `D`, and `J` genes
is now explicitly exported in the `-defaultAnchorPoints` field (see docs for more info).
-- Many small fixes and enhancements.
-- Correct marks for the J gene P-segment in `exportAlignmentsPretty` and `exportClonesPretty`.


MiXCR 2.0.4 ( 4 Feb 2017)
========================

58 changes: 53 additions & 5 deletions README.md
@@ -4,6 +4,40 @@

MiXCR is a universal software tool for fast and accurate analysis of raw T- or B-cell receptor repertoire sequencing data.

- Easy to use. The default pipeline can be executed without any additional parameters (see the *Usage* section)

- Analyzes both TCR and IG repertoires

- The following species are supported *out-of-the-box* using the built-in library:
- human
- mouse
- rat (only TRB and TRA)
- *... several new species will be available soon*

- Efficiently extracts repertoires from most (if not *all*) types of TCR/IG-containing raw sequencing data:
- data from all specialized RepSeq sample preparation protocols
- RNA-Seq
- WGS
- single-cell data
- *etc.*

- Has an optional CDR3 reconstruction step that can *recover the full hypervariable region from several disjoint reads*. It uses sophisticated algorithms that protect against false-positive assemblies while keeping best-in-class efficiency.

- Assembles clonotypes, applying several *error-correction* algorithms to eliminate artificial diversity arising from PCR and sequencing errors

- Clonotypes can be assembled based on the CDR3 sequence (default) or on any other region, including the *full-length* variable sequence (from the beginning of FR1 to the end of FR4)

- Provides exhaustive output information for clonotypes and per-read alignments:
- nucleotide and amino acid sequences of all immunologically relevant regions (FR1, CDR1, ..., CDR3, etc.)
- identified V, D, J and C genes
- nucleotide and amino acid mutations in germline regions
- variable region topology (number of nucleotides deleted from the V / D / J ends, length of P-segments, number of non-template N nucleotides)
- sequencing quality scores for any extracted sequence
- several other useful pieces of information

- Completely transparent pipeline: the fate of each individual read can be tracked from the raw FASTQ entry to the clonotype. Several tools are available to evaluate pipeline performance: human-readable alignment visualization, a diff tool for alignment and clonotype files, etc.


## Installation / Download

#### Using Homebrew on Mac OS X or Linux (linuxbrew)
@@ -17,7 +51,7 @@ to upgrade already installed MiXCR to the newest version:

#### Manual install (any OS)

* download latest MiXCR version from [release page](https://github.com/milaboratory/mixcr/releases/latest)
* download the latest stable MiXCR build from the [release page](https://github.com/milaboratory/mixcr/releases/latest)
* unzip the archive
* add the resulting folder to your ``PATH`` variable
* or add a symbolic link to the ``mixcr`` script in your ``bin`` folder
@@ -30,20 +64,35 @@ to upgrade already installed MiXCR to the newest version:

## Usage

Here is a very simple example of analysis of raw human RepSeq data:
#### Enriched RepSeq Data

Here is a very simple usage example that extracts repertoire data (in the form of a clonotype list) from raw sequencing data of an enriched RepSeq library:

mixcr align -r log.txt input_R1.fastq.gz input_R2.fastq.gz alignments.vdjca
mixcr assemble -r log.txt alignments.vdjca clones.clns
mixcr exportClones clones.clns clones.txt

this sequence of commands will produce a tab-delimited list of clones (`clones.txt`) assembled by their CDR3 sequences with extensive information on their abundancies, V, D and J genes etc.
This will produce a tab-delimited list of clones (`clones.txt`) assembled by their CDR3 sequences, with extensive information on their abundances, V, D and J genes, mutations in germline regions, topology of the VDJ junction, etc.

#### Repertoire extraction from RNA-Seq

For more details see documentation.
MiXCR is equally effective at extracting repertoire information from non-enriched data, such as RNA-Seq or WGS. This example illustrates usage for RNA-Seq:

mixcr align -p rna-seq -r log.txt input_R1.fastq.gz input_R2.fastq.gz alignments.vdjca
mixcr assemblePartial alignments.vdjca alignment_contigs.vdjca
mixcr assemble -r log.txt alignment_contigs.vdjca clones.clns
mixcr exportClones clones.clns clones.txt
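
If the RNA-Seq commands above need to be driven from a Java program, one possible approach is sketched below. It is a minimal illustration, assuming `mixcr` is on the `PATH` and using only the commands and flags shown above; the `RnaSeqPipeline` class and its `commands`/`runAll` methods are hypothetical, not part of MiXCR.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that runs the README's RNA-Seq commands in order. */
final class RnaSeqPipeline {

    /** Builds the four command lines from the README, using only the flags shown there. */
    static List<List<String>> commands(String r1, String r2) {
        return List.of(
            List.of("mixcr", "align", "-p", "rna-seq", "-r", "log.txt", r1, r2, "alignments.vdjca"),
            List.of("mixcr", "assemblePartial", "alignments.vdjca", "alignment_contigs.vdjca"),
            List.of("mixcr", "assemble", "-r", "log.txt", "alignment_contigs.vdjca", "clones.clns"),
            List.of("mixcr", "exportClones", "clones.clns", "clones.txt"));
    }

    /** Runs each step, stopping at the first non-zero exit code (assumes mixcr is on PATH). */
    static void runAll(String r1, String r2) throws IOException, InterruptedException {
        for (List<String> cmd : commands(r1, r2)) {
            // inheritIO() forwards MiXCR's console output to this process
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor();
            if (exit != 0)
                throw new IllegalStateException("Step failed (exit " + exit + "): " + String.join(" ", cmd));
        }
    }
}
```

Stopping at the first failing step mirrors the `|| exit 1` checks used in the integration-test script.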

#### Further reading

The MiXCR pipeline is very flexible and can be applied to raw data from a broad spectrum of experimental setups. For a detailed description of MiXCR features and options, please see the documentation.

## Documentation

Detailed documentation can be found at https://mixcr.readthedocs.io/

If you haven't found the answer to your question in the docs, or have any suggestions concerning new features, feel free to create an issue here, on GitHub, or write an email to [email protected] .

## Build

Dependency:
@@ -63,7 +112,6 @@ To build MiXCR from source:
```
./build.sh
```


## License

20 changes: 16 additions & 4 deletions doc/export.rst
@@ -177,17 +177,29 @@ The following table shows the correspondance between anchor point and positions
+--------------------------+---------------------+--------------------+
| VEnd / *PSegmentBegin* | 10 | 11 |
+--------------------------+---------------------+--------------------+
| VEndTrimmed | 11 | 12 |
| Number of 3' V deletions | 11 | 12 |
| (negative value), or | | |
| length of 3' V P-segment | | |
| (positive value) | | |
+--------------------------+---------------------+--------------------+
| DBeginTrimmed | 12 | 13 |
| Number of 5' D deletions | 12 | 13 |
| (negative value), or | | |
| length of 5' D P-segment | | |
| (positive value) | | |
+--------------------------+---------------------+--------------------+
| DBegin / *PSegmentEnd* | 13 | 14 |
+--------------------------+---------------------+--------------------+
| DEnd / *PSegmentBegin* | 14 | 15 |
+--------------------------+---------------------+--------------------+
| DEndTrimmed | 15 | 16 |
| Number of 3' D deletions | 15 | 16 |
| (negative value), or | | |
| length of 3' D P-segment | | |
| (positive value) | | |
+--------------------------+---------------------+--------------------+
| JBeginTrimmed | 16 | 17 |
| Number of 5' J deletions | 16 | 17 |
| (negative value), or | | |
| length of 5' J P-segment | | |
| (positive value) | | |
+--------------------------+---------------------+--------------------+
| JBegin / *PSegmentEnd* | 17 | 18 |
+--------------------------+---------------------+--------------------+
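
The signed trimming fields in the table above pack two quantities into one number: a negative value counts deleted nucleotides, a positive value gives the P-segment length. A minimal Java sketch of that interpretation follows; the `TrimmingField` class and its message format are hypothetical, and reading a zero as "no deletions and no P-segment" is an assumption that follows from the sign convention rather than from the docs.

```java
/** Hypothetical helper interpreting one signed deletion / P-segment field. */
final class TrimmingField {

    /**
     * Negative value: number of deleted nucleotides.
     * Positive value: length of the P-segment.
     * Zero: assumed to mean neither deletions nor a P-segment.
     */
    static String describe(String end, int value) {
        if (value < 0)
            return end + ": " + (-value) + " nt deleted";
        if (value > 0)
            return end + ": P-segment of " + value + " nt";
        return end + ": no deletions, no P-segment";
    }
}
```

For example, `describe("3' V", -3)` yields `3' V: 3 nt deleted`.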
10 changes: 5 additions & 5 deletions doc/rnaseq.rst
@@ -7,7 +7,7 @@
Processing RNA-seq data
=======================

The typical MiXCR workflow can be applied for the analysis of RNA-seq samples. Though MiXCR can be used with the default parameters for aligning RNA-seq data, it is recommended to use ``rna-seq`` preset which is specifically tuned to perform well on such type of input:
The typical MiXCR workflow can be applied to the analysis of RNA-seq samples. It is recommended to use the ``rna-seq`` preset, which is specifically tuned to perform well on this type of input:

::

@@ -35,9 +35,9 @@ Note option ``-OallowPartialAlignments=true`` of the ``align`` command: it will
+------------------------------+---------------+--------------------------------------------------------------+
| ``kOffset`` | ``0`` | Offset taken from ``VEndTrimmed``/``JBeginTrimmed``. |
+------------------------------+---------------+--------------------------------------------------------------+
| ``minimalVJJunctionOverlap`` | ``18`` | Minimal length of the overlapped VJ region: two squences can |
| ``minimalAssembleOverlap`` | ``18`` | Minimal length of the overlapped VJ region: two sequences can |
| | | potentially be merged only if they have at least |
| | | ``minimalVJJunctionOverlap`` consequent same nucleotides |
| | | ``minimalAssembleOverlap`` consecutive identical nucleotides |
| | | in the VJJunction region. |
+------------------------------+---------------+--------------------------------------------------------------+
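
The overlap rule in the table above can be illustrated with a deliberately simplified sketch: two sequences are merged only if a suffix of the first matches a prefix of the second over at least `minOverlap` consecutive nucleotides. This is a toy model built on that assumption, not MiXCR's implementation, which works on alignments, restricts the check to the VJJunction region and takes other parameters into account; `OverlapMerge` and its methods are hypothetical.

```java
/** Toy illustration of a minimal-overlap merge rule (not MiXCR's implementation). */
final class OverlapMerge {

    /** Length of the longest suffix of a that is identical to a prefix of b. */
    static int longestSuffixPrefix(String a, String b) {
        for (int len = Math.min(a.length(), b.length()); len > 0; len--)
            if (a.regionMatches(a.length() - len, b, 0, len))
                return len;
        return 0;
    }

    /** Merges a and b if they overlap by at least minOverlap nucleotides; otherwise returns null. */
    static String merge(String a, String b, int minOverlap) {
        int overlap = longestSuffixPrefix(a, b);
        if (overlap < minOverlap)
            return null;
        // keep a as is and append the part of b that extends beyond the overlap
        return a + b.substring(overlap);
    }
}
```

With a threshold of 4, `ACGTACGT` and `ACGTTTTT` share a 4-nt overlap and merge into `ACGTACGTTTTT`; with a threshold of 5 they are not merged.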

@@ -60,7 +60,7 @@ The algorithm which restores merged sequence from two overlapped alignments has
+-----------------------------+---------------------+--------------------------------------------------------------+
| ``partsLayout`` | ``CollinearDirect`` | Relative orientation of paired reads. |
+-----------------------------+---------------------+--------------------------------------------------------------+
| ``minimalOverlap`` | ``20`` | Minimal length of the overlapped region. |
| ``minimalAssembleOverlap`` | ``20`` | Minimal length of the overlapped region. |
+-----------------------------+---------------------+--------------------------------------------------------------+
| ``maxQuality`` | ``45`` | Maximal sequence quality that may be assigned in the |
| | | region of overlap. |
@@ -73,5 +73,5 @@ The above parameters can be specified in e.g. the following way:

::

mixcr assemblePartial -OmergerParameters.minimalOverlap=15 alignments.vdjca alignmentsRescued.vdjca
mixcr assemblePartial -OmergerParameters.minimalAssembleOverlap=15 alignments.vdjca alignmentsRescued.vdjca

76 changes: 76 additions & 0 deletions itests.sh
@@ -0,0 +1,76 @@
#!/bin/bash

# "Integration" tests for MiXCR
# Test standard analysis pipeline results

# Linux readlink -f alternative for Mac OS X
function readlinkUniversal() {
targetFile=$1

cd `dirname $targetFile`
targetFile=`basename $targetFile`

# iterate down a (possible) chain of symlinks
while [ -L "$targetFile" ]
do
targetFile=`readlink $targetFile`
cd `dirname $targetFile`
targetFile=`basename $targetFile`
done

# compute the canonicalized name by finding the physical path
# for the directory we're in and appending the target file.
phys_dir=`pwd -P`
result=$phys_dir/$targetFile
echo $result
}

os=`uname`
delta=100

dir=""

case $os in
Darwin)
dir=$(dirname "$(readlinkUniversal "$0")")
;;
Linux)
dir="$(dirname "$(readlink -f "$0")")"
;;
FreeBSD)
dir=$(dirname "$(readlinkUniversal "$0")")
;;
*)
echo "Unknown OS."
exit 1
;;
esac

rm -rf ${dir}/test_target
mkdir ${dir}/test_target

cp ${dir}/src/test/resources/sequences/*.fastq ${dir}/test_target/

cd ${dir}/test_target/

PATH=${dir}:${PATH}

which mixcr

mixcr -v

function go_assemble {
mixcr assemble -r $1.clns.report $1.vdjca $1.clns || exit 1
for c in TCR IG TRB TRA TRG TRD IGH IGL IGK ALL
do
mixcr exportClones -c ${c} -s $1.clns $1.clns.${c}.txt || exit 1
done
}

for s in sample_IGH test;
do
mixcr align -r ${s}_paired.vdjca.report ${s}_R1.fastq ${s}_R2.fastq ${s}_paired.vdjca || exit 1
go_assemble ${s}_paired
mixcr align -r ${s}_single.vdjca.report ${s}_R1.fastq ${s}_single.vdjca || exit 1
go_assemble ${s}_single
done
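
`readlinkUniversal` emulates `readlink -f` on systems where that flag is not available (the script uses it on Darwin and FreeBSD): it follows the chain of symlinks and then resolves the physical directory with `pwd -P`. The same logic can be sketched in Java with `java.nio.file`; the `ReadlinkUniversal` class name is hypothetical. Like the shell function, this sketch does not require the final file to exist, which is where it differs from a plain `Path.toRealPath()` call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical Java port of the script's readlinkUniversal() shell function. */
final class ReadlinkUniversal {

    /** Follows a chain of symlinks, then returns the physical directory path plus the final name. */
    static Path resolve(Path target) throws IOException {
        Path absolute = target.toAbsolutePath();
        Path dir = absolute.getParent();
        Path name = absolute.getFileName();

        // iterate down a (possible) chain of symlinks, like the `while [ -L ... ]` loop
        while (Files.isSymbolicLink(dir.resolve(name))) {
            Path link = Files.readSymbolicLink(dir.resolve(name));
            // relative link targets are interpreted relative to the link's own directory
            Path next = dir.resolve(link);
            dir = next.getParent();
            name = next.getFileName();
        }

        // equivalent of `pwd -P`: physical directory path with all symlinks resolved
        return dir.toRealPath().resolve(name);
    }
}
```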
2 changes: 1 addition & 1 deletion pom.xml
@@ -32,7 +32,7 @@

<groupId>com.milaboratory</groupId>
<artifactId>mixcr</artifactId>
<version>2.0.4</version>
<version>2.1</version>
<packaging>jar</packaging>
<name>MiXCR</name>

2 changes: 1 addition & 1 deletion repseqio
@@ -113,6 +113,12 @@ public boolean accept(SequencePartitioning object) {
return object.isAvailable(ReferencePoint.VEnd) && object.getPosition(ReferencePoint.VEnd) != object.getPosition(ReferencePoint.VEndTrimmed);
}
};
public static final Filter<SequencePartitioning> IsJP = new Filter<SequencePartitioning>() {
@Override
public boolean accept(SequencePartitioning object) {
return object.isAvailable(ReferencePoint.JBegin) && object.getPosition(ReferencePoint.JBegin) != object.getPosition(ReferencePoint.JBeginTrimmed);
}
};
public static final Filter<SequencePartitioning> IsDPLeft = new Filter<SequencePartitioning>() {
@Override
public boolean accept(SequencePartitioning object) {
@@ -128,6 +134,7 @@ public boolean accept(SequencePartitioning object) {
public static final Filter<SequencePartitioning> NotDPLeft = FilterUtil.not(IsDPLeft);
public static final Filter<SequencePartitioning> NotDPRight = FilterUtil.not(IsDPRight);
public static final Filter<SequencePartitioning> NotVP = FilterUtil.not(IsVP);
public static final Filter<SequencePartitioning> NotJP = FilterUtil.not(IsJP);


public static final PointToDraw[] POINTS_FOR_REARRANGED = new PointToDraw[]{
@@ -153,8 +160,10 @@ public boolean accept(SequencePartitioning object) {
pd(ReferencePoint.DEnd, "D><DP", IsDPRight),
pd(ReferencePoint.DEndTrimmed, "DP>", IsDPRight),

pd(ReferencePoint.JBeginTrimmed, "<J"),
pd(ReferencePoint.CDR3End, "CDR3><FR4"),
pd(ReferencePoint.JBeginTrimmed, "<J", NotJP),
pd(ReferencePoint.JBegin, "JP><J", IsJP),
pd(ReferencePoint.JBeginTrimmed, "<JP", IsJP),
pd(ReferencePoint.CDR3End.move(-1), "CDR3><FR4").moveMarkerPoint(1),
pd(ReferencePoint.FR4End, "FR4>", -1),
pd(ReferencePoint.CBegin, "<C")
};
@@ -174,7 +183,7 @@ public boolean accept(SequencePartitioning object) {
pd(ReferencePoint.DBegin, "<D"),
pd(ReferencePoint.DEnd, "D>", -1),
pd(ReferencePoint.JBegin, "<J"),
pd(ReferencePoint.CDR3End, "CDR3><FR4"),
pd(ReferencePoint.CDR3End.move(-1), "CDR3><FR4").moveMarkerPoint(1),
pd(ReferencePoint.FR4End, "FR4>", -1)
};

@@ -213,6 +222,10 @@ public PointToDraw(ReferencePoint rp, String marker, int markerOffset, Filter<Se
this.activator = activator;
}

public PointToDraw moveMarkerPoint(int offset) {
return new PointToDraw(rp, marker, markerOffset + offset, activator);
}

public boolean draw(SequencePartitioning partitioning, MultiAlignmentHelper helper, char[] line, boolean overwrite) {
if (activator != null && !activator.accept(partitioning))
return true;
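
The new `IsJP`/`NotJP` filters choose the markers drawn around the start of the J gene: when `JBegin` is unavailable or sits at the same position as `JBeginTrimmed`, only `<J` is drawn; otherwise `<JP` marks `JBeginTrimmed` and `JP><J` marks `JBegin`. A standalone sketch of that decision follows, using plain integer positions instead of `SequencePartitioning`; the `JMarkers` class and the use of `-1` for an unavailable `JBegin` are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the J P-segment marker choice made by the IsJP / NotJP filters. */
final class JMarkers {

    /** Position value used here to mean "JBegin not available" (an assumption of this sketch). */
    static final int UNAVAILABLE = -1;

    /** Returns marker strings keyed by position, in the order the filters would draw them. */
    static Map<Integer, String> markers(int jBeginTrimmed, int jBegin) {
        Map<Integer, String> result = new LinkedHashMap<>();
        boolean isJP = jBegin != UNAVAILABLE && jBegin != jBeginTrimmed;
        if (!isJP) {
            result.put(jBeginTrimmed, "<J");   // no J P-segment: plain start of J
        } else {
            result.put(jBegin, "JP><J");       // end of the P-segment / start of germline J
            result.put(jBeginTrimmed, "<JP");  // start of the P-segment
        }
        return result;
    }
}
```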
10 changes: 10 additions & 0 deletions src/main/java/com/milaboratory/mixcr/basictypes/VDJCHit.java
@@ -73,6 +73,12 @@ public VDJCHit(VDJCGene gene, Alignment<NucleotideSequence>[] alignments, GeneFe
this.score = score;
}

public VDJCHit setAlignment(int target, Alignment<NucleotideSequence> alignment) {
Alignment<NucleotideSequence>[] newAlignments = alignments.clone();
newAlignments[target] = alignment;
return new VDJCHit(gene, newAlignments, alignedFeature);
}

public int getPosition(int target, ReferencePoint referencePoint) {
if (alignments[target] == null)
return -1;
@@ -103,6 +109,10 @@ public Alignment<NucleotideSequence> getAlignment(int target) {
return alignments[target];
}

public Alignment<NucleotideSequence>[] getAlignments() {
return alignments.clone();
}

public int numberOfTargets() {
return alignments.length;
}
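
`setAlignment` and `getAlignments` follow a defensive-copy, copy-on-write style: the internal array is cloned instead of being modified in place or handed out, so callers cannot alter a hit's alignments through these methods. A minimal sketch of the same pattern on a hypothetical `Hit` class, with plain strings standing in for `Alignment<NucleotideSequence>` objects:

```java
import java.util.Arrays;

/** Hypothetical immutable holder illustrating the copy-on-write style of VDJCHit.setAlignment. */
final class Hit {
    private final String gene;
    private final String[] alignments; // one entry per target; null if that target is not aligned

    Hit(String gene, String[] alignments) {
        this.gene = gene;
        this.alignments = alignments.clone(); // do not keep a reference to the caller's array
    }

    /** Returns a new Hit with one alignment replaced; this instance is left unchanged. */
    Hit setAlignment(int target, String alignment) {
        String[] newAlignments = alignments.clone();
        newAlignments[target] = alignment;
        return new Hit(gene, newAlignments);
    }

    /** Returns a copy, so callers cannot mutate the internal array. */
    String[] getAlignments() {
        return alignments.clone();
    }

    String getAlignment(int target) {
        return alignments[target];
    }

    @Override
    public String toString() {
        return gene + Arrays.toString(alignments);
    }
}
```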
11 changes: 11 additions & 0 deletions src/main/java/com/milaboratory/mixcr/basictypes/VDJCObject.java
@@ -81,6 +81,13 @@ public final VDJCHit[] getHits(GeneType type) {
return hits == null ? new VDJCHit[0] : hits;
}

public Chains getTopChain(GeneType gt) {
final VDJCHit top = getBestHit(gt);
if (top == null)
return Chains.EMPTY;
return top.getGene().getChains();
}

public Chains getAllChains(GeneType geneType) {
if (allChains == null)
synchronized ( this ){
@@ -142,6 +149,10 @@ public final NSequenceWithQuality getTarget(int target) {
return targets[target];
}

public final NSequenceWithQuality[] getTargets(){
return targets.clone();
}

public final VDJCPartitionedSequence getPartitionedTarget(int target) {
if (partitionedTargets == null) {
partitionedTargets = new VDJCPartitionedSequence[targets.length];
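
`getTopChain` returns `Chains.EMPTY` instead of `null` when there is no best hit for the requested gene type, so callers can use the result without a null check. A minimal sketch of that empty-sentinel convention; `TopChain`, `GeneHit`, the `Set<String>` stand-in for `Chains`, and the rule that the best hit is the first element of a score-sorted array are all assumptions for illustration.

```java
import java.util.Set;

/** Hypothetical sketch of the "empty sentinel instead of null" convention used by getTopChain. */
final class TopChain {

    /** Simplified hit: gene name plus the chains it belongs to. */
    static final class GeneHit {
        final String gene;
        final Set<String> chains;

        GeneHit(String gene, Set<String> chains) {
            this.gene = gene;
            this.chains = chains;
        }
    }

    /** Assumes hits are sorted by score, best first; the array may be null or empty. */
    static Set<String> topChains(GeneHit[] hits) {
        GeneHit top = (hits == null || hits.length == 0) ? null : hits[0];
        if (top == null)
            return Set.of(); // empty sentinel, analogous to Chains.EMPTY
        return top.chains;
    }
}
```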