Merge remote-tracking branch 'origin/staging'
susannasiebert committed Aug 7, 2019
2 parents fdedebd + c79efab commit 75867e8
Showing 1,880 changed files with 279,844 additions and 68,830 deletions.
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -67,9 +67,9 @@
# built documents.
#
# The short X.Y version.
version = '1.4'
version = '1.5'
# The full version, including alpha/beta/rc tags.
release = '1.4.5'
release = '1.5.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
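
As a sketch only (this commit sets both values by hand, and the snippet below is not part of ``docs/conf.py``), the short X.Y ``version`` could be derived from the full ``release`` string so the two cannot drift apart during a version bump like this one:

```python
# Full version, including alpha/beta/rc tags.
release = '1.5.0'
# Short X.Y version, derived from the release string instead of typed separately.
version = '.'.join(release.split('.')[:2])
```
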
Binary file added docs/images/pVACbind_logo_trans-bg_sm_v4b.png
Binary file added docs/images/pVACbind_logo_trans-bg_v4b.png
113 changes: 75 additions & 38 deletions docs/index.rst
@@ -5,7 +5,10 @@ pVACtools is a cancer immunotherapy tools suite consisting of the following
tools:

**pVACseq**
A cancer immunotherapy pipeline for identifying and prioritizing neoantigens from a list of tumor mutations.
A cancer immunotherapy pipeline for identifying and prioritizing neoantigens from a VCF file.

**pVACbind**
A cancer immunotherapy pipeline for identifying and prioritizing neoantigens from a FASTA file.

**pVACfuse**
A tool for detecting neoantigens resulting from gene fusions.
@@ -28,6 +31,7 @@ tools:
   :maxdepth: 2

   pvacseq
   pvacbind
   pvacfuse
   pvacvector
   pvacviz
@@ -44,48 +48,81 @@ tools:
mailing_list


New in release |release|
------------------------

This is a hotfix release. It fixes the following issues:

- In a previous version we implemented a faster method for reading data from
  the database in pVACapi. However, this would fail if the postgres user is
  not a superuser. This version fixes this issue by using the previous
  database file read method in this situation.
- This version marks certain columns of the output reports as not visualizable
  in pVACviz/pVACapi because they contain string content that cannot be
  plotted in a scatterplot.

New in version |version|
------------------------

This version adds the following features:

- pVACvector now tests spacers iteratively. During the first iteration, the
  first spacer in the list of ``--spacers`` gets tested. In the next
  iteration, the next spacer in the list gets added to the pool of spacers to
  test, and so on. If at any point a valid ordering is found, pVACvector will
  finish its run and output the result. This may result in a slightly
  less optimal (but still valid) ordering, but it improves runtime significantly.
- If, after testing all spacers, no valid ordering is found, pVACvector will
  clip the beginnings and/or ends of problematic peptides by one amino acid.
  The ordering finding process is then repeated on the updated list of
  peptides. This process may be repeated up to a maximum set by the
  ``--max-clip-length`` parameter.
- This version adds a standalone command to create the pVACvector
  visualizations that can be run by calling ``pvacvector visualize`` using a
  pVACvector result file as the input.
- We removed the ``--additional-input-file-list`` option from pVACseq. Readcount and
  expression information are now taken directly from the VCF annotations.
  Instructions on how to add these annotations to your input VCF can be found
  on the :ref:`prerequisites_label` page.
- pVACseq now supports variants that are annotated only as
  ``protein_altering_variant`` without a more specific consequence of
  ``missense_variant``, ``inframe_insertion``, ``inframe_deletion``, or ``frameshift_variant``.
- We resolved some syntax differences that prevented pVACtools from being run
  under Python 3.6 or Python 3.7. pVACtools should now be compatible with all
  Python 3 versions.
- This version introduces a new tool, ``pVACbind``, which can be used
  to run our immunotherapy pipeline with a FASTA file of peptide sequences
  as input. This new tool is similar to pVACseq but certain
  options and filters are removed:

  - All input sequences are interpreted in isolation so corresponding
    wildtype sequence and score information are not assigned. As a consequence,
    the filter threshold option on fold change is removed.
  - Because the input format doesn't allow for association of readcount,
    expression or transcript support level data, pVACbind doesn't run the coverage
    filter or transcript support level filter.
  - No condensed report is generated.

  Please see the :ref:`pvacbind` documentation for more information.

- pVACfuse now supports annotated fusion files from `AGFusion <https://github.com/murphycj/AGFusion>`_ as input. The
  :ref:`pvacfuse` documentation has been updated with instructions on how to
  run AGFusion in the Prerequisites section.
- The top score filter has been updated to take into account alternative known
  transcripts that might result in non-identical peptide sequences/epitopes.
  The top score filter now picks the best epitope for every available transcript of a
  variant. If the resulting list of epitopes for one variant is not identical,
  the filter will output all epitopes. If the resulting list of epitopes for one
  variant is identical, the filter only outputs the epitope for the transcript with the highest
  transcript expression value. If no expression data is available, or if
  multiple transcripts remain, the filter outputs the epitope for the
  transcript with the lowest Ensembl ID.
- This version adds a few new options to the ``pvacseq
  generate_protein_fasta`` command:

  - The ``--mutant-only`` option can be used to only output mutant peptide
    sequences instead of mutant and wildtype sequences.
  - This command now has an option to provide a pVACseq all_epitopes or
    filtered TSV file as an input (``--input-tsv``). This will limit the
    output FASTA to only sequences that originated from the variants in that file.

- This release adds a ``pvacfuse generate_protein_fasta`` command that works
  similarly to the ``pvacseq generate_protein_fasta`` command but accepts
  Integrate-NEO or AGFusion input files.
- We removed the sorting of the all_epitopes result file in order to reduce
  memory usage. Only the filtered files will be sorted. This version also updates the sorting algorithm of the
  filtered files as follows:

  - If the ``--top-score-metric`` is set to ``median`` the results are first
    sorted by the ``Median MT Score``. If multiple epitopes have the same
    ``Median MT Score`` they are then sorted by the ``Corresponding Fold
    Change``. The last sorting criterion is the ``Best MT Score``.
  - If the ``--top-score-metric`` is set to ``lowest`` the results are first
    sorted by the ``Best MT Score``. If multiple epitopes have the same
    ``Best MT Score`` they are then sorted by the ``Corresponding Fold
    Change``. The last sorting criterion is the ``Median MT Score``.

- pVACseq, pVACfuse, and pVACbind now calculate manufacturability metrics
  for the predicted epitopes. Manufacturability metrics are also
  calculated for all protein sequences when running the ``pvacseq generate_protein_fasta``
  and ``pvacfuse generate_protein_fasta`` commands. They are saved in a ``.manufacturability.tsv``
  file alongside the result FASTA.
- The pVACseq score that gets calculated for epitopes in the condensed report
  is now converted into a rank. This will hopefully remove any confusion about
  whether the previous score could be treated as an absolute measure of
  immunogenicity, which it was not intended for. Converting this score to a
  rank ensures that it is only interpreted relative to the other epitopes in the
  condensed file.
- The condensed report now also outputs the mutation position as well as the
  full set of lowest and median wildtype and mutant scores.
- This version adds a clear cache function to pVACapi that can be called by
  running ``pvacapi clear_cache``. Sometimes pVACapi can get into a state
  where the cache file contains data that conflicts with the actual
  process outputs, which results in errors. Running ``pvacapi clear_cache``
  resolves these errors in that situation.
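
The iterative spacer search and the clipping fallback described for pVACvector can be sketched in Python as follows. This is an illustration, not pVACvector's code: ``find_ordering`` and ``clip_problematic`` are hypothetical callbacks standing in for pVACvector's junction evaluation and its choice of which peptides to clip, and the sketch assumes each clipping round removes one amino acid, so ``max_clip_length`` bounds the number of rounds.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical callback: given peptides and a spacer pool, return a valid
# ordering, or None if no valid ordering exists with that pool.
FindOrdering = Callable[[Sequence[str], Sequence[str]], Optional[List[str]]]
# Hypothetical callback: return the peptide list with problematic peptides
# clipped by one amino acid at the beginning and/or end.
ClipPeptides = Callable[[List[str]], List[str]]


def search_spacers_iteratively(
    peptides: Sequence[str], spacers: Sequence[str], find_ordering: FindOrdering
) -> Optional[List[str]]:
    """Add one spacer to the pool per iteration; stop at the first valid ordering."""
    pool: List[str] = []
    for spacer in spacers:
        pool.append(spacer)
        ordering = find_ordering(peptides, pool)
        if ordering is not None:
            return ordering
    return None


def search_with_clipping(
    peptides: Sequence[str],
    spacers: Sequence[str],
    find_ordering: FindOrdering,
    clip_problematic: ClipPeptides,
    max_clip_length: int,
) -> Optional[List[str]]:
    """Repeat the spacer search on clipped peptides, one amino acid per round."""
    current = list(peptides)
    for clip_round in range(max_clip_length + 1):
        ordering = search_spacers_iteratively(current, spacers, find_ordering)
        if ordering is not None:
            return ordering
        if clip_round < max_clip_length:
            current = clip_problematic(current)
    return None
```

Stopping at the first spacer pool that yields a valid ordering is what trades a possibly less optimal ordering for a shorter runtime.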

Past release notes can be found on our :ref:`releases` page.

20 changes: 20 additions & 0 deletions docs/pvacbind.rst
@@ -0,0 +1,20 @@
.. image:: images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

.. _pvacbind:

pVACbind
====================================

This component of pVACtools is used to predict neoantigens for the peptide sequences in a FASTA file.

.. toctree::
   :glob:

   pvacbind/prerequisites
   pvacbind/getting_started
   pvacbind/run
   pvacbind/output_files
   pvacbind/filter_commands
   pvacbind/additional_commands
25 changes: 25 additions & 0 deletions docs/pvacbind/additional_commands.rst
@@ -0,0 +1,25 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

Additional Commands
===================

To make using pVACbind easier, several convenience methods are included in the package.

.. _pvacbind_example_data:

Download Example Data
---------------------

.. program-output:: pvacbind download_example_data -h

List Valid Alleles
------------------

.. program-output:: pvacbind valid_alleles -h

List Allele-Specific Cutoffs
----------------------------

.. program-output:: pvacbind allele_specific_cutoffs -h
50 changes: 50 additions & 0 deletions docs/pvacbind/filter_commands.rst
@@ -0,0 +1,50 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

Filtering Commands
=============================

pVACbind currently offers two filters: a binding filter and a top score filter.

These filters are always run automatically as part
of the pVACbind pipeline using default cutoffs.

All filters can also be run manually on the filtered.tsv file to narrow the results down further,
or they can be run on the all_epitopes.tsv file to apply different filtering thresholds.

The binding filter is used to remove neoantigen candidates that do not meet desired peptide:MHC binding criteria.
The top score filter is used to select the most promising peptide candidate for each variant.
Multiple candidate peptides from a single somatic variant can be caused by multiple peptide lengths, registers, HLA alleles,
and transcript annotations.

Further details on each of these filters are provided below.

Binding Filter
--------------

.. program-output:: pvacbind binding_filter -h

The binding filter removes variants that don't pass the chosen binding threshold.
The user can choose whether to apply this filter to the ``lowest`` or the ``median`` binding
affinity score by setting the ``--top-score-metric`` flag. The ``lowest`` binding
affinity score is recorded in the ``Best MT Score`` column and represents the lowest
ic50 score of all prediction algorithms that were picked during the previous pVACbind run.
The ``median`` binding affinity score is recorded in the ``Median MT Score`` column and
corresponds to the median ic50 score of all prediction algorithms used to create the report.
By default, the binding filter runs on the ``median`` binding affinity.

By default, entries with ``NA`` values will be included in the output. This
behavior can be turned off by using the ``--exclude-NAs`` flag.
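
A minimal Python sketch of the binding filter logic described above, operating on rows as produced by ``csv.DictReader(handle, delimiter="\t")``. It is not pVACbind's implementation: the function and parameter names, the inclusive ``<=`` comparison, and treating empty strings like ``NA`` are assumptions. The score column names follow this section's text; the report-columns table lists ``Median Score`` and ``Best Score``, so adjust ``SCORE_COLUMNS`` to match the header of your file.

```python
from typing import Dict, List

# Assumed mapping from --top-score-metric to the score column it reads.
SCORE_COLUMNS = {"median": "Median MT Score", "lowest": "Best MT Score"}


def binding_filter(
    rows: List[Dict[str, str]],
    threshold: float,
    top_score_metric: str = "median",
    exclude_nas: bool = False,
) -> List[Dict[str, str]]:
    """Keep rows whose chosen ic50 score is at or below the threshold."""
    column = SCORE_COLUMNS[top_score_metric]
    kept = []
    for row in rows:
        value = row.get(column, "NA")
        if value in ("NA", ""):
            # NA entries are kept unless exclude_nas is set (mirrors --exclude-NAs).
            if not exclude_nas:
                kept.append(row)
            continue
        if float(value) <= threshold:
            kept.append(row)
    return kept
```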

Top Score Filter
----------------

.. program-output:: pvacbind top_score_filter -h

This filter picks the top epitope for a variant. By default the
``--top-score-metric`` option is set to ``median`` which will apply this
filter to the ``Median MT Score`` column and pick the epitope with the lowest
median mutant ic50 score for each variant. If the ``--top-score-metric``
option is set to ``lowest``, the ``Best MT Score`` column is instead used to
make this determination.
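
The per-variant selection can be sketched in Python as below, assuming that for pVACbind a "variant" corresponds to one FASTA ID (the ``Mutation`` column). The function name, the tie rule (the first row seen wins), and skipping rows without a numeric score are assumptions of this sketch, not documented pVACbind behavior; as above, the score column names may need adjusting to your report header.

```python
from typing import Dict, List

# Assumed mapping from --top-score-metric to the score column it reads.
SCORE_COLUMNS = {"median": "Median MT Score", "lowest": "Best MT Score"}


def top_score_filter(
    rows: List[Dict[str, str]], top_score_metric: str = "median"
) -> List[Dict[str, str]]:
    """Keep the lowest-ic50 row for each Mutation value."""
    column = SCORE_COLUMNS[top_score_metric]
    best: Dict[str, Dict[str, str]] = {}
    for row in rows:
        score = row.get(column, "NA")
        if score in ("NA", ""):
            continue  # assumption: rows without a score cannot be ranked
        key = row["Mutation"]
        if key not in best or float(score) < float(best[key][column]):
            best[key] = row
    # Dicts keep insertion order, so results follow first appearance of each ID.
    return list(best.values())
```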
23 changes: 23 additions & 0 deletions docs/pvacbind/getting_started.rst
@@ -0,0 +1,23 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

Getting Started
---------------

pVACbind provides a set of example data to show the expected format of input and output files.
You can download the data set by running the ``pvacbind download_example_data`` :ref:`command <pvacbind_example_data>`.

The example data output can be reproduced by running the following command:

.. code-block:: none

   pvacbind run \
   <example_data_dir>/input.fasta \
   Test \
   HLA-A*02:01,HLA-B*35:01,DRB1*11:01 \
   MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC PickPocket SMM SMMPMBEC SMMalign \
   <output_dir> \
   -e 8,9,10

A detailed description of all command options can be found on the :ref:`Usage <pvacbind_run>` page.
89 changes: 89 additions & 0 deletions docs/pvacbind/output_files.rst
@@ -0,0 +1,89 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

Output Files
============

The pVACbind pipeline will write its results in separate folders depending on
which prediction algorithms were chosen:

- ``MHC_Class_I``: for MHC class I prediction algorithms
- ``MHC_Class_II``: for MHC class II prediction algorithms
- ``combined``: If both MHC class I and MHC class II prediction algorithms were run, this folder combines the neoepitope predictions from both
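
The folder rule above can be sketched in Python as follows. The class assignments cover only the algorithms used in the example command on the Getting Started page; any other algorithm name is rejected rather than guessed, and the function name is hypothetical.

```python
from typing import Iterable, List

# Class assignment for the algorithms in the example command only.
CLASS_I_ALGORITHMS = {"MHCflurry", "MHCnuggetsI", "NetMHC", "PickPocket", "SMM", "SMMPMBEC"}
CLASS_II_ALGORITHMS = {"MHCnuggetsII", "NNalign", "SMMalign"}


def output_folders(algorithms: Iterable[str]) -> List[str]:
    """Return the result folders expected for a set of prediction algorithms."""
    chosen = set(algorithms)
    unknown = chosen - CLASS_I_ALGORITHMS - CLASS_II_ALGORITHMS
    if unknown:
        raise ValueError(f"MHC class not listed in this sketch: {sorted(unknown)}")
    folders = []
    if chosen & CLASS_I_ALGORITHMS:
        folders.append("MHC_Class_I")
    if chosen & CLASS_II_ALGORITHMS:
        folders.append("MHC_Class_II")
    # A combined folder only appears when both classes were predicted.
    if len(folders) == 2:
        folders.append("combined")
    return folders
```
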

Each folder will contain the same list of output files (listed in the order
created):

.. list-table::
   :header-rows: 1

   * - File Name
     - Description
   * - ``<sample_name>.tsv``
     - An intermediate file with variant information parsed from the input files.
   * - ``<sample_name>.tsv_<chunks>`` (multiple)
     - The above file but split into smaller chunks for easier processing with IEDB.
   * - ``<sample_name>.all_epitopes.tsv``
     - A list of all predicted epitopes and their binding affinity scores, with
       additional variant information from the ``<sample_name>.tsv``.
   * - ``<sample_name>.filtered.tsv``
     - The above file after applying all filters, with cleavage site and stability
       predictions added.

all_epitopes.tsv and filtered.tsv Report Columns
------------------------------------------------

.. list-table::
   :header-rows: 1

   * - Column Name
     - Description
   * - ``Mutation``
     - The FASTA ID of the peptide sequence the epitope belongs to
   * - ``HLA Allele``
     - The HLA allele for this prediction
   * - ``Sub-peptide Position``
     - The one-based position of the epitope in the protein sequence used to make the prediction
   * - ``Epitope Seq``
     - The epitope sequence
   * - ``Median Score``
     - Median ic50 binding affinity of the epitope across all prediction algorithms used
   * - ``Best Score``
     - Lowest ic50 binding affinity across all prediction algorithms used
   * - ``Best Score Method``
     - Prediction algorithm with the lowest ic50 binding affinity for this epitope
   * - ``Individual Prediction Algorithm Scores`` (multiple)
     - ic50 scores for the ``Epitope Seq`` for the individual prediction algorithms used
   * - ``cterm_7mer_gravy_score``
     - Mean hydropathy of the last 7 residues at the C-terminus of the peptide
   * - ``max_7mer_gravy_score``
     - Maximum GRAVY score of any 7-mer in the amino acid sequence. Used to determine if there are any extremely
       hydrophobic regions within a longer amino acid sequence.
   * - ``difficult_n_terminal_residue`` (T/F)
     - Is the N-terminal amino acid a Glutamine, Glutamic acid, or Cysteine?
   * - ``c_terminal_cysteine`` (T/F)
     - Is the C-terminal amino acid a Cysteine?
   * - ``c_terminal_proline`` (T/F)
     - Is the C-terminal amino acid a Proline?
   * - ``cysteine_count``
     - Number of Cysteines in the amino acid sequence. Problematic because they can form disulfide bonds across
       distant parts of the peptide
   * - ``n_terminal_asparagine`` (T/F)
     - Is the N-terminal amino acid an Asparagine?
   * - ``asparagine_proline_bond_count``
     - Number of Asparagine-Proline bonds. Problematic because they can spontaneously cleave the peptide
   * - ``Best Cleavage Position`` (optional)
     - Position of the highest predicted cleavage score
   * - ``Best Cleavage Score`` (optional)
     - Highest predicted cleavage score
   * - ``Cleavage Sites`` (optional)
     - List of all cleavage positions and their cleavage score
   * - ``Predicted Stability`` (optional)
     - Stability of the pMHC-I complex
   * - ``Half Life`` (optional)
     - Half-life of the pMHC-I complex
   * - ``Stability Rank`` (optional)
     - The % rank stability of the pMHC-I complex
   * - ``NetMHCstab allele`` (optional)
     - Nearest neighbor to the ``HLA Allele``. Used for NetMHCstab prediction
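
The manufacturability columns can be approximated in Python from their descriptions above. This is a sketch reconstructed from the table, not pVACbind's own code: it assumes the Kyte-Doolittle hydropathy scale (the scale conventionally used for GRAVY scores), a 7-residue window as the column names suggest, and a non-empty peptide made of the 20 standard residues.

```python
from typing import Dict, Union

# Kyte-Doolittle hydropathy values, the scale conventionally used for GRAVY scores.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def manufacturability(sequence: str, k: int = 7) -> Dict[str, Union[float, int, bool]]:
    """Approximate the manufacturability columns for a non-empty peptide."""
    seq = sequence.upper()
    # All k-mers; a peptide shorter than k is treated as a single window.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)] or [seq]
    return {
        "cterm_7mer_gravy_score": gravy(seq[-k:]),
        "max_7mer_gravy_score": max(gravy(kmer) for kmer in kmers),
        "difficult_n_terminal_residue": seq[0] in "QEC",
        "c_terminal_cysteine": seq[-1] == "C",
        "c_terminal_proline": seq[-1] == "P",
        "cysteine_count": seq.count("C"),
        "n_terminal_asparagine": seq[0] == "N",
        # "NP" occurrences cannot overlap, so str.count gives the bond count.
        "asparagine_proline_bond_count": seq.count("NP"),
    }
```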
8 changes: 8 additions & 0 deletions docs/pvacbind/prerequisites.rst
@@ -0,0 +1,8 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

Prerequisites
=============

The input to pVACbind is a FASTA file of peptide sequences.
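
A minimal Python sketch for preparing such an input file. The FASTA IDs become the ``Mutation`` column of the reports. The function names are hypothetical, and restricting sequences to the 20 standard one-letter amino acid codes is our own sanity check, not a documented pVACbind requirement.

```python
from pathlib import Path
from typing import Dict

# Our own sanity check, not a documented pVACbind rule: accept only the
# 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def format_peptide_fasta(peptides: Dict[str, str]) -> str:
    """Render {FASTA ID: peptide sequence} pairs as FASTA text."""
    records = []
    for peptide_id, sequence in peptides.items():
        sequence = sequence.upper()
        invalid = set(sequence) - STANDARD_AMINO_ACIDS
        if invalid:
            raise ValueError(f"{peptide_id}: unexpected residues {sorted(invalid)}")
        records.append(f">{peptide_id}\n{sequence}\n")
    return "".join(records)


def write_peptide_fasta(peptides: Dict[str, str], path: Path) -> None:
    """Write the peptides to ``path`` as a FASTA file."""
    path.write_text(format_peptide_fasta(peptides))
```

For example, ``write_peptide_fasta({"pep1": "SIINFEKL"}, Path("input.fasta"))`` writes a one-record file that can be passed as the first argument to ``pvacbind run``.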
16 changes: 16 additions & 0 deletions docs/pvacbind/run.rst
@@ -0,0 +1,16 @@
.. image:: ../images/pVACbind_logo_trans-bg_sm_v4b.png
   :align: right
   :alt: pVACbind logo

.. _pvacbind_run:

Usage
====================================

.. warning::
   Using a local IEDB installation is strongly recommended for larger datasets
   or when making predictions for many alleles, epitope lengths, or
   prediction algorithms. More information on how to install IEDB locally can
   be found on the :ref:`Installation <iedb_install>` page.

.. program-output:: pvacbind run -h