All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Deprecation warning. Update sam->bam in readme.
- Update to dorado_stereo.sh: do not create directories when using the -n flag (dry run).
- Wrapper for a prototype stereo-calling pipeline, dorado_stereo.sh, that does 2-stage stereo calling.
- Update fastx iteration in split_pairs to be compatible with pyfastx>=0.9.0.
- Bug where split_pairs would raise a StopIteration if the dataset has < 5k reads.
- split_pairs, a tool to recover non-split reads into their template/complement parts.
- Bug where approximate start times led to incorrectly rejecting candidate duplex pairs.
- Bug where aligned BAMs would be used as inputs to filter_pairs. This meant running filter_pairs on a guppy directory would report an incorrect duplex rate.
- Bug where an incorrect tag was used to get sequence lengths from .bam files (for dorado)
- ProcessPoolExecutor -> ThreadPoolExecutor in filter_pairs for faster filtering
- Reporting duplex rate in duplex_tools pair
- Ability to use an unmapped bam (output from dorado) as input to pairs_from_summary
- Ability to use an unmapped bam (output from dorado) as input to filter_pairs
- Convenience script to call both pairs_from_summary and filter_pairs on a bam. Usage: duplex_tools pair unmapped.bam
- Bug where template/complement pairs with small negative time between them would not be chosen (rounding error).
- Option to set --no_end_penalties in filter_pairs. This option favours partial matches and avoids unbounded negative pairing scores
- Bug where template/complement pairs with no time between them would not be chosen
- Option to set --threads in filter_pairs
- Updated defaults in readme for duplex basecalling (chunks_per_runner 16 -> 416)
- Update defaults in pairs_from_summary to --min_qscore 7 --max_abs_seqlen_diff 1000
- Passed --trim_start and --trim_end from cli to main function for split_on_adapter
- Flags --trim_start and --trim_end for split_on_adapter (#13)
- Flag to allow splitting multiple times for reads with multiple adapters
- Moving debug output and edited reads to the main output directory
- Removed explicit dependency on pathlib, which caused #7
- split_on_adapter also defaults to both .fastq and .fastq.gz from cli
- Options --min_qscore and --max_abs_seqlen_diff to find duplex reads
- Default filtering on min_qscore (12) and maximum length difference in pairs_from_summary for duplex reads
- Surprising behaviour in split_on_adapter to only work on gzipped fastqs by default
- Arguments --max_length and --min_length in filter_pairs
- Fixed argument bug in split_on_adapter. sample_type is now positional
- Corrected documentation for fillet.md.
- Enabled multiprocessing for fastq extraction in filter_pairs.
- Fixed number formatting in number of pairs logging.
- Documentation for read splitting
- Project name to duplex tools.
- Create single entry point program with subcommands.
- Duplex read pairing and filtering programs.
- Integration test to run the main read_fillet entry point.
- Bug that caused compressed outputs to be missing the .gz extension.
- Bug that caused setting trim=0 to return an empty sequence.
- Updated README.
- First version.