Optimize v2/v3 read queries
This solves a performance regression introduced in v0.6 when the read
algorithm was improved. The first change is to move the string comparison
of the contig until after we've checked all integer-based conditions.
Second, and more impactful, exit the intersection check early if the
regions have moved past the record. The regions are sorted, so this
early exit is safe.
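
The two ideas in this message can be illustrated with a minimal standalone sketch. It is not the code from reader.cc: the `Region` struct, the `intersecting_regions` function, and the `visited` counter are hypothetical names introduced only for illustration, and the sketch assumes that region coordinates are already global (contig offset applied) and that the regions are sorted by ascending start.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the reader's region type.
struct Region {
  std::string contig;
  uint32_t min;  // global start, contig offset already applied (assumption)
  uint32_t max;  // global end, inclusive
};

// Returns the indices of regions that intersect [rec_start, rec_end] on
// `contig`. Assumes `regions` is sorted by ascending `min`. If `visited`
// is non-null it receives the number of regions the loop examined.
std::vector<std::size_t> intersecting_regions(
    const std::vector<Region>& regions,
    const std::string& contig,
    uint32_t rec_start,
    uint32_t rec_end,
    std::size_t* visited = nullptr) {
  std::vector<std::size_t> hits;
  std::size_t seen = 0;
  for (std::size_t j = 0; j < regions.size(); j++) {
    seen++;
    const Region& reg = regions[j];

    // Cheap integer check first: the region ends before the record starts.
    if (rec_start > reg.max)
      continue;

    // Regions are sorted by start, so once one starts past the end of the
    // record, no later region can intersect it either.
    if (rec_end < reg.min)
      break;

    // The string comparison runs last, only for regions that overlap.
    if (reg.contig != contig)
      continue;

    hits.push_back(j);
  }
  if (visited != nullptr)
    *visited = seen;
  return hits;
}
```

With four sorted regions (two on one contig followed by two on another) and a record that falls inside the second region, the loop in this sketch examines three regions and then breaks, rather than scanning the whole list and comparing contig names for every entry.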
Shelnutt2 committed Dec 9, 2020
1 parent 02d0f9b commit 4529721
Showing 1 changed file with 20 additions and 6 deletions.
26 changes: 20 additions & 6 deletions libtiledbvcf/src/read/reader.cc
@@ -626,17 +626,18 @@ bool Reader::process_query_results_v3() {
 for (; j < read_state_.regions.size(); j++) {
 const auto& reg = read_state_.regions[j];

-// If the region does not match the contig skip
-if (reg.seq_name != std::get<2>(contig_info))
-continue;
-
 const uint32_t reg_min = reg.seq_offset + reg.min;
 const uint32_t reg_max = reg.seq_offset + reg.max;

 // If the vcf record is not contained in the region skip it
-if (end < reg_min || real_start > reg_max)
+if (real_start > reg_max)
 continue;

+// If the regions (sorted) are starting past the end of the record we can
+// safely exit out, as we will not intersect this record anymore
+if (end < reg_min)
+break;
+
 // Unless start is the real start (aka first record) then if we skip for
 // any record greater than the region min the goal is to only capture
 // starts which are within 1 anchor gap of the region start on the lower
@@ -649,6 +650,10 @@ bool Reader::process_query_results_v3() {
 if (anchor_gap < reg_min && start < reg_min - anchor_gap)
 continue;

+// If the region does not match the contig skip
+if (reg.seq_name != std::get<2>(contig_info))
+continue;
+
 // If we overflow when reporting this cell, save the index of the
 // current region so that we restart from the same position on the
 // next read. Otherwise, we will re-report the cells in regions with
@@ -733,9 +738,14 @@ bool Reader::process_query_results_v2() {
 const uint32_t reg_max = reg.seq_offset + reg.max;

 // If the vcf record is not contained in the region skip it
-if (real_end < reg_min || start > reg_max)
+if (start > reg_max)
 continue;

+// If the regions (sorted) are starting past the end of the record we can
+// safely exit out, as we will not intersect this record anymore
+if (real_end < reg_min)
+break;
+
 // Unless start is the real start (aka first record) then if we skip for
 // any record greater than the region min the goal is to only capture
 // starts which are within 1 anchor gap of the region start on the lower
@@ -748,6 +758,10 @@ bool Reader::process_query_results_v2() {
 if (anchor_gap < reg_min && start < reg_min - anchor_gap)
 continue;

+// If the region does not match the contig skip
+if (reg.seq_name != std::get<2>(contig_info))
+continue;
+
 // If we overflow when reporting this cell, save the index of the
 // current region so that we restart from the same position on the
 // next read. Otherwise, we will re-report the cells in regions with