TurboPFor: ensure input (!) is padded, too
We already ensured the output buffers we pass to TurboPFor are padded,
but it turns out that TurboPFor also reads past its input buffer!
(At least in the version we’re currently using.)
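
A minimal sketch of the padding technique. It assumes the only requirement is that the C side may read up to `required` bytes starting at the slice's first element. `ensureCapacity` is a hypothetical helper, not part of the project. It relies on the fact that the bytes between `len` and `cap` of a Go slice belong to the same allocation, so an over-read into them is a valid memory access.

```go
package main

import "fmt"

// ensureCapacity returns a slice holding the same bytes as input whose
// capacity is at least required. Hypothetical helper: if input already has
// enough spare capacity it is returned as-is; otherwise its contents are
// copied into a new allocation. make zero-fills the new backing array, so
// any bytes past len(input) that a C decoder reads are valid (and zero).
func ensureCapacity(input []byte, required int) []byte {
	if cap(input) >= required {
		return input // enough slack already: no copy needed
	}
	buffer := make([]byte, len(input), required)
	copy(buffer, input)
	return buffer
}

func main() {
	tight := []byte{1, 2, 3} // len == cap == 3: no room for an over-read
	padded := ensureCapacity(tight, 8)
	fmt.Println(len(padded), cap(padded), padded[:cap(padded)])
	// prints: 3 8 [1 2 3 0 0 0 0 0]
}
```

Reusing the caller's slice when its capacity already suffices avoids a copy in the common case. Only buffers without enough slack pay for an extra allocation.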

This manifested as crashes when merging multiple index files together.
The crashes only started a few weeks ago, presumably because a new package
that happens to trigger the bug entered the Debian archive.

Reading past the end of the input buffer is not just an invalid memory
access: because we mmap the input files, it can cause a segmentation
violation (SIGSEGV) when the read runs off the end of the mapped region.
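
A short sketch of why mmap'd inputs are the problem case. It assumes a Unix system where Go's `syscall.Mmap` is available. A slice returned by `syscall.Mmap` has `cap` equal to the mapped length, so a capacity check such as `cap(input) < required` sees no slack past the last byte of the file. `mappedLenCap` is a hypothetical demo helper. How the project actually maps its index files is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mappedLenCap writes data to a temporary file, maps it read-only with
// syscall.Mmap (Unix only), and reports len and cap of the mapped slice.
// Hypothetical demo helper, not the project's code.
func mappedLenCap(data []byte) (l, c int, err error) {
	f, err := os.CreateTemp("", "mmap-demo-*")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return 0, 0, err
	}
	b, err := syscall.Mmap(int(f.Fd()), 0, len(data), syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, 0, err
	}
	// Runs before Close and Remove (defers are LIFO).
	defer syscall.Munmap(b)
	// The slice ends exactly at the last byte of the file: there is no
	// spare capacity for a C routine to over-read into.
	return len(b), cap(b), nil
}

func main() {
	l, c, err := mappedLenCap([]byte("0123456789"))
	if err != nil {
		panic(err)
	}
	fmt.Println(l, c) // prints: 10 10
}
```

A subslice `b[i:j]` has capacity `cap(b) - i`. A capacity check would therefore only force a copy for inputs that start within `required` bytes of the end of the mapping.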

fixes #123
stapelberg committed Jan 16, 2024
1 parent ef1baf9 commit c10cc77
1 changed file with 24 additions and 5 deletions: internal/turbopfor/turbopfor.go
@@ -44,16 +44,26 @@ import (
 	"sync"
 )
 
-// Corresponding to #define P4NENC256_BOUND(n) ((n + 255) /256 + (n + 32) * sizeof(uint32_t))
-// from https://github.com/powturbo/TurboPFor-Integer-Compression/issues/59
+// Corresponding to p4nbound256v32:
+//
+// #define VP4BOUND(_n_, _esize_, _csize_) ((_esize_*_n_) + ((_n_+_csize_-1)/_csize_))
+// size_t p4nbound256v32(size_t n) { return VP4BOUND(n, 4, 256); }
+//
+// see also https://github.com/powturbo/TurboPFor-Integer-Compression/issues/59
 func EncodingSize(n int) int {
 	return ((n + 255) / 256) + (n+32)*4
 }
 
-// Corresponding to 32*4 extra bytes
-// from https://github.com/powturbo/TurboPFor-Integer-Compression/issues/59
+// Corresponding to p4nbound32:
+//
+// #define VP4BOUND(_n_, _esize_, _csize_) ((_esize_*_n_) + ((_n_+_csize_-1)/_csize_))
+// size_t p4nbound32( size_t n) { return VP4BOUND(n, 4, 128); }
+//
+// see also https://github.com/powturbo/TurboPFor-Integer-Compression/issues/59
+//
+// see also https://github.com/powturbo/TurboPFor-Integer-Compression/issues/84
 func DecodingSize(n int) int {
-	return n + 32*4
+	return ((n + 127) / 128) + (n+32)*4
 }
 
 // KNOWN WORKING
@@ -123,6 +133,15 @@ func P4nd1enc32(input []uint32) []byte {
 }
 
 func P4dec32(input []byte, output []uint32) (read int) {
+	// TurboPFor (at least older versions? see
+	// https://github.com/powturbo/TurboPFor-Integer-Compression/issues/84) read
+	// past their input buffer, so verify that the input buffer has enough
+	// capacity and use a temporary buffer if needed:
+	if required := DecodingSize(len(output)); cap(input) < required {
+		buffer := make([]byte, len(input), required)
+		copy(buffer, input)
+		input = buffer
+	}
 	return int(C.myp4dec32((*C.uchar)(&input[0]),
 		C.unsigned(len(output)),
 		(*C.uint32_t)(&output[0])))
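
A worked check of the size formulas. It assumes the `VP4BOUND` macro quoted in the diff comments is reproduced faithfully. Because `(n+32)*4` expands to `4n + 128`, each Go helper equals the corresponding `VP4BOUND` value plus exactly 128 bytes (32 `uint32` values) of slack. The lowercase functions below are local copies for illustration, not the package's exported API.

```go
package main

import "fmt"

// vp4Bound mirrors the C macro quoted in the comments:
//   #define VP4BOUND(_n_, _esize_, _csize_) ((_esize_*_n_) + ((_n_+_csize_-1)/_csize_))
func vp4Bound(n, esize, csize int) int {
	return esize*n + (n+csize-1)/csize
}

// Local copies of the (new) EncodingSize and DecodingSize formulas.
func encodingSize(n int) int { return ((n + 255) / 256) + (n+32)*4 }
func decodingSize(n int) int { return ((n + 127) / 128) + (n+32)*4 }

func main() {
	n := 1000
	fmt.Println(vp4Bound(n, 4, 256), encodingSize(n)) // prints: 4004 4132
	fmt.Println(vp4Bound(n, 4, 128), decodingSize(n)) // prints: 4008 4136
	// For comparison, the removed DecodingSize line returned n + 32*4.
	fmt.Println(n + 32*4) // prints: 1128
}
```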
