-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[bug] Memory leak when using faidx::Reader::fetch_seq()
?
#401
Comments
This looks like a bug to me (and running your example through valgrind). I'm not sure how to fix this without changing the interface to the |
I guess the Wouldn't it be possible to keep the original setup and just do the free like this:
Also, maybe it makes sense to include the above MRE as a test case, to prevent regressions? But I'm not sure how big the test case is, and how easy it is to test for the memory leak... So it might be more important to get a fix in quickly... |
I don't think that version works because |
Then what about:
Does the |
And if this doesn't do the trick, maybe a |
I don't think to_bytes() clones. I just tested it (either calling to_bytes() in the return, or the second version that does it in the unsafe block) and valgrind complains about reading the memory block after free was called. This doesn't seem to work either:
Maybe it is cloning the ptr's address, not the actual data |
Hi everyone and sorry for my late reply I've tried my go at the source code, and concur with the observation that making My initial attemps involved trying to tie the lifetime of In any case, while I'm not a fan of the idea of removing
In other words, a documentation entry in the likes of this (see below) would -I hope- be a bit more transparent to users: /// Fetch the sequence as a byte array.
/// /* Snip */
///
/// # Safety
/// The returned static slice is internally allocated by `malloc()` and should be manually destroyed
/// by end users by calling `libc::free()` on it, e.g.:
/// ```no_run
/// let r = Reader::from_path("reference.fa")?;
/// unsafe {
/// let refseq = r.fetch_seq("chr1", 10, 20)?;
/// /* Do stuff... */
/// libc::free(refseq.as_ptr() as *mut std::ffi::c_void);
/// }
/// ```
pub unsafe fn fetch_seq<N: AsRef<str>>(&self, name: N, begin: usize, end: usize) -> Result<&'static [u8]> {
/* Snip */
} However, I would also argue that commiting to this signature, and thus ask users to manually handle memory kinda defeats the purpose of using Rust in the first place... I think a better long-term approach would be to provide with a safer(-ish) implementation, which introduces a smart pointer as a return value: pub struct RefSeq(&'static [u8]);
impl Drop for RefSeq {
fn drop(&mut self) {
unsafe { libc::free(self.0.as_ptr() as *mut ffi::c_void)};
}
}
impl std::ops::Deref for RefSeq {
type Target = &'static [u8];
fn deref(&self) -> &Self::Target {
&self.0
}
} Using /// Fetch the sequence as a byte array, wrapped within a smart pointer.
///
/// # Arguments
/// [...]
///
/// # Usage
/// ```
/// # use rust_htslib::faidx;
/// let r = faidx::Reader::from_path("test/test_cram.fa").unwrap();
/// let bytes = r.fetch_seq_safe("chr3", 20, 26).unwrap();
/// let refstr = std::str::from_utf8(*bytes).unwrap();
/// assert_eq!(refstr, "ATTTCAG");
/// ```
pub fn fetch_seq_safe<N: AsRef<str>>(&self, name: N, begin: usize, end: usize) -> Result<RefSeq> {
let bytes = unsafe { self.fetch_seq(name, begin, end)? };
Ok(RefSeq(bytes))
}
pub fn fetch_seq_string<N: AsRef<str>>(&self, name: N, begin: usize, end: usize) -> Result<String> {
let bytes = self.safe_fetch_seq(name, begin, end)?;
Ok(std::str::from_utf8(*bytes).unwrap().to_owned())
} Testing out
|
Timeline | Bottom-up stacktrace |
---|---|
(can confirm this behavior is equally displayed when patching my previously linked MRE)
I can also confirm, through a quick'n'dirty criterion
benchmark, that this solution does not carry any negative impact on runtime performance:
fetch_seq() bench output |
fetch_seq_safe() bench output |
---|---|
Unit-testing the memory consumption of fetch_seq()
As a follow up to @dlaehnemann 's suggestion:
Also, maybe it makes sense to include the above MRE as a test case, to prevent regressions? But I'm not sure how big the test case is, and how easy it is to test for the memory leak... So it might be more important to get a fix in quickly...
I'm hoping the unit-test described below could work. However, the solution I'm providing is far from being perfect, given that it :
- requires the addition of
memory-stats
as a dev-dependency. - requires the use of arbitrary parameters (
test_duration
andmemory_leeway
- hoping these variable names are self-explanatory), sincememory-stats
can sometimes be fairly inaccurate when estimating the resident set size of a software. - In terms of overall design, this is pretty much a dumb spin-loop which ensures the memory size remains generally constant for
test_duration
seconds... "Keep it stupid simple", as they say, but I'm sure a smarter and more accurate solution can be found.
fn get_rss() -> Result<usize, &'static str>{
memory_stats::memory_stats()
.map(|usage| usage.physical_mem)
.ok_or("Cannot access physical memory")
}
#[test]
#[ignore] // cargo test memory_leak -- --ignored --test-threads=1
fn fetch_seq_memory_leak() -> Result<(), &'static str> {
std::thread::sleep(std::time::Duration::from_secs(5)); // warmup
let test_duration = 15; // seconds
let memory_leeway = 500_000; // bytes
let r = open_reader();
let base_rss = get_rss()?;
let start = std::time::Instant::now();
while (std::time::Instant::now() - start).as_secs() < test_duration {
let bytes = r.fetch_seq_safe("chr1", 1, 120).unwrap();
let seq = std::str::from_utf8(&bytes).expect("Invalid UTF-8 sequence");
assert!(get_rss()? - base_rss < memory_leeway);
}
Ok(())
}
Notes :
- All Changes and suggestions above can be found within MaelLefeuvre@643ecd
RefSeq
andfetch_seq_safe()
are probably not the best suited names.
Cheers!
Is there something here that I can do to get this fix implemented? I am seeing the same memory problem in my application. |
I'd also be happy to contribute and/or submit a PR proposal if the current contributors of this project deem it acceptable. In the meantime, if you don't wish to use a forked project, a simple temporary workaround for your application might be to manually call A pseudo code example: loop {
/* Setup code*/
let refseq = reference.fetch_seq(chr, start, end)?;
/* Do stuff with refseq */
unsafe {libc::free(refseq.as_ptr() as *mut std::ffi::c_void)}; // Free up memory
} You'll probably need to import libc within your [dependencies]
libc = "0.2" Hope this helps, until a more stable solution to the issue is found. Cheers |
Excellent, thank you for the detailed solution! |
temporarily, see rust-bio/rust-htslib#401
Bug description
Hello,
I'm currently working on a simple rust utility that requires me to read and compare bam records with its matching reference sequence.
My approach ended up using the
rust_htslib::faidx::Reader::fetch_seq()
. However, i'm now noticing my binary is experiencing quite the serious memory leak (see screen captures below)Profiling the binary through
heaptrack
, clearly points out to the ffi call tohtslib::faidx_fetch_seq64()
within thefaidx::Reader::fetch_seq()
method (Seesrc/faidx/mod.rs:73-79
)Minimally reproducible example
A simple loop such as the one below is enough to create a steady increase of the binary's resident set size, while I would expect the memory usage to pretty much constant in this example.
A small public github repository containing the example above, along with a dummy reference file can be found here: rust-htslib-memleak-mre
Some thoughts:
Of course, I might be completely misusing the
fetch_seq()
method - hence the '?' in this issue's title. Regardless, be it a safety or documentation issue, I'm thinking this might need some resolving.Anyway, Any help, insight, or suggestion regarding this matter would be greatly appreciated. Also, I'm always happy to contribute on a pull request if requested.
Cheers !
The text was updated successfully, but these errors were encountered: