
memory leak #18

Open
SolskGaer opened this issue Jan 30, 2018 · 4 comments
Comments

@SolskGaer

When passing a long text, around 20 MB, the script quickly runs out of the computer's memory. My machine has 16 GB of RAM, and it takes about 5 minutes to freeze.

@romanovzky

I notice this as well: a huge memory leak that does not improve even after invoking Python's garbage collector.

@fbarrios
Contributor

Hi! I will look into this, but the TextRank algorithm is really designed to summarise article-length texts. I don't know whether results from a 20 MB input can be trusted.

@romanovzky

Hi @fbarrios,
I've noticed the memory leak when summarising or extracting keywords from many small documents. It seems to retain things in memory that it shouldn't (previous graphs, maybe?).
Try it yourself: take a large corpus of small articles (news articles, for example) and extract keywords from all of them, and you'll very quickly see memory usage jump past 16 GB. A rough reproduction is sketched below.
Cheers
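
A rough reproduction sketch of what I mean (the `summa` import path and the use of `psutil` for measuring resident memory are assumptions on my side, not anything from this repo):

```python
# Rough reproduction: run keywords() over many small documents and watch RSS grow.
# Assumes the summa and psutil packages are installed; adjust imports if your
# local checkout differs.
import gc
import os

import psutil
from summa import keywords  # assumed package layout

process = psutil.Process(os.getpid())

docs = [f"Article number {i} talks about topic {i % 20} in a few short sentences. " * 40
        for i in range(5000)]

for i, doc in enumerate(docs):
    keywords.keywords(doc)
    if i % 500 == 0:
        gc.collect()  # forcing collection does not reclaim the memory for me
        rss_mb = process.memory_info().rss / 1024 ** 2
        print(f"after {i} documents: {rss_mb:.0f} MB resident")
```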

@Panaetius

Panaetius commented Feb 20, 2019

Profiling with cProfile when the issue occurs for me, execution seems to be stuck in graph.py:159:edge_weight(), with a huge number of calls. The relevant edge_weight call is in pagerank_weighted.py:50:build_adjacency_matrix().
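
Something along these lines is enough to collect such a profile (the `summa` import path and the input file name are assumptions, not from this repo; any sufficiently large text will do):

```python
# Sketch: profile a single keywords() call and print the top entries by cumulative time.
import cProfile
import pstats

from summa import keywords  # assumed package layout

with open("large_input.txt") as f:  # placeholder input file
    text = f.read()

cProfile.runctx("keywords.keywords(text)", globals(), locals(), "keywords.prof")
pstats.Stats("keywords.prof").sort_stats("cumulative").print_stats(20)
```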

My guess is that this is due to there being too many sentences and/or too many connections between sentences: since building the adjacency matrix is O(n^2) in the number of sentences, it can blow up when there are a lot of them.

A quick and dirty fix is to limit the number of sentences to some maximum in the keywords() method (see the sketch below), though of course you then won't be analysing the whole text anymore.
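
Done outside the library, the cap can look roughly like this (the naive sentence split and the `summa` import path are assumptions for illustration; a proper fix would reuse summa's own sentence splitting):

```python
# Sketch of the quick fix: cap the number of sentences fed into keywords().
# The naive split on ". " is only for illustration; anything past the cap is dropped.
from summa import keywords  # assumed package layout

MAX_SENTENCES = 500  # arbitrary cap, tune to your memory budget

def keywords_capped(text, max_sentences=MAX_SENTENCES):
    sentences = text.split(". ")
    truncated = ". ".join(sentences[:max_sentences])
    return keywords.keywords(truncated)
```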

Switching over to a sparse adjacency matrix and estimating likely candidate pairs, as described at https://cran.r-project.org/web/packages/textrank/vignettes/textrank.html (in the MinHash section), might make it feasible for larger documents. But I don't know your codebase or the TextRank algorithm well enough to implement it myself easily.
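
Very roughly, the idea from that vignette could look like the sketch below: only sentence pairs that an LSH index flags as likely similar get scored, and the weights go into a sparse matrix instead of a dense n x n one. datasketch and scipy are not dependencies of this repo, and the Jaccard estimate here just stands in for summa's actual edge weight.

```python
# Sketch of the MinHash idea: index sentences with LSH, score only candidate pairs,
# and store the result sparsely instead of building a dense adjacency matrix.
from datasketch import MinHash, MinHashLSH
from scipy.sparse import lil_matrix

def sparse_adjacency(sentences, threshold=0.3, num_perm=64):
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    signatures = []
    for i, sentence in enumerate(sentences):
        m = MinHash(num_perm=num_perm)
        for token in sentence.lower().split():
            m.update(token.encode("utf8"))
        lsh.insert(str(i), m)
        signatures.append(m)

    matrix = lil_matrix((len(sentences), len(sentences)))
    for i, m in enumerate(signatures):
        for key in lsh.query(m):  # candidate neighbours only, not all pairs
            j = int(key)
            if i != j:
                matrix[i, j] = m.jaccard(signatures[j])  # estimated similarity
    return matrix.tocsr()
```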
