
Memory Leak when running encoding in ThreadPool #1854

Open
JoanFM opened this issue Mar 3, 2023 · 4 comments

JoanFM commented Mar 3, 2023

I have seen a curious behavior when running the encoding of a sentence-transformers model inside a ThreadPool.

Look at this code, which runs with no problem and constant memory consumption:

from sentence_transformers import SentenceTransformer

if __name__ == '__main__':
    model = SentenceTransformer('msmarco-distilbert-base-v3', device='cpu')


    def f():
        texts = ['testsdkjsdlfajsdlslfk jofjiwo wofj owifjwo ijwoifj ofj o3jpovpopor3j'] * 30
        embeddings = model.encode(texts)
        print(f' embeddings shape {embeddings.shape}')


    while True:
        f()

On the other hand, this code blows up the system memory and rapidly leads to an OOM:

from sentence_transformers import SentenceTransformer
import asyncio

if __name__ == '__main__':
    model = SentenceTransformer('msmarco-distilbert-base-v3', device='cpu')


    def f():
        texts = ['testsdkjsdlfajsdlslfk jofjiwo wofj owifjwo ijwoifj ofj o3jpovpopor3j'] * 30
        embeddings = model.encode(texts)
        print(f' embeddings shape {embeddings.shape}')


    while True:
        loop = asyncio.new_event_loop()
        loop.run_in_executor(None, f)

The only difference is that the encoding happens inside a Thread.
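
For reference, a rough way to watch the memory while encoding from a worker thread is to use concurrent.futures directly; this is only a sketch (psutil and the fixed iteration count are my additions, and it may not reproduce the exact asyncio pattern above):

from concurrent.futures import ThreadPoolExecutor

import psutil
from sentence_transformers import SentenceTransformer

if __name__ == '__main__':
    model = SentenceTransformer('msmarco-distilbert-base-v3', device='cpu')
    process = psutil.Process()

    def f():
        texts = ['testsdkjsdlfajsdlslfk jofjiwo wofj owifjwo ijwoifj ofj o3jpovpopor3j'] * 30
        return model.encode(texts).shape

    with ThreadPoolExecutor(max_workers=1) as executor:
        for i in range(1000):
            shape = executor.submit(f).result()
            # RSS should stay roughly flat; in the leaking case it keeps growing
            rss_mb = process.memory_info().rss / 1e6
            print(f'iteration {i}: embeddings shape {shape}, rss {rss_mb:.1f} MB')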


JoanFM commented Mar 3, 2023

Seems potentially related to:

pytorch/pytorch#64412

@chschroeder

In #1795 a sample similar to your non-threaded variant also seems to fail. Not sure if this is related, but there have been multiple issues regarding memory leaks lately.


chschroeder commented Mar 13, 2023

Update: I just had a few minutes and tried the above script with everything inside the main loop of encode() deleted except for tokenize():

# only tokenization left in the loop; the model forward pass and everything after it is removed
for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
    sentences_batch = sentences_sorted[start_index:start_index + batch_size]
    features = self.tokenize(sentences_batch)
    del features
return []

The leak still shows even then. Once I delete tokenize() as well (and return constant output), the leak is gone. So there is at least a problem in tokenize().
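
To check this outside of encode(), one option is to call tokenize() directly in a loop and watch the process memory; a minimal sketch (psutil and the iteration counts are arbitrary choices on my side, model loaded as in the scripts above):

import psutil
from sentence_transformers import SentenceTransformer

if __name__ == '__main__':
    model = SentenceTransformer('msmarco-distilbert-base-v3', device='cpu')  # or the model under test
    process = psutil.Process()
    texts = ['testsdkjsdlfajsdlslfk jofjiwo wofj owifjwo ijwoifj ofj o3jpovpopor3j'] * 30

    for i in range(10000):
        features = model.tokenize(texts)
        del features
        if i % 100 == 0:
            # if tokenize() leaks, this number keeps climbing
            print(f'iteration {i}: rss {process.memory_info().rss / 1e6:.1f} MB')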

(Off topic: Why is this extra Transformer class needed?)

This led me to a very old issue: huggingface/transformers#197. This is my progress so far; I will stop now, but maybe these notes will help.

Edit: similar issue here

Edit 2:
I forgot to add that I changed the model to paraphrase-multilingual-MiniLM-L12-v2. With the above script:

  • sentence-transformers/msmarco-distilbert-base-v3: no leak
  • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2: leak
  • sentence-transformers/all-MiniLM-L12-v2: no leak
  • sentence-transformers/paraphrase-mpnet-base-v2: no leak

It seems there are multiple problems, but the tokenizer is only an issue for sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. I stumbled upon this one because I had problems with this model in another context.
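
For completeness, a rough sketch for comparing the models above under the same tokenize-only loop (psutil, gc, and the iteration count are my choices; RSS is only a coarse signal):

import gc

import psutil
from sentence_transformers import SentenceTransformer

MODELS = [
    'sentence-transformers/msmarco-distilbert-base-v3',
    'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2',
    'sentence-transformers/all-MiniLM-L12-v2',
    'sentence-transformers/paraphrase-mpnet-base-v2',
]

if __name__ == '__main__':
    process = psutil.Process()
    texts = ['testsdkjsdlfajsdlslfk jofjiwo wofj owifjwo ijwoifj ofj o3jpovpopor3j'] * 30

    for name in MODELS:
        model = SentenceTransformer(name, device='cpu')
        gc.collect()
        before = process.memory_info().rss
        for _ in range(2000):
            model.tokenize(texts)
        gc.collect()
        after = process.memory_info().rss
        # a large positive delta points at the tokenizer of this model
        print(f'{name}: rss delta {(after - before) / 1e6:.1f} MB')
        del model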


AnghelRA commented Apr 5, 2024

Found the same issue when encoding an image with CLIP: when running in a thread the memory keeps increasing, but when running in a plain loop it remains constant.
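
For reference, a minimal image variant of the scripts above; the clip-ViT-B-32 checkpoint and the blank PIL image are my assumptions, not the exact setup used here:

from concurrent.futures import ThreadPoolExecutor

from PIL import Image
from sentence_transformers import SentenceTransformer

if __name__ == '__main__':
    # sentence-transformers CLIP checkpoints can encode PIL images directly
    model = SentenceTransformer('clip-ViT-B-32', device='cpu')
    image = Image.new('RGB', (224, 224))

    def f():
        return model.encode(image).shape

    with ThreadPoolExecutor(max_workers=1) as executor:
        while True:
            # memory reportedly grows in the threaded case but stays flat when calling f() directly
            print(f'embedding shape {executor.submit(f).result()}')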
