
Allocation of 360434219 exceeds 10% of free system memory. #588

Open
icrecescu opened this issue Nov 25, 2024 · 12 comments
icrecescu commented Nov 25, 2024

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04 x86_64): Linux x86_64 in a Docker container
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below): 1.0.0-RC.2
  • Java version (i.e., the output of java -version): openjdk version "21.0.4"
  • Java command line flags (e.g., GC parameters):
  • Python version (if transferring a model trained in Python): 3.9
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior
I am using TensorFlow in a Spring Boot application, which exposes an endpoint for NER processing. The TensorFlow model is trained in Python and loaded into the Java application for inference.

To optimize performance, I initialize the TensorFlow session once during application startup using a @PostConstruct method and store it in a private field:

private Session session;

@PostConstruct
private void initialize() throws IOException {
    byte[] bytes = Files.readAllBytes(Paths.get("/path/to/model/"));
    Graph graph = new Graph();
    graph.importGraphDef(GraphDef.parseFrom(bytes), "PREFIX");
    session = new Session(graph);
}

The session is reused in a public method for running predictions:

public Result predict(String input) {
    try (Tensor textTensor = Tensor.of(TInt32.class, ...);
         Result result = session.runner()
                                .feed("otherOperationName", textTensor)
                                .fetch("operationName")
                                .run()) {
        // Process the result here
    }
}

During performance testing, I monitored the heap memory and found no significant issues. However, when the application runs in a Docker container, it crashes after a while, regardless of the memory allocated to the container (even with 120GB of memory). The following warning appears in the logs before the crash:

W external/local_tsl//framework/cpu_allocator_impl.cc:83] Allocation of 34891293 exceeds 10% of free system memory.

Is it possible that the memory leak is caused by the session being stored in a private field and never explicitly closed, even though all tensors and intermediate results are properly managed (closed) in the predict method?
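For what it's worth, a never-closed session could be ruled out with a lifecycle callback. A minimal sketch, assuming the Graph created in initialize() is also kept in a field (here "graph") so it can be closed, and assuming Spring Boot 3 (jakarta.annotation; older versions use javax.annotation):

```java
import jakarta.annotation.PreDestroy;

// Sketch: close the long-lived native resources when the Spring bean is
// destroyed, mirroring the @PostConstruct initialization above.
// Assumes "session" and "graph" are fields of the same bean.
@PreDestroy
private void shutdown() {
    if (session != null) {
        session.close(); // releases the native session state
    }
    if (graph != null) {
        graph.close();   // releases the native graph definition
    }
}
```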

Describe the expected behavior
The application should not exhibit memory leaks or crashes when deployed in a Docker container, regardless of memory allocation.

@Craigacp (Collaborator)

What model are you using, how big are the inputs and how big are the outputs?

@icrecescu (Author)

Hi @Craigacp, thank you for your response! To clarify, the model uses a protobuf file. The input consists of phrases that are transformed into six arrays. The size of these arrays depends on the number of words in the input phrases, which typically contain around 20 words. From these arrays, six tensors are created and used as input to TensorFlow.

The model outputs two tensors as a result of the computation.

I should also mention that the Docker container starts with approximately 5GB of memory usage. Over time, after handling thousands upon thousands of requests, the memory usage grows significantly and eventually reaches around 100GB.

@Craigacp (Collaborator)

Ok, so you have 6 inputs which are roughly 20-30 ints long? Or have you embedded them externally? And the output is what dimension?

Presumably you're closing all the input tensors not just the main one?

You should explicitly close the session when you're done, but that wouldn't cause a leak beyond the size of the model itself unless Spring is continually reconstructing your model wrapper. Can you try using the concrete function loader rather than a bare saved model?
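For context, the concrete-function route could look roughly like this; a sketch assuming the TF Java 1.0 API, where "/path/to/saved_model" and the input key "input_ids" are hypothetical and the real names come from the exported model's serving signature:

```java
import java.util.Map;
import org.tensorflow.Result;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.types.TInt32;

// Sketch: load the SavedModel once and invoke its main signature via call(),
// instead of feeding/fetching operations by name on a bare Session.
try (SavedModelBundle bundle = SavedModelBundle.load("/path/to/saved_model", "serve")) {
    try (TInt32 input = TInt32.vectorOf(1, 2, 3); // hypothetical token ids
         Result result = bundle.call(Map.of("input_ids", input))) {
        // closing the Result releases every output tensor it holds
    }
}
```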

@icrecescu (Author)

Yes, you’re correct — the inputs are typically 20-30 integers long. The model produces two output tensors:

  • A tensor with a constant shape of DenseTFloat32[134][134].
  • Another tensor with a shape of DenseTFloat32[1][number of words][134].

All the input tensors are managed inside a try-with-resources block to ensure proper cleanup. However, I’m not entirely sure what you mean by “using the concrete function.” Could you clarify?

Additionally, I should mention that at some point, I iterate over the results and perform further operations. This involves creating a new EagerSession (or potentially reusing an existing one — I’m unsure how this works behind the scenes). Here's an example of how I’m doing this:

EagerSession session = EagerSession.create();
Ops tf = Ops.create(session);
for (int i = 0; i < result.shape().size(0); i++) {
    Add<TFloat32> v = tf.math.add(...);
    Max<TFloat32> max = tf.max(...);
}
session.close();

In this example, I explicitly close the EagerSession after use. Is it possible I’m missing something here that could contribute to the issue?

@Craigacp (Collaborator)

A SavedModelBundle exposes a call operator for the main function (and other functions) in the saved model, see https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/SavedModelBundle.java#L446; that might be worth exploring.

The eager session might be causing trouble too. @karllessard do you remember the correct way to clean up after an eager session?

@karllessard (Collaborator)

Storing your session as a private field of a Spring Bean shouldn't cause any trouble, you'll be able to reuse it for all predictions. For the eager session, since you are explicitly closing it, you should be fine too.

Your error message says Allocation of 34891293 exceeds 10% of free system memory., meaning that you are trying to allocate something around 33 MB at some point during the inference? That is way beyond the size of your inputs/outputs; do you know which ops in your graph trigger such an allocation? Since your session is loaded only once, it shouldn't be the variables themselves.

@icrecescu (Author)

Hi @karllessard, thank you for your input! I should clarify that I trimmed the size number when posting — the actual allocation size is closer to 36GB. I’m not aware of any operations in my setup that could trigger such a massive memory allocation.

Could you please elaborate on what might be causing this issue and what I should check to investigate further? Is there a possibility that the model itself is broken or improperly serialized?

Thanks in advance for your help!

@karllessard (Collaborator)

During performance testing, I monitored the heap memory and found no significant issues.

One idea: if you could monitor the native memory (i.e. the memory used by the JVM process itself), it could give you a better hint. Check whether it leaks even outside Docker. The heap memory won't tell you much about this kind of leak.

As for why the model itself needs to allocate that much memory all of a sudden: it is hard to tell without knowing more about the model architecture itself.

@pluppens

@icrecescu You can try to use JeMalloc. You can inject and configure it using environment variables, so it's not too much of a hassle to try it. It can also profile & detect leaks.

You can also try 1.0.0-rc1 - I have a sneaking suspicion that there's a native memory leak in RC2 using the setup you described above (using a single Session), but I'll need more time to confirm that.

@pluppens

I have a sneaking suspicion that there's a native memory leak in RC2 using the setup you described above (using a single Session), but I'll need more time to confirm that.

Apologies - it seems my suspicions were unfounded, and just an (unhappy) coincidence.

@icrecescu (Author)

Hi @pluppens, thank you for your suggestions! I’ll give JeMalloc a try. I also suspect a native memory leak, as everything seems fine within the Java heap, as I described earlier. The issue becomes apparent when the app runs inside a container. In the Kubernetes monitoring dashboard, I can observe the memory usage gradually increasing over time. This also happens with RC1 and with 0.0.5. I'll try to create a sample project which reproduces this issue.

@karllessard (Collaborator)

Also @icrecescu, in your eager operations (post-processing, I think), do you retrieve values of the tensors via output.asTensor()? If so, you might want to release those tensors explicitly as well; I'm not sure whether they are freed by the EagerSession.
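To illustrate, a sketch of the post-processing with explicit tensor cleanup; "a" and "b" stand in for the Operand<TFloat32> values from the earlier loop and are assumptions, not verified against the actual code:

```java
import org.tensorflow.EagerSession;
import org.tensorflow.op.Ops;
import org.tensorflow.op.math.Add;
import org.tensorflow.types.TFloat32;

// Sketch: materialize eager results with asTensor() inside try-with-resources
// so their native memory is released deterministically, instead of waiting on
// the EagerSession or the garbage collector.
try (EagerSession eager = EagerSession.create()) {
    Ops tf = Ops.create(eager);
    Add<TFloat32> sum = tf.math.add(a, b);
    try (TFloat32 tensor = sum.asTensor()) {
        float value = tensor.getFloat(0); // read while the tensor is open
    }
} // closing the session cleans up the eager operands themselves
```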
