[Bug]: Potential memory leak on longer conversations #552
Btw this has been happening for a while; basically as long as I can remember when doing longer conversations. It's just that recently, with the features to add references etc., longer conversations have become more valuable, so I have used them more. Point is, I have no reason to believe this is a recent regression. I have noticed it for at least several weeks, but probably longer, since I did not use longer conversations as much prior to the last month or so.
Can you share the adapter that you're using and any config changes you've made to it? Are you using a rendering plugin for any markdown files? We could use an Agentic Workflow in the prompt library to easily replicate multiple successive prompts. Off the top of my head, I can't fathom why there would be so much memory usage after processing, unless something is getting attached to the event loop with every request.
I've created the following workflow that prompts the LLM 10 times:
You can see the output here. Once you start the prompt off with the first message, the remaining prompts auto-submit.

```lua
---@diagnostic disable: missing-fields
--[[
NOTE: Set the config path to enable the copilot adapter to work.
It will search the following paths for a token:
- "$CODECOMPANION_TOKEN_PATH/github-copilot/hosts.json"
- "$CODECOMPANION_TOKEN_PATH/github-copilot/apps.json"
--]]
vim.env["CODECOMPANION_TOKEN_PATH"] = vim.fn.expand("~/.config")
vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()
local constants = {
LLM_ROLE = "llm",
USER_ROLE = "user",
SYSTEM_ROLE = "system",
}
-- Your CodeCompanion setup
local plugins = {
{
"olimorris/codecompanion.nvim",
dependencies = {
{ "nvim-treesitter/nvim-treesitter", build = ":TSUpdate" },
{ "nvim-lua/plenary.nvim" },
},
opts = {
--Refer to: https://github.com/olimorris/codecompanion.nvim/blob/main/lua/codecompanion/config.lua
strategies = {
--NOTE: Change the adapter as required
chat = { adapter = "copilot" },
inline = { adapter = "copilot" },
},
opts = {
log_level = "DEBUG",
},
prompt_library = {
["Test workflow"] = {
strategy = "workflow",
description = "Use a workflow to test the plugin",
opts = {
index = 4,
},
prompts = {
{
{
role = constants.USER_ROLE,
content = "Generate a Python class for managing a book library with methods for adding, removing, and searching books",
opts = {
auto_submit = false,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Write unit tests for the library class you just created",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Create a TypeScript interface for a complex e-commerce shopping cart system",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Write a recursive algorithm to balance a binary search tree in Java",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Generate a comprehensive regex pattern to validate email addresses with explanations",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Create a Rust struct and implementation for a thread-safe message queue",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Write a GitHub Actions workflow file for CI/CD with multiple stages",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Create SQL queries for a complex database schema with joins across 4 tables",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Write a Lua configuration for Neovim with custom keybindings and plugins",
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = "Generate documentation in JSDoc format for a complex JavaScript API client",
opts = {
auto_submit = true,
},
},
},
},
},
},
},
},
}
require("lazy.minit").repro({
spec = plugins,
dev = {
path = "~/Code/Neovim",
-- Only load my local plugins when we're on my machine
patterns = (jit.os == "OSX") and { "olimorris" } or {},
},
})
-- Setup Tree-sitter
local ts_status, treesitter = pcall(require, "nvim-treesitter.configs")
if ts_status then
treesitter.setup({
ensure_installed = { "lua", "markdown", "markdown_inline", "yaml" },
highlight = { enable = true },
})
end
```
Overall it "feels" like one of a potential few things:
Another thing to note: as of late, with Claude, I fairly often have to use the "kill" command in CodeCompanion because the response stops streaming / gets cut off mid-response (this happens randomly, regardless of whether the conversation is small or large). After using kill, I typically use "regenerate last response" (or whatever it is called). I have noticed that things sometimes look weird after that, where the "sharing" references get duplicated. So I'm not sure if that could somehow create an additional instance of the chat going forward and cause something weird... no reason to suspect this is the cause, but it is something to note.
Btw do you have any good tips and tricks for profiling stuff like this when I get into this memory-leaked state? Like ways to check for problem autocmds or duplication of autocmds, etc.? Or any way to "profile" the memory somehow to see what could be causing the leak and/or see which parts of Neovim are lagging / chopping as I type in insert mode?
Yeah, agreed. The tacit knowledge I have from seeing these types of things over the years in Neovim makes me think something in the plugin is iteratively getting applied to the main event loop (or something along those lines), which causes Neovim to balloon in memory usage. Using the plugin to diagnose the plugin, there may be some Tree-sitter optimizations I can make, so I will implement those later.
I'm experiencing the same issue occasionally. I believe this is an issue on the Anthropic side. I've been able to recreate the workflow with every other adapter, but it's Anthropic (whether through Copilot or directly) that always stalls and is substantially slower at responding. Once the request is made to an LLM, there's very little that CodeCompanion does other than receive and render the output, and the logs indicate that Anthropic just hangs after a while. Perhaps that's the cause.
My first step is to ask an LLM to try and spot any optimizations 😆. If we can recreate this with the minimal.lua file then I'll investigate some profiling tools further.
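As a starting point for that kind of profiling, here is a rough sketch using only core Neovim Lua APIs (nothing in it is specific to this plugin): print the Lua-side heap size and count autocmds per event, to spot duplicated autocmds or Lua objects piling up over a long conversation.

```lua
-- Rough memory/autocmd snapshot. Note: collectgarbage("count") only covers Lua
-- allocations, not buffer text or other C-side memory, so it will not explain
-- all of nvim's resident memory on its own.
local function memory_snapshot()
  print(("Lua heap: %.1f MB"):format(collectgarbage("count") / 1024))

  -- Count autocmds per event to spot duplicates accumulating over time
  local counts = {}
  for _, au in ipairs(vim.api.nvim_get_autocmds({})) do
    counts[au.event] = (counts[au.event] or 0) + 1
  end
  for event, n in pairs(counts) do
    if n > 20 then -- arbitrary threshold; adjust to taste
      print(("%s: %d autocmds"):format(event, n))
    end
  end
end

memory_snapshot()
```

Running it before and after a long chat session shows whether autocmds are being duplicated or whether the Lua heap itself is growing.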
Another thought is that the chat object could be self-referencing and therefore becoming larger with every response. If that's the case, we'll see it with the minimal.lua recreation.
That is what it "feels" like. Curious if could be memory references held onto Chat object from like keymaps or something too? For example I have a keymap for hide (as you implemented the custom keymaps) and is given access to the chat object. Could there be anything with that? 🤷🏻♂️ |
Also, I am typically providing 3 or 4 file references for context, none of them very big. I am a pretty big modularity guy, so rarely are any of my files larger than 100 lines.
With the chat reference stuff you mentioned, I am wondering if it stems from the Claude API inconsistency which causes me to use the "kill" keymap and "regenerate last response"? 🤔 Maybe extra references, instances, and/or circular references of some sort are occurring? It definitely gets worse with more text in the buffer, so whatever it is, I believe that is a large factor.
I've reduced object nesting between the chat buffer and the references object in #561. If you could give that a test at some point it would be very much appreciated. Previously, the references object had the whole chat buffer linked to it; over a large request, I could absolutely expect that to have an impact. I'll also review whether there are any design choices affecting garbage collection.
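To illustrate the kind of nesting being described, here is a hypothetical before/after sketch (not the actual #561 change; the field and function names are made up for illustration):

```lua
-- Before: each reference entry keeps a back-pointer to the whole chat object,
-- so any serialization, deep copy, or payload built from the references also
-- drags along every message in the conversation.
local function add_reference_heavy(chat, path)
  table.insert(chat.refs, { path = path, chat = chat })
end

-- After: store only the lightweight identifiers needed to resolve the
-- reference later, keeping the references table small as the chat grows.
local function add_reference_light(chat, path)
  table.insert(chat.refs, { path = path, bufnr = chat.bufnr, id = #chat.refs + 1 })
end
```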
Awesome! I'm hoping to get some time tonight to work, during which I'll be able to do some testing. This week I'm traveling with the family to my in-laws for the holidays and promised my wife I won't work, so I won't have much time this week ;)
Haha I know that feeling 😄! Regardless, enjoy the holidays and thanks for all your contributions to the plugin this year.
Ok @olimorris, I am sorry to report that you are not going to like the findings (they likely require a large refactoring) 😬 But at least I believe I know the problem(s), so you know what needs to be fixed! :-)

TLDR: The problem seems to be caused by the fact that you are utilizing Tree-sitter markdown queries for a lot of the parsing of the chat buffer, adding/removing messages and references, etc. To successfully rely on Tree-sitter queries, you have to ensure that the markdown in every response (and in combination with the entire chat buffer) adheres to proper markdown structure / formatting, mostly as it pertains to heading levels (H1 #, H2 ##, and so on).

In addition to the fact that the LLM could be providing markdown in any sort of format depending on what the user asks for, their system prompt, their individual message prompts, etc., the user could also be adding markdown to their own message prompt. But the "ultimate" issue is that CodeCompanion appears to rely, in different ways, on the H2 (##) headings in the chat buffer.

It also appears that in different areas (I tried to dig into the code a bit to assess whether my hunch was correct and see how big or small the problem may be) there are some variations in how you use the Tree-sitter queries to get what you want. For example, it seems that for a message response from the LLM, you look for the "last" (most recent) H2 (##) heading.

Either way, I don't think relying on Markdown and its nested "hierarchy" is something you will ever be able to determine deterministically, especially given that users can use their own system prompts, which means you have no guaranteed control over the format the LLM must respond in or over rules regarding markdown heading levels. One potential idea where you could still use Tree-sitter: instead of using it to grab nodes and the text returned, and trying to "match up" an actual markdown structure hierarchy to get all content "within" a certain section (and its children), you could instead just look for the role headings themselves and treat them as delimiters.
BUT this still comes with problems, as there is nothing stopping the user from accidentally (or intentionally) changing any of the content in the conversation. Maybe they are writing a new message and accidentally delete the heading marker.

I believe, if possible, the "source of truth" of the conversation should just be the actual table/list of messages that gets sent to and received from the LLM each time (via curl). That way you are not depending on the content of the chat buffer for the actual chat conversation. Of course, this may make the...

Taking all of this info, I boil it down to two issues:
Ok, enough of all of that... now to the "evidence" that will hopefully help you understand / see the problems I described above. I tried to make these screenshots as self-explanatory as possible, so I will not add too many details about each, but I will number them so that if you have any questions or comments on particular ones, you can reference them easily.

NOTE: I used the keymap to remove the system prompt each time to control the prompt as much as possible. So if you want to re-create my examples, I suggest doing that.

Example 1: Seems to look for the most recent H2 (##) heading.
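To make the ambiguity concrete, here is a small sketch (assuming the stock markdown Tree-sitter grammar; the query is illustrative and not taken from the plugin): once an LLM response itself contains ## headings, a query keyed on heading level can no longer tell the chat's role headings apart from headings inside the response.

```lua
-- Illustrative only: every H2 in the buffer matches, whether it is a chat role
-- heading (e.g. "## Me") or a heading the LLM emitted inside its answer.
local bufnr = vim.api.nvim_get_current_buf()
local parser = vim.treesitter.get_parser(bufnr, "markdown")
local root = parser:parse()[1]:root()

local query = vim.treesitter.query.parse(
  "markdown",
  [[(atx_heading (atx_h2_marker)) @h2]]
)

for _, node in query:iter_captures(root, bufnr, 0, -1) do
  local row = node:range()
  print(vim.api.nvim_buf_get_lines(bufnr, row, row + 1, false)[1])
end
```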
@olimorris sorry to drop this mammoth of findings on you, but I wanted to document and provide it to you before the holidays, as I mentioned I likely will not be very available this week ;) While I won't be able to test things this week, I certainly will be able to respond to messages / questions / ideas here, as I look at my GitHub notifications on my phone quite often. So don't hesitate to ask questions or bounce ideas off me here. Hope you have a Happy Holidays!!
Thank you for taking the time out of your holidays for this comprehensive post. Really, really appreciated. I've had a quick scan through and will think about possible solutions. I think there's still a lot of optimization to do in the chat buffer, so I will start thinking about that. Most importantly, Happy Holidays to you and your family!
No problem and sounds good! Overall, was it clear enough what my general findings (and opinions) were? Do you need any clarification, or do you feel like you understand what I was trying to convey?
Btw I did try your test workflow and I amended it a bit, adding these 3 additional prompts to the end of your 10:

```lua
{
{
role = constants.USER_ROLE,
content = 'I have asked for several different requests thus far. Can you please summarize each of my requests (number them) and provide an extremely detailed summary / overview for each. Each summary needs to be at minimum 3 paragraphs long. Include the original request with each as well.',
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = 'Now do the same for each request but provide any code that you originally responded with for each so that I have a good consolidated view of all code provided in our conversation.',
opts = {
auto_submit = true,
},
},
},
{
{
role = constants.USER_ROLE,
content = 'Please summarize the entire conversation to this point. Be as detailed and verbose as you possibly can. There should be a minimum of 50 paragraphs in your response.',
opts = {
auto_submit = true,
},
},
},
```

Additionally, after doing this with gpt-4o (Claude is slower and has the issue of "failing" too often for a giant test like this), I then "manually" asked this final follow-up question. It probably could be added to the workflow above as well, but I want to document exactly what steps I took. After this, my nvim.exe RAM was at 3 GB.

"Please now summarize each original request in a few sentences, as well as provide a summary of your response(s) to each request in another few sentences."
Btw, one last update. After the above test and the 3 GB RAM "bloat", I used the CodeCompanion "clear chat" keymap and the memory still did not get freed up. And with a completely empty chat, typing was still choppy (albeit a tad less choppy), especially when using slash commands. The point is, there is definitely a large amount of data and/or memory references to variables, or something, that is "stuck" and does not get freed up until the chat buffer is actually closed.
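One quick way to narrow this down (a sketch; run it manually after using "clear chat"): force a full garbage-collection cycle and see whether the Lua heap shrinks. If it does not, something is still holding references to the old conversation; if it does shrink but nvim's resident memory stays high, the memory is being held outside Lua (buffer text, allocator behaviour, etc.).

```lua
-- Compare the Lua heap before and after a forced collection.
local before = collectgarbage("count")
collectgarbage("collect")
collectgarbage("collect") -- second pass to finish off objects with finalizers
local after = collectgarbage("count")
print(("Lua heap: %.1f MB -> %.1f MB"):format(before / 1024, after / 1024))
```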
This is the workaround I use that helps a bit: on CodeCompanionRequestStarted I disable Tree-sitter highlighting and render-markdown for the buffer, and re-enable them on CodeCompanionRequestFinished. You should experience some improvements. Here's a partial code dump from my config:

```lua
-- Enable Treesitter highlighting for buffer
local function enable_treesitter(bufnr)
bufnr = bufnr or vim.api.nvim_get_current_buf()
vim.treesitter.start(bufnr)
end
-- Disable Treesitter highlighting for buffer
local function disable_treesitter(bufnr)
bufnr = bufnr or vim.api.nvim_get_current_buf()
if vim.treesitter.highlighter.active[bufnr] then
vim.treesitter.stop(bufnr)
end
end
local function enable_render_markdown()
local status_ok, render = pcall(require, "render-markdown")
if status_ok then
vim.schedule(render.enable)
end
end
local function disable_render_markdown()
local status_ok, render = pcall(require, "render-markdown")
if status_ok then
render.disable()
end
end
local function handle_enable()
vim.schedule(enable_treesitter)
vim.schedule(enable_render_markdown)
vim.schedule(function()
vim.cmd([[silent! normal! Gzb]])
end)
end
local function handle_disable()
disable_render_markdown()
disable_treesitter()
end
function M.opts()
...
aucmd({ "User" }, handle_disable, group, {
pattern = { "CodeCompanionRequestStarted", "CodeCompanionAgentStarted" },
})
aucmd({ "User" }, vim.schedule_wrap(handle_enable), group, {
pattern = { "CodeCompanionRequestFinished", "CodeCompanionAgentFinished" },
})
...
return {opts}
```
Ready to start looking at this once again. Unfortunately, I still cannot recreate this at all. @GitMurf can you confirm that you're seeing this in a minimal.lua setup? I have now pushed a refactor which scopes the Tree-sitter queries. The plugin will no longer parse a whole buffer in order to get the latest messages from a user. I'm observing improvements of around 10-20+ ms in large buffers, but I don't believe that will account for 3 GB of RAM usage.
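For anyone following along, "scoping" the queries looks roughly like this (a sketch only; the function name and window size are illustrative, not the plugin's actual code): restrict the query to the tail of the buffer instead of walking every node.

```lua
-- Only query headings in the last N lines of the chat buffer rather than the
-- whole document (illustrative; 200 is an arbitrary window size).
local function headings_near_end(bufnr)
  local last = vim.api.nvim_buf_line_count(bufnr)
  local first = math.max(0, last - 200)
  local parser = vim.treesitter.get_parser(bufnr, "markdown")
  local root = parser:parse()[1]:root()
  local query = vim.treesitter.query.parse("markdown", [[(atx_heading) @heading]])

  local nodes = {}
  -- iter_captures accepts a start/stop row, so only the tail is visited
  for _, node in query:iter_captures(root, bufnr, first, last) do
    table.insert(nodes, node)
  end
  return nodes
end
```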
I think there are 2 issues going on that are related but also distinct:
An important note: my system prompt instructs the LLM to return several pieces with each message response, and to return them in markdown broken into headings and subheadings. I think this contributes to the issues. I have since instructed the LLM to add single backticks around heading characters in its responses so Tree-sitter does not treat them as headings. I'll let you know if I am able to pinpoint the source of the laggy typing.
And is render-markdown enabled?
I'll have to retest, as I haven't done any deep debugging on this issue since my original messages. But when I did test, I tested without render-markdown enabled and still got the issue. And I know for sure that I tested a large conversation, it got laggy, I saved the chat buffer as a markdown file, opened that markdown file of the chat, and it was buttery smooth (with render-markdown enabled). This is what makes me think it is some sort of autocmd or similar.
Yeah! I too noticed the memory issue recently. I have no idea what's causing it, but as soon as I close Neovim my free RAM magically goes from 200 MB back up to 4 GB. I am using tmux and can see the free RAM in the statusline, so whenever it is low I close and reopen Neovim. Glad I am not the only one.
@ravitemer if you can share information about how it occurred for you, that would be great: adapter/model, length of the chat buffer, plugins enabled (e.g. render-markdown), whether you were using any tools when it occurred, etc.
Sorry! I can't say that this plugin was the problem. If I try it with minimal.lua I can't replicate it. But I am using LunarVim, which with the default config takes around 300 MB on startup. If I add my plugins and navigate around and do some work, it takes much more, going from 3.8 GB of free RAM down to 900 MB, even without opening a CC chat once. It's pretty amazing how, as soon as I close nvim, the memory magically frees up in a second. So the problem must be with my config. The reason I only noticed this recently is that I have started using tmux. Around the same time I found a bug while using CodeCompanion where, after just 3-4 messages, I would get a "No messages to submit" warning when submitting the chat, which is now gone. So while searching through discussions and issues I found this memory leak issue, thought this might be it, and started following. By the way, this plugin is awesome. Amazing work! I have to say, working with CodeCompanion in Neovim beats Cursor and VS Code. Super thankful for your work.
Your minimal.lua config
Ran with nvim --clean, but overall this is more of an FYI and I'm curious whether you have experienced similar (or heard from others).
Error messages
N/A
Log output
N/A
Health check output
N/A (everything is fine)
Describe the bug
I have noticed the longer a conversation goes, the slower the code companion text input gets (choppy) and the memory consumption of nvim.exe rises continually with each message.
I am not talking about humongous conversations, but also not small ones. Typically maybe 10-15 messages from me, but also pretty code-heavy, so the LLM's responses are not short. The CodeCompanion buffer is around 700-1,000 lines (for the conversation) when it starts to feel more sluggish, and it gets worse and worse from there.
In my most recent conversation this morning, nvim.exe ballooned to 4 GB of RAM! The conversation was at 2,300 lines in the buffer, FYI. When I quit/closed the conversation, the memory seemed to hang around even after opening a new chat buffer. It wasn't until I sent a new message in a new conversation that it paused (felt like it froze for a bit) in the middle of the response, but after waiting a while it became responsive again and my nvim RAM consumption dropped back down to a normal level of around 300 MB. So whatever this "memory leak" is, it got cleared up after sending a new message in a new conversation after deleting / stopping / quitting the long, slow conversation buffer.
Let me know if you have any questions, ideas or thoughts on how I can provide more information / debug?
Reproduce the bug
No response
Final checks
minimal.lua
file from above and have shared thisThe text was updated successfully, but these errors were encountered: