[Bug]: Potential memory leak on longer conversations #552

Open · 3 tasks done
GitMurf opened this issue Dec 20, 2024 · 29 comments
Assignees
Labels
bug Something isn't working

Comments

@GitMurf
Contributor

GitMurf commented Dec 20, 2024

Your minimal.lua config

Ran with nvim --clean, but overall this is more of an FYI; I'm curious whether you have experienced anything similar (or heard from others).

Error messages

N/A

Log output

N/A

Health check output

N/A (everything is fine)

Describe the bug

I have noticed that the longer a conversation goes, the slower (choppier) the CodeCompanion text input gets, and the memory consumption of nvim.exe rises continually with each message.

I am not talking about a humongous conversation, but not a small one either: typically maybe 10-15 messages from me, but also pretty code heavy, so the responses from the LLM are not short. The CodeCompanion buffer is around 700-1,000 lines (for the conversation) when it starts to feel more sluggish, and then it gets worse and worse from there.

In my most recent conversation this morning, nvim.exe ballooned to 4 GB of RAM, and the conversation was only at 2,300 lines in the buffer. When I quit/closed the conversation, the memory seemed to hang around even after opening a new chat buffer. It wasn't until I sent a new message in a new conversation that it paused (it felt like it froze for a bit) in the middle of the response; after waiting a while it became responsive again and my nvim RAM consumption dropped back down to a normal level of around 300 MB. So whatever this "memory leak" is, it got cleared up after sending a new message in a new conversation, after "deleting" / stopping / quitting the long, slow conversation buffer.

Let me know if you have any questions, ideas or thoughts on how I can provide more information / debug?

Reproduce the bug

No response

Final checks

  • I have made sure this issue exists in the latest version of the plugin
  • I have tested with the minimal.lua file from above and have shared this
  • I have shared the contents of the log file
@GitMurf GitMurf added the bug Something isn't working label Dec 20, 2024
@GitMurf
Contributor Author

GitMurf commented Dec 20, 2024

Btw, this has been happening for a while... basically as long as I can remember when doing longer conversations. It's just that more recently, with the features to add references etc., longer conversations have become more valuable, so I have used them more.

The point is, I have no reason to believe this is caused by a recent regression. I have noticed it for at least several weeks, and I believe longer, as I did not use long conversations as much prior to the last month or so.

@olimorris
Owner

Can you share the adapter that you're using and any config changes you've made to it? Are you using a rendering plugin for any markdown files?

We could use an Agentic Workflow in the prompt library to easily replicate multiple successive prompts.

Off the top of my head, I can't fathom why there would be so much memory usage after processing, unless something is getting attached to the event loop with every request.

@olimorris
Owner

I've created the following workflow that prompts the LLM 10 times for:

  1. Generate a Python class for managing a book library with methods for adding, removing, and searching books
  2. Write unit tests for the library class you just created
  3. Create a TypeScript interface for a complex e-commerce shopping cart system
  4. Write a recursive algorithm to balance a binary search tree in Java
  5. Generate a comprehensive regex pattern to validate email addresses with explanations
  6. Create a Rust struct and implementation for a thread-safe message queue
  7. Write a GitHub Actions workflow file for CI/CD with multiple stages
  8. Create SQL queries for a complex database schema with joins across 4 tables
  9. Write a Lua configuration for Neovim with custom keybindings and plugins
  10. Generate documentation in JSDoc format for a complex JavaScript API client

You can see the output here. Once you start the prompt off with <CR> it will automatically kick off the successive prompts. Can you let me know if you experience the same issues with this config? As you'll see from the video, CodeCompanion is performant throughout.

---@diagnostic disable: missing-fields

--[[
NOTE: Set the config path to enable the copilot adapter to work.
It will search the following paths for a token:
  - "$CODECOMPANION_TOKEN_PATH/github-copilot/hosts.json"
  - "$CODECOMPANION_TOKEN_PATH/github-copilot/apps.json"
--]]
vim.env["CODECOMPANION_TOKEN_PATH"] = vim.fn.expand("~/.config")

vim.env.LAZY_STDPATH = ".repro"
load(vim.fn.system("curl -s https://raw.githubusercontent.com/folke/lazy.nvim/main/bootstrap.lua"))()

local constants = {
  LLM_ROLE = "llm",
  USER_ROLE = "user",
  SYSTEM_ROLE = "system",
}

-- Your CodeCompanion setup
local plugins = {
  {
    "olimorris/codecompanion.nvim",
    dependencies = {
      { "nvim-treesitter/nvim-treesitter", build = ":TSUpdate" },
      { "nvim-lua/plenary.nvim" },
    },
    opts = {
      --Refer to: https://github.com/olimorris/codecompanion.nvim/blob/main/lua/codecompanion/config.lua
      strategies = {
        --NOTE: Change the adapter as required
        chat = { adapter = "copilot" },
        inline = { adapter = "copilot" },
      },
      opts = {
        log_level = "DEBUG",
      },
      prompt_library = {
        ["Test workflow"] = {
          strategy = "workflow",
          description = "Use a workflow to test the plugin",
          opts = {
            index = 4,
          },
          prompts = {
            {
              {
                role = constants.USER_ROLE,
                content = "Generate a Python class for managing a book library with methods for adding, removing, and searching books",
                opts = {
                  auto_submit = false,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Write unit tests for the library class you just created",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Create a TypeScript interface for a complex e-commerce shopping cart system",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Write a recursive algorithm to balance a binary search tree in Java",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Generate a comprehensive regex pattern to validate email addresses with explanations",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Create a Rust struct and implementation for a thread-safe message queue",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Write a GitHub Actions workflow file for CI/CD with multiple stages",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Create SQL queries for a complex database schema with joins across 4 tables",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Write a Lua configuration for Neovim with custom keybindings and plugins",
                opts = {
                  auto_submit = true,
                },
              },
            },
            {
              {
                role = constants.USER_ROLE,
                content = "Generate documentation in JSDoc format for a complex JavaScript API client",
                opts = {
                  auto_submit = true,
                },
              },
            },
          },
        },
      },
    },
  },
}

require("lazy.minit").repro({
  spec = plugins,
  dev = {
    path = "~/Code/Neovim",
    -- Only load my local plugins when we're on my machine
    patterns = (jit.os == "OSX") and { "olimorris" } or {},
  },
})

-- Setup Tree-sitter
local ts_status, treesitter = pcall(require, "nvim-treesitter.configs")
if ts_status then
  treesitter.setup({
    ensure_installed = { "lua", "markdown", "markdown_inline", "yaml" },
    highlight = { enable = true },
  })
end

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Can you share the adapter that you're using and any config changes you've made to it? Are you using a rendering plugin for any markdown files?

  • Using the out-of-the-box Claude adapter.
  • I modified the system prompt a bit, but I don't see how that could be relevant.
  • Using the recommended render-markdown plugin. But I also copied the conversation into its own buffer and had no issues there with the 2,000 lines of markdown.

We could use an Agentic Workflow in the prompt library to easily replicate multiple successive prompts.

  • I will test your provided 10 prompt flow when I have a chance this weekend.

Overall it "feels" like one of a few potential things:

  1. Somehow a bunch of extra autocmds being registered or something like that. Only insert mode seems to be affected. Normal mode can navigate fine.
  2. It increasingly slows down when using the autocomplete for slash commands / # / @ items. Almost as if the problem could be tied to autocmds / logic around completions.
  3. I can't think of a reason why 1 or 2 would cause the memory leak / increase to 4 GB of RAM though 🤷🏻‍♂️ so that is very odd.
  4. Also, oddly enough, as it gets worse the streaming of chat responses renders slower and slower. It gets to a point where it stops rendering updates at all until I move my cursor, so to see the updated streamed response I have to keep using hjkl in normal mode to move the cursor and "initiate" a render refresh. I have no clue why this would be, other than maybe some protection by Neovim where it throttles the refresh rate when memory consumption gets high (and/or remaining resources get low). 🤷🏻‍♂️

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Another thing to note: as of late, with Claude, I fairly often have to use the "kill" command in CodeCompanion because the response stops streaming / cuts off mid-response (this happens randomly, regardless of small or large conversations). After using kill, I typically use "regenerate last response" (or whatever it is called). I have noticed that sometimes things look weird after that, where it occasionally duplicates the "sharing" references. So I'm not sure if that could somehow create an additional instance of the chat going forward and cause something weird... I have no reason to suspect this is the cause, but it is something to note.

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Btw, do you have any good tips and tricks for profiling stuff like this once I get into this memory-leaked state? Like ways to check for potential problem autocmds or duplication of autocmds, etc.? Or any way to "profile" the memory somehow to see what could be causing the leak and/or see which parts of Neovim are lagging/choppy as I type in insert mode?

@olimorris
Owner

Overall it "feels" like one of a few potential things:

  1. Somehow a bunch of extra autocmds being registered or something like that. Only insert mode seems to be affected. Normal mode can navigate fine.
  2. It increasingly slows down when using the autocomplete for slash commands / # / @ items. Almost as if the problem could be tied to autocmds / logic around completions.

Yeah, agreed. Having seen these types of things over the years in Neovim, my instinct is that something in the plugin is repeatedly getting attached to the main event loop (or something along those lines), which causes Neovim to balloon in memory usage.

Using the plugin to diagnose the plugin, there may actually be some Tree-sitter optimizations I can make, so I'll implement those later.

Another thing to note: as of late, with Claude, I fairly often have to use the "kill" command in CodeCompanion because the response stops streaming / cuts off mid-response (this happens randomly, regardless of small or large conversations)

I'm experiencing the same issue occasionally. I believe this is an issue on the Anthropic side. I've been able to run the workflow with every other adapter, but it's Anthropic (whether through Copilot or directly) that always stalls and is substantially slower at responding. Once the request is made to an LLM, there's very little that CodeCompanion does other than receive and render the output, and the logs indicate that Anthropic just hangs after a while. Perhaps that's the max_tokens parameter kicking in, but I'd still expect to see an error message.

Btw, do you have any good tips and tricks for profiling stuff like this once I get into this memory-leaked state?

My first step is to ask an LLM to try and spot any optimizations 😆. If we can recreate this with the minimal.lua file then I'll investigate some profiling tools further.
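In the meantime, a couple of quick, plugin-agnostic checks, as a rough sketch (nothing here assumes anything about CodeCompanion's internals):

-- Lua heap size (collectgarbage("count") returns KB); run before and after a few messages to see growth
print(("Lua heap: %.1f MB"):format(collectgarbage("count") / 1024))

-- Count autocmds attached to the chat buffer, plus all User autocmds
-- (CodeCompanion fires User events such as CodeCompanionRequestStarted)
local buf_autocmds = vim.api.nvim_get_autocmds({ buffer = vim.api.nvim_get_current_buf() })
local user_autocmds = vim.api.nvim_get_autocmds({ event = "User" })
print(("buffer autocmds: %d, User autocmds: %d"):format(#buf_autocmds, #user_autocmds))

If either number keeps climbing with every request, that points at what is accumulating.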

@olimorris
Owner

@GitMurf could you test PR #557 at some point? I don't believe any of the changes would impact memory, but they're sensible optimizations... basically, don't call vim.treesitter.get_parser as frequently as I was previously.
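Not the actual diff from that PR, but the general shape of the change, as a sketch: fetch the parser once per chat buffer and reuse it, rather than requesting it on every parse.

-- Sketch only: cache the markdown parser on the chat object instead of
-- calling vim.treesitter.get_parser() every time the buffer is parsed
local Chat = {}
Chat.__index = Chat

function Chat.new(bufnr)
  local self = setmetatable({ bufnr = bufnr }, Chat)
  self.parser = vim.treesitter.get_parser(bufnr, "markdown")
  return self
end

function Chat:root()
  -- Reuse the cached parser for every subsequent query
  return self.parser:parse()[1]:root()
end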

@olimorris
Owner

Another thought is that the chat object could be self referencing and therefore becoming larger with every response. If that's the case we'll see that with the minimal.lua recreation.

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Another thought is that the chat object could be self referencing and therefore becoming larger with every response. If that's the case we'll see that with the minimal.lua recreation.

That is what it "feels" like. I'm curious whether memory references to the chat object could also be held onto by keymaps or something similar. For example, I have a keymap for hide (using the custom keymaps you implemented), and it is given access to the chat object. Could there be anything to that? 🤷🏻‍♂️

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Also, I typically provide 3 or 4 file references for context, and they're usually not very big. I am a pretty big modularity guy, so rarely are any of my files larger than 100 lines.

@GitMurf
Contributor Author

GitMurf commented Dec 21, 2024

Regarding the chat reference stuff you mentioned, I am wondering if it stems from the Claude API inconsistency that causes me to use the "kill" keymap and then "regenerate last response"? 🤔 Maybe extra references, instances, and/or circular references of some sort are occurring?

It definitely gets worse with more text in the buffer, so whatever it is, I believe that is a large factor.

@olimorris
Owner

olimorris commented Dec 22, 2024

I've reduced object nesting between the chat buffer and the references object in #561. If you could give that a test at some point, it would be very much appreciated.

Previously, the references object had the whole chat buffer linked to it. Over a large request, I could absolutely expect that to cause an impact.

I'll also review if there are any possible design choices that are affecting garbage collection.

@GitMurf
Contributor Author

GitMurf commented Dec 22, 2024

If you could give that a test at some point, it would be very much appreciated.

Awesome! I'm hoping to get some time to work tonight, which will let me do some testing. This week I'm traveling with the family to my in-laws for the holidays and promised my wife I won't work, so I won't have much time this week ;)

@olimorris
Owner

Haha I know that feeling 😄! Regardless, enjoy the holidays and thanks for all your contributions to the plugin this year.

@GitMurf
Contributor Author

GitMurf commented Dec 23, 2024

Ok @olimorris, I am sorry to report that you are not going to like the findings (they likely require a large refactor) 😬 But at least I believe I know the problem(s), so you know what needs to be fixed! :-)

TLDR: The problem seems to be caused by the fact that you are utilizing Tree-sitter markdown queries for a lot of the parsing of the chat buffer, adding/removing messages, references, etc.

The cause is that in order to rely on Tree-sitter queries successfully, you have to ensure that the markdown in every response (and in combination with the entire chat buffer) adheres to proper markdown structure/formatting, mostly as it pertains to heading levels (H1 `#`, H2 `##`, H3 `###`, etc.).

In addition to the fact that the LLM could provide markdown in any format, depending on what the user asks for, their system prompt, their individual message prompts, etc., the user could also be adding markdown to their own message prompts.

But the "ultimate" issue is that CodeCompanion appears to rely, in different ways, on the `## User` heading (the H2 user message heading) and the `## LLM` heading (the H2 LLM response heading).

It also appears that in different areas (I tried to dig into the code a bit to assess whether my hunch was correct and how big or small the problem may be) there are some variations in how you use the Tree-sitter queries to get what you want. For example, it seems that for a message response from the LLM, you look for the "last" (most recent) H2 (`##`) in the chat buffer and grab anything "nested underneath it" at the H2 level (H3, H4, paragraphs, lists, etc.). Or maybe it just comes out looking that way and it is actually the most recent `## LLM` heading (in my examples below the LLM is named CodeCompanion), plus anything "nested below" at H3 or deeper, though it looks like it may also potentially bring in other H2s (see my examples below, where I am a bit unsure).

Either way, I don't think relying on Markdown and its nested "hierarchy" is something you will ever be able to handle deterministically, especially given that users can supply their own system prompts, which means you have no guaranteed control over the format the LLM responds in or over its markdown heading levels.

One potential idea, where you could still use Tree-sitter, is that instead of trying to use Tree-sitter to grab nodes and the returned text and "match up" an actual markdown hierarchy to get all content "within" a certain section (and its children), you could instead just look for `## User` or `## LLM`, record the heading's line number, and then take everything after and/or between two heading line numbers. For example, the most recent message returned by the LLM would just be (see the sketch after this list):

  1. Grab the line numbers of all matching `## LLM` headings.
  2. Take the highest line number (the most recent).
  3. Do the same for `## User` (to get the most recent user heading).
  4. The last message from the LLM is the content between the line number from Step 2 and the line number from Step 3.
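A rough sketch of that idea (purely illustrative, not CodeCompanion's code; it assumes `## LLM` and `## User` are the role headings and ignores fenced code blocks that happen to contain lines starting with `##`):

-- Find the most recent LLM message by heading line numbers instead of markdown hierarchy
local function last_llm_message(bufnr)
  local lines = vim.api.nvim_buf_get_lines(bufnr, 0, -1, false)
  local last_llm, last_user
  for i, line in ipairs(lines) do
    if line:match("^## LLM") then
      last_llm = i
    elseif line:match("^## User") then
      last_user = i
    end
  end
  if not last_llm then
    return nil
  end
  -- The trailing "## User" heading (added for the next prompt) marks the end of the
  -- LLM's message; fall back to the end of the buffer if it isn't there
  local stop = (last_user and last_user > last_llm) and (last_user - 1) or #lines
  return table.concat(vim.list_slice(lines, last_llm + 1, stop), "\n")
end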

BUT this still comes with problems, as there is nothing stopping the user from accidentally (or intentionally) changing any of the content in the conversation. Maybe they are writing a new message and accidentally delete the `## User` heading that is automatically added for them at the end of the buffer. Or maybe they go up to copy a previous message in the buffer, accidentally delete something, have to undo, and then do what any classic vimmer occasionally does: fat-finger some vim motion that does something stupid (go to delete a line but accidentally hit 3 before the dd, so it deletes the 3 previous lines, which happen to include the `## LLM` heading of the previous message). Unfortunately, it is just unknown what the user could do, intentionally or not.

I believe, if possible, the "source of truth" for the conversation should just be the actual table/list of messages that get sent to and received from the LLM each time (via curl). Then you are not depending on the content of the chat buffer for the actual conversation. Of course, this may make the references (shared info) features a little tougher, because I know you rely on the content in the buffer, for good reason, so the user can modify it easily. But with that, maybe there could be a "reset buffer" command or something that lets you re-create the buffer content from the actual table/list of messages (the source of truth). In fact, this would be awesome, as it would then allow you to add features for saving and restoring conversations: if you save your conversations as a JSON of the messages (the source-of-truth messages from the curl interactions), you could easily have a feature for restoring a conversation, which rebuilds the chat buffer properly and lets you continue your conversation.
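As a rough sketch of that save/restore idea (hedged: `chat.messages` is just an assumed location for the source-of-truth table, not an actual field in the plugin):

-- Persist the messages table (the exact payload exchanged with the LLM) as JSON
local function save_conversation(chat, path)
  local fd = assert(io.open(path, "w"))
  fd:write(vim.json.encode(chat.messages))
  fd:close()
end

-- Restore it later to rebuild the chat buffer and continue the conversation
local function load_conversation(path)
  local fd = assert(io.open(path, "r"))
  local messages = vim.json.decode(fd:read("*a"))
  fd:close()
  return messages
end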

Taking all of this information together, I boil it down to two issues:

  1. Incorrect conversation history: This would explain why I am sometimes confused that the CodeCompanion LLM does not remember some piece of context, or repeats code that I know it already suggested earlier in the conversation! I have definitely noticed this and believe this is why. The chat messages stored in the conversation history will likely be "truncated" if there is any markdown in the responses. And I often prompt heavily to make sure the LLM responds structurally using markdown, which would explain why I have noticed the problems more than most others.

  2. Performance: The issues are likely caused by increasingly "complex" Tree-sitter queries, due to a ton of markdown syntax throughout the conversation combined with malformed markdown (heading nesting out of whack). In my quick review of the code there seem to be several spots where you parse the Tree-sitter content, loop through the returned nodes, and construct messages, references, etc. This is likely the cause of "choppy" typing the larger the conversation gets. Also keep in mind that if you have a fancy Mac (not even fancy anymore, just an M1 processor or newer) you may not notice these impacts until much longer conversations. Unfortunately, PCs are still far behind in the CPU arms race: I have a solid Lenovo laptop, but when I develop alongside others who have M1s, M2s, etc., it is crazy to see the difference when software/features are slow for me and they never notice, so they don't realize there is a problem/bug/inefficiency. Just an aside, but it is a real problem in software development these days because so much of the dev community is on Macs, and now pretty much all Mac users are on an M1 or better.

Ok, enough of all of that... now to the "evidence" that will hopefully help you understand/see the problems I described above. I tried to make these screenshots as self-explanatory as possible, so I will not add too many details about each, but I will number them so that if you have any questions or comments on particular ones, you can reference them easily.

NOTE: I used the keymap to remove the system prompt each time to control the prompt as much as possible. So if you want to re-create my examples, I suggest doing that.

Example 1: It seems to look for the most recent H2 (`##`), grab the content after it, and store that as the message.

Prompt I Used:

Please reformat the following in Markdown syntax by converting any `A.` items to Heading 1 (`#`), `B.` items to Heading 2 (`##`), and `C.` items to Heading 3 (`###`).

I want to just copy and paste the result directly into another markdown file so do NOT include the response text in a code block. Just return it formatted as described above so that I can copy the entire response and paste it directly into another markdown file.

Here is the text to reformat:

A. Top Level
1. Top Level A: List item 1.
4. Top Level A: List item 2.

B. Second Level
1. Second Level B: List item 1.
3. Second Level B: List item 2.

C. Third Level
1. Third Level C: List item 1.
2. Third Level C: List item 2.

As you can see in the screenshot, the chat message history (from debug info) has removed the first part of the LLM message up until the H2 ## sdf. It removed the following:

# Top Level

1. Top Level A: List item 1.
2. Top Level A: List item 2.

## Second Level

image

Example 2: Here you can see the correct full response is stored in the history, because I changed the prompt slightly to tell it to start at H3 (`###`) instead of H1 (`#`).

image

Example 3: Similar to examples 1 and 2, but I do NOT have an H2 in the LLM response, so everything is actually cut off from the message history starting at the H1 in the response (the stored message history just includes the quote from the very top).

image

Example 4: Similar to example 1, where the chat history storage cuts off the first part of the LLM response, up until the second heading, `## Second Level`.

The point of this one is to show/prove, with a follow-up question, what the LLM thinks is actually in the conversation history; I asked it to simply repeat what it thinks is the last message it sent. So in my screenshot, box 1 is the actual response and box 2 shows what the LLM sees as the last message, due to the chat history storing the incorrectly truncated message.

explorer - Program_Manager_qsG5GOZh0u_2024-12-22_23-00-16

Example 5: Last but not least, sort of a combination of all the issues from the examples above coming together.

This shows how, because the LLM response is under an H1 heading (meaning it does not fall underneath an H2 hierarchy, which is what the `## LLM` (CodeCompanion) message heading is) and there are no other H2 headings in the response, the CodeCompanion Tree-sitter parser likely returns no content; it cannot find anything under the LLM message H2 from a Tree-sitter/markdown perspective, so it must "fall back" to the previous message, which, as you can see, is actually the user's message.

So then you see I ask a follow-up to summarize the conversation, and it says that it (the LLM) failed to provide actual reasons and just repeated the request from the user. Clearly the LLM actually did respond appropriately, but the chat history incorrectly stored the message as a duplicate of the user's message (because of the Tree-sitter/markdown parsing issues).

image

@GitMurf
Contributor Author

GitMurf commented Dec 23, 2024

@olimorris sorry to drop this mammoth of findings on you, but I wanted to document it and get it to you before the holidays since, as I mentioned, I likely will not be very available this week ;)

While I likely won't be able to test things this week, I certainly will be able to respond to messages/questions/ideas here, as I check my GitHub notifications on my phone quite often. So don't hesitate to ask questions or bounce ideas off me here.

Hope you have a Happy Holidays!!

@olimorris
Owner

Thank you for taking the time out of your holidays for this comprehensive post. Really, really appreciated. I've had a quick scan through and will think about possible solutions. I think there's a lot of optimization to do in the chat buffer still so will start thinking about that.

Most importantly, Happy Holidays to you and your family!

@GitMurf
Contributor Author

GitMurf commented Dec 23, 2024

I've had a quick scan through and will think about possible solutions. I think there's a lot of optimization to do in the chat buffer still so will start thinking about that.

No problem and sounds good!

Overall was it clear enough what my general findings (and opinions) were? Do you need any clarification on it or do you feel like you understand what I was trying to convey?

@GitMurf
Contributor Author

GitMurf commented Dec 23, 2024

I've created the following workflow that prompts the LLM 10 times for:

Btw, I did try your test workflow, and I amended it a bit by adding these 3 additional prompts to the end of your 10:

          {
            {
              role = constants.USER_ROLE,
              content = 'I have asked for several different requests thus far. Can you please summarize each of my requests (number them) and provide an extremely detailed summary / overview for each. Each summary needs to be at minimum 3 paragraphs long. Include the original request with each as well.',
              opts = {
                auto_submit = true,
              },
            },
          },
          {
            {
              role = constants.USER_ROLE,
              content = 'Now do the same for each request but provide any code that you originally responded with for each so that I have a good consolidated view of all code provided in our conversation.',
              opts = {
                auto_submit = true,
              },
            },
          },
          {
            {
              role = constants.USER_ROLE,
              content = 'Please summarize the entire conversation to this point. Be as detailed and verbose as you possibly can. There should be a minimum of 50 paragraphs in your response.',
              opts = {
                auto_submit = true,
              },
            },
          },

Additionally, after doing this with GPT-4o (Claude is slower and has the issue of "failing" too often for a giant test like this), I then "manually" asked the final follow-up question below. This could probably be added to the workflow above as well, but I want to document exactly what steps I took. After this, my nvim.exe RAM was at 3 GB.

Please now summarize each original request in a few sentences as well as provide a summary of your response(s) to each request in another few sentences.

@GitMurf
Contributor Author

GitMurf commented Dec 23, 2024

Btw, one last update. After the above test and the 3 GB RAM "bloat", I used the CodeCompanion "clear chat" keymap and the memory still did not get freed up. And with a completely empty chat, typing was still choppy (albeit a tad less choppy), especially when using slash commands: /f.....i.....l.....e.... I then opened a couple of new chats and the memory still stayed at 3 GB. I then asked a new, simple question in the original chat (after it was cleared) and it still stayed high. It wasn't until I used the "quit chat" keymap that it briefly "froze" (a second or two), and then the memory was cleared and RAM was back to a normal level.

The point is, there is definitely a large amount of data and/or memory references to variables (or something similar) that gets "stuck" and does not get freed up until actually using the `keymaps.close` mapping to completely close/kill the chat.
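(One quick way to tell "garbage that hasn't been collected yet" apart from "still referenced somewhere", as a suggestion: force a full collection after clearing the chat and re-measure. If the number stays high, something is still holding references.)

-- Force a full Lua GC cycle, then report the Lua heap size
collectgarbage("collect")
print(("Lua heap after full GC: %.1f MB"):format(collectgarbage("count") / 1024))
-- Note: this only covers the Lua heap; buffer text and Tree-sitter trees live in C memory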

@mystilleef

This is the workaround I use that helps a bit.

On CodeCompanionRequestStarted, disable Tree-sitter highlighting and render-markdown. Re-enable them on CodeCompanionRequestFinished.

You should experience some improvements.

Here's a partial code dump from my config:

-- Enable Treesitter highlighting for buffer
local function enable_treesitter(bufnr)
  bufnr = bufnr or vim.api.nvim_get_current_buf()
  vim.treesitter.start(bufnr)
end

-- Disable Treesitter highlighting for buffer
local function disable_treesitter(bufnr)
  bufnr = bufnr or vim.api.nvim_get_current_buf()
  if vim.treesitter.highlighter.active[bufnr] then
    vim.treesitter.stop(bufnr)
  end
end

local function enable_render_markdown()
  local status_ok, render = pcall(require, "render-markdown")
  if status_ok then
    vim.schedule(render.enable)
  end
end

local function disable_render_markdown()
  local status_ok, render = pcall(require, "render-markdown")
  if status_ok then
    render.disable()
  end
end

local function handle_enable()
  vim.schedule(enable_treesitter)
  vim.schedule(enable_render_markdown)
  vim.schedule(function()
    vim.cmd([[silent! normal! Gzb]])
  end)
end

local function handle_disable()
  disable_render_markdown()
  disable_treesitter()
end

function M.opts()
  ...
  -- `aucmd` is a custom autocmd wrapper from my config: (events, callback, group, opts)
  aucmd({ "User" }, handle_disable, group, {
    pattern = { "CodeCompanionRequestStarted", "CodeCompanionAgentStarted" },
  })
  aucmd({ "User" }, vim.schedule_wrap(handle_enable), group, {
    pattern = { "CodeCompanionRequestFinished", "CodeCompanionAgentFinished" },
  })
  ...
  return { opts }
end
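For anyone without that wrapper, the same hookup with the stock autocmd API would look roughly like this (a sketch, not part of the dump above):

local group = vim.api.nvim_create_augroup("CodeCompanionPerfWorkaround", { clear = true })

vim.api.nvim_create_autocmd("User", {
  group = group,
  pattern = { "CodeCompanionRequestStarted", "CodeCompanionAgentStarted" },
  callback = handle_disable,
})

vim.api.nvim_create_autocmd("User", {
  group = group,
  pattern = { "CodeCompanionRequestFinished", "CodeCompanionAgentFinished" },
  callback = vim.schedule_wrap(handle_enable),
})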

@olimorris
Owner

Ready to start looking at this once again. Unfortunately, I still cannot recreate this at all.

@GitMurf can you confirm that you're seeing this in a minimal.lua file without plugins like render-markdown? I've trawled the codebase and worked quite extensively with Claude on this and I can't see any circular references that would be causing this issue.

I have now pushed a refactor which scopes the Tree-sitter queries. The plugin will no longer parse a whole buffer in order to get the latest messages from a user. I'm observing improvements of around 10-20+ ms in large buffers, but I don't believe that will account for 3 GB of RAM usage.
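For illustration only (this is not the plugin's actual code), scoping a markdown query to a row range rather than the whole buffer looks roughly like this:

-- Run a markdown heading query over only the last N lines of the chat buffer
local function recent_headings(bufnr, window)
  window = window or 200
  local last_row = vim.api.nvim_buf_line_count(bufnr)
  local start_row = math.max(0, last_row - window)

  local parser = vim.treesitter.get_parser(bufnr, "markdown")
  local root = parser:parse()[1]:root()
  local query = vim.treesitter.query.parse("markdown", "(atx_heading) @heading")

  local headings = {}
  -- iter_captures accepts start/stop rows, so only the tail of the buffer is queried
  for _, node in query:iter_captures(root, bufnr, start_row, last_row) do
    table.insert(headings, vim.treesitter.get_node_text(node, bufnr))
  end
  return headings
end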

@GitMurf
Contributor Author

GitMurf commented Jan 4, 2025

I think there are 2 issues going on that are related but also distinct (I believe):

  1. Memory issue which is not consistently reproducible. It just seems to happen every so often on a long conversation.

  2. Lag when typing in the buffer as the conversation gets larger. I believe this is a Tree-sitter issue, as we have discussed, but I'm not totally sure. I really need to dive deeper into debugging it so I can pinpoint where the lag is coming from. It seems it must be coming from some autocmd, but it could also be due to Tree-sitter reparsing the entire buffer. Is it doing this parsing on every keystroke? I know you mentioned you scoped the Tree-sitter queries down, which may end up solving the problem! I am going to try to pinpoint the cause(s) of the issue before I update, so we have something to compare to.

An important note is that I have my system prompt instructing the LLM to return several pieces in each message response, and to return them in markdown broken into headings and subheadings. I think this contributes to the issues. I have since instructed the LLM to add single backticks around heading characters (like #, ##, etc.) in responses so Tree-sitter does not treat them as headings. This seems to have helped with the memory leak issue and made typing less laggy in long conversations, but it is still there and gets worse the longer the conversation becomes.

I'll let you know if I am able to pinpoint the source of the typing lag.

@olimorris
Owner

And is render-markdown enabled?

@GitMurf
Contributor Author

GitMurf commented Jan 4, 2025

And is render-markdown enabled?

I have to retest, as I haven't done any deep debugging on this issue since my original messages. But when I did test, I did test without render-markdown enabled and still got the issue.

But I know for sure that I tested a large conversation, it got laggy, I saved the chat buffer as a markdown file, opened that markdown file of the chat, and it was buttery smooth (with render-markdown enabled). This is what makes me think it is some sort of autocmd or similar.

@ravitemer

And is render-markdown enabled?

I have to retest, as I haven't done any deep debugging on this issue since my original messages. But when I did test, I did test without render-markdown enabled and still got the issue.

But I know for sure that I tested a large conversation, it got laggy, I saved the chat buffer as a markdown file, opened that markdown file of the chat, and it was buttery smooth (with render-markdown enabled). This is what makes me think it is some sort of autocmd or similar.

Yeah! I also noticed the memory issue recently. I have no idea what's causing it, but as soon as I close Neovim my RAM is magically freed, going from about 200 MB free to 4 GB free. I am using tmux and can see the free RAM in the statusline, so whenever it is low I close and reopen Neovim. Glad I am not the only one.

@olimorris
Owner

@ravitemer if you can share information about how it occurred for you, that would be great: adapter/model, length of the chat buffer, plugins enabled (e.g. render-markdown), whether you're using any tools when it occurs, etc.

@ravitemer

ravitemer commented Jan 9, 2025

@ravitemer if you can share information about how it occurred for you, that would be great: adapter/model, length of the chat buffer, plugins enabled (e.g. render-markdown), whether you're using any tools when it occurs, etc.

Sorry! I can't say that this plugin was the problem. If I try it with minimal.lua I can't replicate it. But I am using LunarVim, which with the default config takes around 300 MB on startup. If I add my plugins, navigate around, and do some work, it takes up far more, going from 3.8 GB of free RAM down to around 900 MB, even without opening a CC chat once. It's pretty amazing how, as soon as I close nvim, the memory magically frees up, like in a second. So the problem must be with my config. The reason I only noticed this recently is that I have started using tmux. Around the same time I hit a bug while using CodeCompanion where, after just 3-4 messages, submitting the chat gave me a "No messages to submit" warning (which is now gone). So while searching through discussions and issues I found this memory leak issue, thought this might be it, and started following.

By the way, this plugin is awesome. Amazing work! I have to say, working with CodeCompanion in Neovim beats Cursor and VS Code. Super thankful for your work.
