[BUG]: Kafka Source stage - High CPU usage (>= 100%) - idle consumer #1587
Comments
Hi @nuxwin! Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
@nuxwin Does this happen without the Monitor stage?
@mdemoret-nv Of course, yes.
@mdemoret-nv This doesn't seem directly related to the Kafka source stage anyway. I'm wondering whether it is due to the asyncio loop. I get the same problem with the pipeline below. For us, this looks like a big problem for production use.

```python
#!/opt/conda/envs/morpheus/bin/python
import logging
import time
from typing import Generator

import click
import pandas as pd

from morpheus.config import Config, CppConfig, PipelineModes
from morpheus.messages.message_meta import MessageMeta
from morpheus.pipeline.linear_pipeline import LinearPipeline
from morpheus.pipeline.stage_decorator import source, stage
from morpheus.utils.logger import configure_logging

logger = logging.getLogger(f"morpheus.{__name__}")


@source
def source_generator() -> Generator[MessageMeta, None, None]:
    while True:
        time.sleep(5)
        yield MessageMeta(df=pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))


@stage
def simple_stage(message: MessageMeta) -> MessageMeta:
    logger.debug(f"simple_stage:\n\n{message.df.to_string()}")
    return message


@click.command()
@click.option(
    "--num_threads",
    default=1,
    type=click.IntRange(min=1),
    help="Number of internal pipeline threads to use.",
)
@click.option(
    "--pipeline_batch_size",
    default=1,
    type=click.IntRange(min=1),
    help="Internal batch size for the pipeline. Can be much larger than the model batch size.",
)
def run_pipeline(num_threads, pipeline_batch_size):
    configure_logging(log_level=logging.DEBUG)
    CppConfig.set_should_use_cpp(False)

    config = Config()
    config.mode = PipelineModes.OTHER
    config.num_threads = num_threads
    config.pipeline_batch_size = pipeline_batch_size

    pipeline = LinearPipeline(config)
    pipeline.set_source(source_generator(config))
    pipeline.add_stage(simple_stage(config))
    pipeline.run()


if __name__ == "__main__":
    run_pipeline()
```
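As a side note, the reported per-core figure can be cross-checked from inside the suspect process with a stdlib-only sampler (a sketch; `monitor_cpu` is a hypothetical helper, not part of Morpheus):

```python
import os
import time


def monitor_cpu(interval: float = 1.0, samples: int = 5) -> list:
    """Sample this process's CPU utilisation (user + system time) as a
    percentage of one core, once per `interval` seconds."""
    readings = []
    prev = os.times()
    for _ in range(samples):
        time.sleep(interval)
        cur = os.times()
        cpu_seconds = (cur.user - prev.user) + (cur.system - prev.system)
        readings.append(100.0 * cpu_seconds / interval)
        prev = cur
    return readings


if __name__ == "__main__":
    # On an otherwise idle process this prints values near 0; with a
    # busy-spinning thread in the same process, values hover near 100.
    print(monitor_cpu(interval=0.5, samples=3))
```

Running such a monitor in a background thread of the idle pipeline would show whether a full core is genuinely being consumed, independent of what `top` reports.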
Any news?
I'm suspecting the same thing. It's possible the asyncio loop is just polling for changes, seeing nothing scheduled, and repeating the process until there is some work.
Can you elaborate on why this would be a big problem for production use for you? If the high CPU usage is due to the asyncio loop, it likely is not impacting the performance of the pipeline. The loop is only spinning because there is no other work to do. Once messages are in the pipeline, they will occupy the CPU instead of the asyncio loop.
The problem is the high CPU (core) usage, >= 100%, all the time. The machine's fans speed up because of this, and I wonder if it could reduce the lifespan of the CPU. I don't think that having a CPU core at 100% is normal, especially when there is no processing going on other than polling. There should be a sleep or something similar between polls, assuming the problem comes from the loop. High CPU usage is often encountered in while(true) loops that have no sleep, especially when no work is being done. Hope you get my English.
Yes, I understand what you are saying. I agree that the pipeline should not be utilizing 100% of the CPU if there is no work to be processed. We will need to look into why the asyncio loop is consistently spinning. A simple solution could be to schedule a small sleep in the loop when there is no more work. I was wondering if there was anything specific to your deployment where 100% CPU utilization would cause problems beyond the added energy use and wear and tear. For example, some environments use CPU utilization to scale their system. If the CPU was always at 100%, then it would scale infinitely, which would be a problem. And the solution I suggested above may not work in that environment.
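The suggested fix can be illustrated with a self-contained comparison, independent of Morpheus (a sketch; `poll_busy` and `poll_with_sleep` are hypothetical names — both poll for the same wall-clock duration, but only one yields the core between empty polls):

```python
import time


def poll_busy(duration: float) -> float:
    """Busy-poll for `duration` wall-clock seconds with no sleep.
    Returns the CPU seconds consumed (close to `duration`)."""
    start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        pass  # an empty poll that immediately retries
    return time.process_time() - start


def poll_with_sleep(duration: float, idle_sleep: float = 0.01) -> float:
    """Poll for `duration` wall-clock seconds, backing off with a short
    sleep between empty polls. Returns the CPU seconds consumed."""
    start = time.process_time()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        time.sleep(idle_sleep)  # yield the core when there is no work
    return time.process_time() - start


if __name__ == "__main__":
    print(f"busy poll:  {poll_busy(0.5):.3f} CPU s")
    print(f"sleep poll: {poll_with_sleep(0.5):.3f} CPU s")
```

The trade-off of the sleep variant is up to `idle_sleep` of added latency on the first message after an idle period, which is usually acceptable next to pinning a core at 100%.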
@nuxwin Also note that
We are developing for financial entities, among others. Our clients make use of ESXi VMs (using NVIDIA vGPUs). They won't accept such CPU usage on an "idle" pipeline. Thank you for your time. That's much appreciated.
I'm talking about a single CPU core that is 100% used, not about the average CPU usage ;) So yeah, of course, for a machine with 10 cores, the average usage would be reduced to 10%. But the problem remains: there is a core that is 100% used, all the time.
Version
24.3
Which installation method(s) does this occur on?
Docker, Source
Describe the bug.
100% CPU (core) usage where normal CPU usage is expected.
Increasing the value of poll_interval doesn't change anything, even when set to 2s.
Minimum reproducible example
Relevant log output
Full env printout
Other/Misc.
Code of Conduct