Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix excessive CPU usage due to polling with no sleep #150

Conversation

NadavTasher
Copy link

Pull Request check-list

  • Do tests and lints pass with this change?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?

Description of change

Fixed excessive CPU usage by changing the default timeout values from 0 to none.

@aiven-sal
Copy link
Member

Hi!
What do you mean by "polling with no sleep" ? Where does that happen in valkey-py?

@NadavTasher
Copy link
Author

if self.subscribed_event.wait(timeout) is True:

Here for example

@aiven-sal
Copy link
Member

if self.subscribed_event.wait(timeout) is True:

Here for example

Maybe it's because I didn't get my coffee yet, but I don't see any loop that could consume CPU. Could you point me to where that happens exactly?

@NadavTasher
Copy link
Author

NadavTasher commented Dec 13, 2024

My apologies – this is an issue I solved a while back for redis-py and forgot what it was about.

This issue only seems to appear when the user runs get_message() repeatedly in order to filter/receive messages in a loop.

The issue is that get_message() without any parameters is supposed to be blocking, but it is not, since the default parameter is 0.0 and not None (even though it seems like None was the intention, since it is compared to inside the function).

This solved the redis-py issue redis/redis-py#3208, but since redis-py doesn't get much activity (and since I have switched to valkey), I thought it would be easier to fix it here.

For example:
Calling this function should yield count results, and be quite efficient at it.

def receive_sync(channel: str = CHANNEL, connection: redis.Redis = REDIS_SYNC, count: int = 0) -> typing.Iterator[munch.Munch]:
    received = 0
    with connection.pubsub() as subscriber:
        subscriber.subscribe(channel)
        while (received < count) or (count == 0):
            message = subscriber.get_message(ignore_subscribe_messages=True)
            if message is None:
                continue
            data = message.get("data")
            if not data:
                continue
            yield munch.Munch(json.loads(data))
            received += 1

But without this fix my function would loop without blocking.

Currently I have to set the timeout parameter to make sure it doesn't use 100% of my CPU:

def receive_sync(channel: str = CHANNEL, connection: redis.Redis = REDIS_SYNC, count: int = 0) -> typing.Iterator[munch.Munch]:
    received = 0
    with connection.pubsub() as subscriber:
        subscriber.subscribe(channel)
        while (received < count) or (count == 0):
            message = subscriber.get_message(ignore_subscribe_messages=True, timeout=1)
            if message is None:
                continue
            data = message.get("data")
            if not data:
                continue
            yield munch.Munch(json.loads(data))
            received += 1

@aiven-sal
Copy link
Member

aiven-sal commented Dec 13, 2024

Ah yes, thank you for taking the time to explain!

I'm not the original author of the code, but I disagree with

get_message() without any parameters is supposed to be blocking

I don't think that there is anything indicating that get_message without parameters should be blocking, the documentation doesn't say that. Also, the fact that your change breaks a few tests is an indication that this behavior was intended.

I agree that many APIs default to blocking behavior and, for this reason, one might intuitively expect this to work in a similar way without reading the doc, but this is the way it was written a long time ago.
Changing it now has the potential to break other users of the library, so I don't think we can merge this PR.

The only thing we can do is to make the documentation more explicit about get_message's behavior to avoid confusion.

@aiven-sal aiven-sal closed this Dec 13, 2024
@NadavTasher
Copy link
Author

I understand.

I think the documentation should be clearer.
I also think that even if this PR is not mereged, we should add some warning mechanism.
I suggest we add warnings.warn("get_message is non-blocking, 0 timeout was provided") as a way to warn users of this dangerous behaviour.

@aiven-sal
Copy link
Member

I crated an issue to track this #151

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants