Reports are being sent multiple times #22664

Open
1 of 3 tasks
mattitoo opened this issue Jan 10, 2023 · 19 comments
Labels
doc Namespace | Anything related to documentation good first issue Good first issues for new contributors

Comments

@mattitoo
Contributor

On Superset 1.5.1 we have about 50 reports enabled, most of which are sent daily.
Sometimes a report is sent out multiple times for no apparent reason (see the screenshot of the Report Execution Log), even though the report is set to be sent only once a day (see screenshot).

This behavior has been seen on different reports on different days, with no apparent pattern: it seems that Superset randomly decides to send out a report several times. The only notable circumstance is that this happens after the scheduled time.

How to reproduce the bug

  1. Set up a report with a schedule for once a day

Expected results

Each report is sent out according to schedule

Actual results

Some reports are sent out multiple times at random (but always around the scheduled time).

Screenshots

Screenshot 2023-01-10 at 16:31:47

Screenshot 2023-01-09 at 16:38:14

Environment

(please complete the following information):

  • browser type and version: Any browser
  • superset version: 1.5.1

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

Add any other context about the problem here.

@mattitoo mattitoo added the #bug Bug report label Jan 10, 2023
@devaale

devaale commented Jan 11, 2023

@mattitoo This isn't related to your problem, but could you tell me whether you're able to send chart reports as PNG with this version of Superset?

@mattitoo
Contributor Author

@devaale Yes, it works nicely.

@devaale

devaale commented Jan 11, 2023

@mattitoo Could you by any chance share your cloned Superset repo as a public repository of your own? I'm having difficulty getting a PNG via the API on the master branch.

@mattitoo
Contributor Author

For organisational reasons I cannot do that right now. But we have not made any changes to the reporting features or the API whatsoever.

@devaale

devaale commented Jan 11, 2023

Understandable. And no changes were required in the superset_config.py file either for the API endpoint api/v1/chart/{pk}/cache_screenshot, or for reports that send a chart as PNG, to work, right? @mattitoo Thank you, I'll try out version 1.5.1.

@chathawee

I am facing this issue too. On some days reports are sent in duplicate, and on other days they are not sent at all.

@mattitoo
Contributor Author

We found out that this is a timing issue when generating reports. Every 60 seconds there is a check for reports that are scheduled but not yet finished. However, this check does not take into account whether such a report has already been triggered and is still running.
So if a report takes too long, it is not registered as finished, and it is triggered again. Then the first run is sent out (because it has now finished), and so is the second one.
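To make the race described above concrete, here is a toy simulation. It is not Superset's scheduler code: apart from the 60-second check interval taken from the comment, the `simulate_sends` function, its parameters, and the run times below are hypothetical. It counts how many times one due report gets sent when the check only asks "has it finished?", versus when it also skips reports that are already running.

```python
def simulate_sends(run_seconds: int, check_interval: int = 60,
                   skip_in_progress: bool = False) -> int:
    """Count how often one due report is sent in a toy model of the race.

    Hypothetical model: the report becomes due at t=0 and a check runs every
    `check_interval` seconds. The check triggers a new run unless a run has
    already finished or, with skip_in_progress=True, a run is in progress.
    Every triggered run eventually finishes and sends the report.
    """
    if run_seconds <= 0 or check_interval <= 0:
        raise ValueError("durations must be positive")
    start_times: list[int] = []
    t = 0
    while True:
        # The earliest run finishes run_seconds after it started.
        if start_times and t >= start_times[0] + run_seconds:
            break
        if not (skip_in_progress and start_times):
            start_times.append(t)  # trigger (another) run
        t += check_interval
    return len(start_times)


# A report that finishes before the next check is sent once.
print(simulate_sends(run_seconds=45))                          # 1
# A report that takes 150 s is re-triggered at t=60 and t=120.
print(simulate_sends(run_seconds=150))                         # 3
# Skipping in-progress runs removes the duplicates.
print(simulate_sends(run_seconds=150, skip_in_progress=True))  # 1
```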

@chathawee

@mattitoo Could you please advise? Do you have any idea how to fix it?

@unnyns-307

Is this related to the caching timeout in the Redis/Celery worker?
I found a related issue about Celery workers duplicating tasks, celery/celery#3270,
and a comment there suggests extending Celery's VISIBILITY_TIMEOUT config, but I'm not sure where we could apply it in Superset.

@mattitoo
Contributor Author

We just upgraded to Superset 2.0.1 and wanted to see if the problem persists there. If it does, we will have a look at a possible fix.

@mdeshmu
Contributor

mdeshmu commented Mar 26, 2023

@mattitoo We are facing the same issue. Did upgrading to 2.0.1 solve your problem?

@mattitoo
Contributor Author

No, unfortunately this still happens.

@mdeshmu
Contributor

mdeshmu commented Mar 28, 2023

For us, this was caused by the visibility timeout being lower than the time the task took to complete. Increasing the SQS queue's visibility timeout stopped the duplicates.
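The mechanism described above can be illustrated with a toy model of a queue with a visibility timeout. This is a simplification, not Celery's or SQS's actual code. Assumptions: the message is acknowledged only when the first run finishes, every redelivery before that starts another run (and therefore another report), and `count_deliveries` and the durations below are hypothetical.

```python
def count_deliveries(task_seconds: float, visibility_timeout: float) -> int:
    """Return how many times a broker hands out one message (toy model).

    Assumptions: the message is first delivered at t=0, it is acknowledged
    only when the first run finishes at t=task_seconds, and the broker makes
    an unacknowledged message visible again every `visibility_timeout` seconds.
    Each delivery starts one run of the task, i.e. one sent report.
    """
    if visibility_timeout <= 0:
        raise ValueError("visibility_timeout must be positive")
    deliveries = 1                         # initial delivery at t=0
    next_redelivery = visibility_timeout
    while next_redelivery < task_seconds:  # still unacknowledged at this point
        deliveries += 1
        next_redelivery += visibility_timeout
    return deliveries


# A 90-minute task with a one-hour timeout is delivered, and run, twice.
print(count_deliveries(task_seconds=5400, visibility_timeout=3600))   # 2
# Raising the timeout above the task's run time leaves a single delivery.
print(count_deliveries(task_seconds=5400, visibility_timeout=18000))  # 1
```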

@unnyns-307

Hi, we also tried extending Celery's VISIBILITY_TIMEOUT and it resolved this issue.
The reports stopped duplicating or skipping. Thank you @mdeshmu for your suggestion, and thank you all for the discussion on this issue. Cheers!

@rusackas
Member

I'm tempted to close this as completed based on what I'm reading... is there anything that needs to be added to the docs and/or comments in config files so we can rest easier about doing so?

@zhaoyongjie
Member

Hey @rusackas, as @unnyns-307 mentioned, we resolved this issue by appending the following line to the config.py file: broker_transport_options = {'visibility_timeout': 18000}. The snippet below might be helpful for other users experiencing the same issue.

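      # REDIS_HOST and REDIS_PORT are assumed to be defined earlier in the config file.
      class CeleryConfig(object):
          broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
          # Seconds a task may stay unacknowledged before the broker redelivers it
          # (18000 s = 5 hours; Celery's default for the Redis transport is 1 hour).
          broker_transport_options = {'visibility_timeout': 18000}
          imports = ("superset.sql_lab", "superset.tasks", "superset.tasks.thumbnails")
          result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"

      CELERY_CONFIG = CeleryConfig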

@MONOLIFE-12

MONOLIFE-12 commented Apr 24, 2024

Hi @rusackas , @unnyns-307 , @zhaoyongjie , can someone help me?

I'm using the setting @unnyns-307 mentioned, but it does not resolve this issue:

broker_transport_options = {'visibility_timeout': 18000}

Please help.


@sfirke
Member

sfirke commented Aug 5, 2024

I looked at the Celery docs on Visibility Timeout and it says the default value is one hour. I take that to mean that you would only encounter the problematic behavior in this issue if your reports sometimes take over an hour to run -- can anyone confirm that was the situation or give a runtime value for one of the reports that was being sent in a loop?
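The reasoning above reduces to a simple comparison, sketched below under two assumptions: that the one-hour default applies (Celery documents it for the Redis transport), and that a report is only at risk when its run time exceeds the timeout. `reports_at_risk` and the run times are hypothetical examples; 18000 seconds (five hours) is the value posted earlier in this thread.

```python
DEFAULT_VISIBILITY_TIMEOUT_S = 60 * 60  # one hour, the default cited above


def reports_at_risk(runtimes_s: dict[str, float],
                    visibility_timeout_s: float = DEFAULT_VISIBILITY_TIMEOUT_S) -> list[str]:
    """Return the sorted names of reports whose run time exceeds the timeout."""
    return sorted(name for name, secs in runtimes_s.items()
                  if secs > visibility_timeout_s)


runtimes = {"daily_sales": 120.0, "monthly_export": 5400.0}  # hypothetical
print(reports_at_risk(runtimes))                              # ['monthly_export']
print(reports_at_risk(runtimes, visibility_timeout_s=18000))  # []
```

The flip side, per the caveat mentioned in the next paragraph, is that a larger timeout also lengthens how long a task interrupted by a Celery restart can wait before it is redelivered.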

I think this should be added to the Alerts & Reports documentation, but not to the default config.py. One hour seems like a reasonable default. The "caveats" section of those docs says that increasing this value can have the negative side effect of reports being excessively delayed if Celery is restarted.

Would anyone in this thread be willing to contribute a brief PR to the Alerts & Reports docs page so we can close this? I think it should be a commented-out line of code in the config showing how to set a valid value in a way that works with the current version of Superset, and then another comment above it saying something like:

# If you have long-running reports that are being resent in a loop, extend the visibility timeout per https://github.com/apache/superset/issues/22664
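One possible shape for that docs snippet, assuming it goes inside the CeleryConfig class layout posted earlier in this thread. The 18000-second value is the one reported to work above, not a recommended default, and the line is left commented out as suggested.

```python
# If you have long-running reports that are being resent in a loop, extend the
# visibility timeout (in seconds) per https://github.com/apache/superset/issues/22664
# (Celery's default for the Redis transport is one hour), e.g. inside CeleryConfig:
# broker_transport_options = {"visibility_timeout": 18000}
```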

@sfirke sfirke added good first issue Good first issues for new contributors doc Namespace | Anything related to documentation and removed #bug Bug report labels Aug 5, 2024
@wuqicyber

They have different task IDs. Have you started multiple Celery Beat instances?
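The task-ID observation suggests a quick diagnostic, sketched here under two assumptions: that the Report Execution Log records the Celery task ID of each run (the comment above implies it does), and that a broker redelivery reuses the same task ID while an independent second trigger, such as a second Celery Beat instance, creates a new one. `classify_duplicates` and the sample rows are hypothetical.

```python
from collections import defaultdict


def classify_duplicates(rows):
    """Group (report, scheduled_time, task_id) rows and label repeated sends.

    Returns {(report, scheduled_time): verdict} for slots sent more than once.
    """
    task_ids_by_slot = defaultdict(list)
    for report, scheduled_time, task_id in rows:
        task_ids_by_slot[(report, scheduled_time)].append(task_id)

    verdicts = {}
    for slot, task_ids in task_ids_by_slot.items():
        if len(task_ids) < 2:
            continue  # sent once, nothing to explain
        distinct = len(set(task_ids))
        if distinct == 1:
            verdicts[slot] = "same task ID repeated: likely broker redelivery"
        elif distinct == len(task_ids):
            verdicts[slot] = "distinct task IDs: likely triggered more than once"
        else:
            verdicts[slot] = "mixed: possibly both causes"
    return verdicts


rows = [  # hypothetical execution-log rows
    ("daily_kpis", "2023-01-10 08:00", "a1"),
    ("daily_kpis", "2023-01-10 08:00", "b2"),
    ("weekly", "2023-01-09 08:00", "c3"),
    ("weekly", "2023-01-09 08:00", "c3"),
]
for slot, verdict in classify_duplicates(rows).items():
    print(slot, verdict)
```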
