
Progress when running a query. #260

Open
bill-anderson opened this issue Jul 28, 2020 · 21 comments
Labels: enhancement (New feature or request), Frontend, pinned
@bill-anderson

  1. Is it possible to show a progress bar when running a query?
  2. Is it possible to show the total record count when viewing data prior to export?
@bill-anderson bill-anderson added the question Further information is requested label Jul 28, 2020
edwinmp (Collaborator) commented Jul 29, 2020

Dev Note:

More worried about the count operation ... hopefully it won't be just as expensive as the actual retrieval in extreme cases @akmiller01
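One cheap alternative, assuming the backend is PostgreSQL, is to read the planner's row estimate out of `EXPLAIN (FORMAT JSON)` instead of running a full `COUNT(*)`. The function and the sample plan below are illustrative only, not project code:

```python
import json

def estimated_row_count(explain_json: str) -> int:
    """Return the Postgres planner's row estimate from EXPLAIN (FORMAT JSON).

    EXPLAIN only plans the query (it does not execute it), so this is far
    cheaper than COUNT(*) on a large table. The estimate can be off, but it
    is usually good enough for a "~N records" hint in the UI.
    """
    plan = json.loads(explain_json)[0]["Plan"]
    return int(plan["Plan Rows"])

# Abridged sample planner output (hypothetical values):
sample = '[{"Plan": {"Node Type": "Seq Scan", "Plan Rows": 60, "Total Cost": 1234.5}}]'
print(estimated_row_count(sample))  # → 60
```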

@bill-anderson (Author)

https://ddw.devinit.org/queries/build/702/

This selects 1 activity

edwinmp (Collaborator) commented Sep 10, 2020

Created a copy of it. It took 3 secs to save, 45 secs to load the first 10 rows, and 1 min 24 secs to export. For a query with only 60 rows, that's really terrible performance. I don't think anything on the frontend can help much here. This one took about 3 minutes to save.

@wakibi @akmiller01 ideas?

@edwinmp edwinmp added the enhancement New feature or request label Sep 10, 2020
wakibi (Contributor) commented Sep 11, 2020

Yes, both are possible @bill-anderson; the debate will be more about the performance trade-offs (and the extra time) of waiting for the two to run before running the actual query. @akmiller01 your thoughts?

@akmiller01 (Contributor)

I'm not actually sure about the feasibility of a progress bar. The biggest component of the time is the SQL query running, and I don't think it's possible to assess its progress.

While doing some tests yesterday, I actually found it's faster to preview a query that returns lots of results than one that returns fewer than 10 rows. I believe this is because with "SELECT * LIMIT 10" SQL will quickly read the first 10 rows and return them. But with "SELECT ... WHERE XXX LIMIT 10" where the query only ever matches 9 rows, it scans the entire table to the end (it never gets to stop early the way the large query does).
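The early-stop behaviour described above can be simulated in a few lines (a toy sequential scan, not how Postgres is actually implemented):

```python
def rows_scanned(table, predicate, limit):
    """Count how many rows a naive scan touches before satisfying LIMIT.

    If fewer than `limit` rows ever match, the scan runs to the end of
    the table, which is why a very selective WHERE can be slower.
    """
    matched = scanned = 0
    for row in table:
        scanned += 1
        if predicate(row):
            matched += 1
            if matched == limit:
                break  # LIMIT satisfied, stop early
    return scanned

table = list(range(1_000))
# "SELECT * LIMIT 10": stops after touching the first 10 rows
print(rows_scanned(table, lambda r: True, 10))   # → 10
# a WHERE that matches only 9 rows: the scan touches all 1000 rows
print(rows_scanned(table, lambda r: r < 9, 10))  # → 1000
```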

wakibi (Contributor) commented Sep 11, 2020

@akmiller01 it's possible to get an approximation of how long a query will take.

@akmiller01 (Contributor)

If it's possible to quickly calculate an approximation of how long a query will take, it might be worth it. But if calculating the approximation itself takes a considerable amount of time, it would be counter-productive.
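Assuming Postgres, one cheap approximation is the planner's `Total Cost` from a plain `EXPLAIN`, which plans but never executes the query; converting abstract cost units to seconds requires a locally calibrated factor (the value below is made up):

```python
import json

SECONDS_PER_COST_UNIT = 0.001  # hypothetical; calibrate against real runtimes

def estimated_seconds(explain_json: str) -> float:
    """Rough duration estimate from the planner's total cost.

    Cost units are abstract, so this only means something after the factor
    has been calibrated against measured runtimes on the same server.
    """
    cost = json.loads(explain_json)[0]["Plan"]["Total Cost"]
    return cost * SECONDS_PER_COST_UNIT

# Abridged sample planner output (hypothetical values):
sample = '[{"Plan": {"Node Type": "Seq Scan", "Total Cost": 2500.0}}]'
print(estimated_seconds(sample))
```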

I'll run the iati_identifier indexer tonight, and we can see if that improves load-times where the only filter is an iati_identifier. If we can reduce load-times overall, we won't need to bother with a progress bar.

@k8hughes (Collaborator)

It would be good to have a time estimate for how long it would take to do this piece of work before we start on it.

@bill-anderson (Author)

I think we need to understand how indexing might help before doing any work on this.

@akmiller01 akmiller01 self-assigned this Sep 22, 2020
wakibi (Contributor) commented Nov 30, 2020

@akmiller01 @bill-anderson should I proceed with this now that we have done some optimisations and indexing on the DB?

@akmiller01 (Contributor)

Hi @wakibi between all the development tasks, indexing got deployed but never implemented.

The heavy-lifting index on IATI identifier (`iati_id_btree_idx` on `repo.iati_transactions`) needs to be manually removed. Then we need to run `docker-compose exec web ./data_updates/indexing.sh` in a tmux session to fully apply the new indexes.

And then maybe we can collect some stress-testing stats?

k8hughes (Collaborator) commented Dec 8, 2020

Is someone working on this at the moment? If not, can we move it back into To Do or mark it as blocked?

@wakibi do you know?

wakibi (Contributor) commented Dec 8, 2020

I am working on this; I had forgotten about the second part (showing the total record count). I'm factoring that in now.

@k8hughes (Collaborator)

@wakibi which PR does this one link to?

wakibi (Contributor) commented Dec 10, 2020

@k8hughes #379

@k8hughes (Collaborator)

@akmiller01 would you have time to review this so we can deploy?

k8hughes (Collaborator) commented Feb 9, 2021

The backend work for this has been done but there is no frontend yet. @edwinmp to look at the FE.

@k8hughes k8hughes removed the question Further information is requested label Feb 17, 2021
stale bot commented Apr 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 18, 2021
@edwinmp edwinmp added pinned and removed wontfix This will not be worked on labels Apr 19, 2021
edwinmp (Collaborator) commented May 4, 2021

@wakibi I don't think using an API here is going to do the trick, as its run duration is not much different from that of the actual query. It also seems to only work for simple queries and returns the error below for more complex ones (likely because it returns no results for the analyse query).

[screenshot: error returned for more complex queries]

For the cases where it works, I suggest this runtime be calculated asynchronously when the dataset is saved and stored in a property of its own. That way it's always available before the query is run and can be passed to a progress indicator.
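A minimal sketch of that idea, with hypothetical names (`Dataset`, `run_query`) standing in for the real models:

```python
import threading
import time

class Dataset:
    """Hypothetical model: measure and store an estimated runtime on save."""

    def __init__(self, run_query):
        self.run_query = run_query      # callable that executes the query
        self.estimated_runtime = None   # seconds; None until measured

    def save(self):
        # Measure asynchronously so saving the dataset itself stays fast;
        # the real project would likely use a task queue instead of a thread.
        worker = threading.Thread(target=self._measure)
        worker.start()
        return worker

    def _measure(self):
        start = time.perf_counter()
        self.run_query()
        self.estimated_runtime = time.perf_counter() - start
```

By the time the user runs the query, `estimated_runtime` is already stored and can seed a progress indicator.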

wakibi (Contributor) commented May 6, 2021

I saw the error while testing the sub-query feature; it's caused by queries that contain aggregate functions. I have a fix for it (but it may have to come via the sub-query feature). Saving the runtime sounds like a good idea, but the timing would be wrong whenever the server is under load. I'll perhaps run it 3 times, take the average, and store that. I think that's the only workaround.
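Averaging over a few runs to smooth out load spikes could look like this (a sketch; the 3 runs match the comment above):

```python
import time

def average_runtime(run_query, runs=3):
    """Average wall-clock time of `run_query` over several runs.

    Averaging dampens the effect of a single measurement taken while
    the server happened to be under heavy load.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        total += time.perf_counter() - start
    return total / runs
```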

@edwinmp edwinmp added this to the v2.10.0 milestone May 18, 2021
@edwinmp edwinmp modified the milestones: v2.10.0, v2.11.0 Jun 1, 2021
@wakibi wakibi linked a pull request Jun 15, 2021 that will close this issue
@k8hughes (Collaborator)

@edwinmp and @wakibi can you check what is needed here? It would be good to get this wrapped up and released.
