Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

Open
odscjames opened this issue Feb 22, 2022 · 2 comments
Open

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

odscjames opened this issue Feb 22, 2022 · 2 comments

Comments

@odscjames
Copy link
Collaborator

Describe the bug

Transactions CSV (and Excel) is slow

Issue

For every hit on the transaction, we get lots of individual DB queries:

2022-02-22 11:26:12.415 UTC [6307] app@app LOG:  duration: 0.176 ms  statement: SELECT country_percentage.id AS country_percentage_id, country_percentage.percentage AS country_percentage_percentage, country_percentage.name AS country_percentage_name, country_percentage.country AS country_percentage_country, country_percentage.activity_id AS country_percentage_activity_id, country_percentage.transaction_id AS country_percentage_transaction_id 
	FROM country_percentage 
	WHERE 8005 = country_percentage.transaction_id
2022-02-22 11:26:12.416 UTC [6307] app@app LOG:  duration: 0.160 ms  statement: SELECT region_percentage.id AS region_percentage_id, region_percentage.percentage AS region_percentage_percentage, region_percentage.name AS region_percentage_name, region_percentage.region AS region_percentage_region, region_percentage.activity_id AS region_percentage_activity_id, region_percentage.transaction_id AS region_percentage_transaction_id 
	FROM region_percentage 
	WHERE 8005 = region_percentage.transaction_id
2022-02-22 11:26:12.418 UTC [6307] app@app LOG:  duration: 0.324 ms  statement: SELECT sector_percentage.id AS sector_percentage_id, sector_percentage.text AS sector_percentage_text, sector_percentage.activity_id AS sector_percentage_activity_id, sector_percentage.transaction_id AS sector_percentage_transaction_id, sector_percentage.sector AS sector_percentage_sector, sector_percentage.vocabulary AS sector_percentage_vocabulary, sector_percentage.percentage AS sector_percentage_percentage 
	FROM sector_percentage 
	WHERE 8005 = sector_percentage.transaction_id

Those 3 queries are repeated heavily.

These python functions get the extra data:

  • recipient_country_code
  • sector_code
  • recipient_region_code

There are also some queries to get extra info on organisation, participation and activity tables that did not repeat in my test data set (I think because SQLLachemy caches?), but should repeat on larger data sets.

2022-02-22 11:26:12.057 UTC [6307] app@app LOG:  duration: 2.174 ms  statement: SELECT activity.iati_identifier AS activity_iati_identifier, activity.hierarchy AS activity_hierarchy, activity.default_language AS activity_default_language, activity.last_updated_datetime AS activity_last_updated_datetime, activity.last_change_datetime AS activity_last_change_datetime, activity.resource_url AS activity_resource_url, activity.reporting_org_id AS activity_reporting_org_id, activity.start_planned AS activity_start_planned, activity.start_actual AS activity_start_actual, activity.end_planned AS activity_end_planned, activity.end_actual AS activity_end_actual, activity.title AS activity_title, activity.description AS activity_description, activity.default_currency AS activity_default_currency, activity.raw_xml AS activity_raw_xml, activity.version AS activity_version, activity.major_version AS activity_major_version, activity.created AS activity_created, activity.activity_status AS activity_activity_status, activity.collaboration_type AS activity_collaboration_type, activity.default_finance_type AS activity_default_finance_type, activity.default_flow_type AS activity_default_flow_type, activity.default_aid_type AS activity_default_aid_type, activity.default_tied_status AS activity_default_tied_status 
	FROM activity 
	WHERE activity.iati_identifier = '1301-4.0000'
2022-02-22 11:26:12.081 UTC [6307] app@app LOG:  duration: 0.211 ms  statement: SELECT organisation.id AS organisation_id, organisation.ref AS organisation_ref, organisation.name AS organisation_name, organisation.type AS organisation_type 
	FROM organisation 
	WHERE organisation.id = 145
2022-02-22 11:26:12.084 UTC [6307] app@app LOG:  duration: 0.479 ms  statement: SELECT participation.activity_identifier AS participation_activity_identifier, participation.organisation_id AS participation_organisation_id, participation.role AS participation_role 
	FROM participation 
	WHERE '1301-4.0000' = participation.activity_identifier

Possible solutions

These actual DB queries are really fast as you can see above, but they do add up and there are a lot of them.

If we are lucky some joining on the original query to avoid later queries will really boost this.

odscjames added a commit that referenced this issue Feb 23, 2022
Performance boost to /api/1/access/transaction.csv

#346
@odscjames
Copy link
Collaborator Author

First pull request done. However there is more we might be able to do here, there are still secondary SQLs happening.

In particular, activity is joined to the main transaction SQL just for sorting then not selected, so secondary SQLs are made to load the activity later! Adding orm.joinedload(Transaction.activity), would probably be good here but then the sort by starts to complain, so that needs more attention. Note that many transactions may have one activity here, so it's less certain there will be good gains here but it seems worth trying.

odscjames added a commit that referenced this issue Apr 7, 2022
Performance boost to /api/1/access/transaction/by_sector.csv

#346

Similiar to 863253c
odscjames added a commit that referenced this issue Apr 7, 2022
Performance boost to /api/1/access/transaction/by_country.csv

#346

Similiar to 863253c
odscjames added a commit that referenced this issue Apr 7, 2022
Performance boost to /api/1/access/transaction/by_sector.csv

#346

Similiar to 863253c
odscjames added a commit that referenced this issue Apr 7, 2022
Performance boost to /api/1/access/transaction/by_country.csv

#346

Similiar to 863253c
@odscjames
Copy link
Collaborator Author

#350 speeds up by_sector & by_country.

When doing the by_country one, there are still a lot of SQL fetches for individual organisations. Maybe there is another quick speed up possible here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant