Transactions CSV (and Excel) is slow: Avoid repeated queries #346

odscjames · 2022-02-22T11:53:12Z

Describe the bug

Transactions CSV (and Excel) is slow

Issue

For every hit on the transaction, we get lots of individual DB queries:

2022-02-22 11:26:12.415 UTC [6307] app@app LOG:  duration: 0.176 ms  statement: SELECT country_percentage.id AS country_percentage_id, country_percentage.percentage AS country_percentage_percentage, country_percentage.name AS country_percentage_name, country_percentage.country AS country_percentage_country, country_percentage.activity_id AS country_percentage_activity_id, country_percentage.transaction_id AS country_percentage_transaction_id 
	FROM country_percentage 
	WHERE 8005 = country_percentage.transaction_id
2022-02-22 11:26:12.416 UTC [6307] app@app LOG:  duration: 0.160 ms  statement: SELECT region_percentage.id AS region_percentage_id, region_percentage.percentage AS region_percentage_percentage, region_percentage.name AS region_percentage_name, region_percentage.region AS region_percentage_region, region_percentage.activity_id AS region_percentage_activity_id, region_percentage.transaction_id AS region_percentage_transaction_id 
	FROM region_percentage 
	WHERE 8005 = region_percentage.transaction_id
2022-02-22 11:26:12.418 UTC [6307] app@app LOG:  duration: 0.324 ms  statement: SELECT sector_percentage.id AS sector_percentage_id, sector_percentage.text AS sector_percentage_text, sector_percentage.activity_id AS sector_percentage_activity_id, sector_percentage.transaction_id AS sector_percentage_transaction_id, sector_percentage.sector AS sector_percentage_sector, sector_percentage.vocabulary AS sector_percentage_vocabulary, sector_percentage.percentage AS sector_percentage_percentage 
	FROM sector_percentage 
	WHERE 8005 = sector_percentage.transaction_id

Those 3 queries are repeated heavily.

These python functions get the extra data:

recipient_country_code
sector_code
recipient_region_code

There are also some queries to get extra info on organisation, participation and activity tables that did not repeat in my test data set (I think because SQLLachemy caches?), but should repeat on larger data sets.

2022-02-22 11:26:12.057 UTC [6307] app@app LOG:  duration: 2.174 ms  statement: SELECT activity.iati_identifier AS activity_iati_identifier, activity.hierarchy AS activity_hierarchy, activity.default_language AS activity_default_language, activity.last_updated_datetime AS activity_last_updated_datetime, activity.last_change_datetime AS activity_last_change_datetime, activity.resource_url AS activity_resource_url, activity.reporting_org_id AS activity_reporting_org_id, activity.start_planned AS activity_start_planned, activity.start_actual AS activity_start_actual, activity.end_planned AS activity_end_planned, activity.end_actual AS activity_end_actual, activity.title AS activity_title, activity.description AS activity_description, activity.default_currency AS activity_default_currency, activity.raw_xml AS activity_raw_xml, activity.version AS activity_version, activity.major_version AS activity_major_version, activity.created AS activity_created, activity.activity_status AS activity_activity_status, activity.collaboration_type AS activity_collaboration_type, activity.default_finance_type AS activity_default_finance_type, activity.default_flow_type AS activity_default_flow_type, activity.default_aid_type AS activity_default_aid_type, activity.default_tied_status AS activity_default_tied_status 
	FROM activity 
	WHERE activity.iati_identifier = '1301-4.0000'
2022-02-22 11:26:12.081 UTC [6307] app@app LOG:  duration: 0.211 ms  statement: SELECT organisation.id AS organisation_id, organisation.ref AS organisation_ref, organisation.name AS organisation_name, organisation.type AS organisation_type 
	FROM organisation 
	WHERE organisation.id = 145
2022-02-22 11:26:12.084 UTC [6307] app@app LOG:  duration: 0.479 ms  statement: SELECT participation.activity_identifier AS participation_activity_identifier, participation.organisation_id AS participation_organisation_id, participation.role AS participation_role 
	FROM participation 
	WHERE '1301-4.0000' = participation.activity_identifier

Possible solutions

These actual DB queries are really fast as you can see above, but they do add up and there are a lot of them.

If we are lucky some joining on the original query to avoid later queries will really boost this.

The text was updated successfully, but these errors were encountered:

Performance boost to /api/1/access/transaction.csv #346

odscjames · 2022-02-23T16:58:29Z

First pull request done. However there is more we might be able to do here, there are still secondary SQLs happening.

In particular, activity is joined to the main transaction SQL just for sorting then not selected, so secondary SQLs are made to load the activity later! Adding orm.joinedload(Transaction.activity), would probably be good here but then the sort by starts to complain, so that needs more attention. Note that many transactions may have one activity here, so it's less certain there will be good gains here but it seems worth trying.

Performance boost to /api/1/access/transaction/by_sector.csv #346 Similiar to 863253c

Performance boost to /api/1/access/transaction/by_country.csv #346 Similiar to 863253c

Performance boost to /api/1/access/transaction/by_sector.csv #346 Similiar to 863253c

Performance boost to /api/1/access/transaction/by_country.csv #346 Similiar to 863253c

odscjames · 2022-04-13T09:24:00Z

#350 speeds up by_sector & by_country.

When doing the by_country one, there are still a lot of SQL fetches for individual organisations. Maybe there is another quick speed up possible here.

odscjames added a commit that referenced this issue Feb 23, 2022

Transaction download: Eager load joined tables

863253c

Performance boost to /api/1/access/transaction.csv #346

odscjames mentioned this issue Feb 23, 2022

Transaction download: Eager load joined tables #347

Merged

odscjames added a commit that referenced this issue Apr 7, 2022

Transaction by sector download: Eager load joined tables

59b6b6e

Performance boost to /api/1/access/transaction/by_sector.csv #346 Similiar to 863253c

odscjames added a commit that referenced this issue Apr 7, 2022

Transaction by country download: Eager load joined tables

d83d13c

Performance boost to /api/1/access/transaction/by_country.csv #346 Similiar to 863253c

odscjames added a commit that referenced this issue Apr 7, 2022

Transaction by sector download: Eager load joined tables

3026771

Performance boost to /api/1/access/transaction/by_sector.csv #346 Similiar to 863253c

odscjames added a commit that referenced this issue Apr 7, 2022

Transaction by country download: Eager load joined tables

f15a80f

Performance boost to /api/1/access/transaction/by_country.csv #346 Similiar to 863253c

odscjames mentioned this issue Apr 13, 2022

Improve performance of transactions CSV by sector and country #350

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

odscjames commented Feb 22, 2022

odscjames commented Feb 23, 2022

odscjames commented Apr 13, 2022

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

Transactions CSV (and Excel) is slow: Avoid repeated queries #346

Comments

odscjames commented Feb 22, 2022

Describe the bug

Issue

Possible solutions

odscjames commented Feb 23, 2022

odscjames commented Apr 13, 2022