Update db covering indexes to have columns likely to be unique leftmost #2712

LZRS · 2024-11-05T23:25:21Z

Based on https://www.sqlite.org/queryplanner.html#_multi_column_indices and https://www.sqlite.org/optoverview.html

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #[issue number]

Description
Refactor the db covering indexes so that their leftmost columns are the ones most likely to be unique (which are also the most likely to be used).
Based on https://www.sqlite.org/queryplanner.html#_multi_column_indices

The left-most column is the primary key used for ordering the rows in the index. The second column is used to break ties in the left-most column. If there were a third column, it would be used to break ties for the first two columns. And so forth for all columns in the index.

And https://www.sqlite.org/optoverview.html

It is not necessary for every column of an index to appear in a WHERE clause term in order for that index to be used. However, there cannot be gaps in the columns of the index that are used.
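
The two quoted rules can be checked directly with EXPLAIN QUERY PLAN. The following is a minimal sketch using Python's built-in sqlite3 module rather than the Android runtime; the table `t`, its columns `a`..`d`, and the index `idx_abc` are hypothetical names chosen only for illustration and do not come from this PR.

```python
import sqlite3

# Hypothetical table and index, used only to illustrate the quoted rules.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT);
    CREATE INDEX idx_abc ON t (a, b, c);
    """
)


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "\n".join(row[3] for row in rows)


# Constraints on a leftmost prefix (a, b): both columns narrow the index search.
prefix_plan = plan("SELECT * FROM t WHERE a = 'x' AND b = 'y'")

# Gap at b: only a can narrow the index search; c is checked afterwards.
gap_plan = plan("SELECT * FROM t WHERE a = 'x' AND c = 'z'")

# No constraint on the leftmost column: without statistics the planner scans.
no_leftmost_plan = plan("SELECT * FROM t WHERE b = 'y' AND c = 'z'")

for label, text in [
    ("prefix", prefix_plan),
    ("gap", gap_plan),
    ("no leftmost", no_leftmost_plan),
]:
    print(label + ": " + text)
```

The exact wording of the plan lines varies between SQLite versions, so the sketch prints them instead of assuming a fixed format.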

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Screenshots (if applicable)

Checklist

I have read and acknowledged the Code of conduct.
I have read the Contributing page.
I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

santosh-pingle · 2024-11-19T09:19:18Z

engine/src/main/java/com/google/android/fhir/db/impl/ResourceDatabase.kt

+        database.execSQL(
+          "CREATE INDEX IF NOT EXISTS `index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid` ON `TokenIndexEntity` (`index_value`, `resourceType`, `index_name`, `resourceUuid`);",
+        )
+        database.execSQL(


Can we refactor the code by grouping operations logically and separating concerns into helper functions? For example, create separate functions to drop and create the indices

Okay, sure. Do you mean grouping by the type of operation or the table? Which one would be better?

jingtang10

Hi @LZRS thanks for this draft PR - this is really important!

Some comments on the draft PR: it would be much easier to review smaller changes because this touches so many tables - maybe we can have 1 pr to change the indices and another one to change the queries.

secondly, can you use query planner in android studio to check if the indices are being hit? and do you have actual numbers with regards to time taken for these queries?

jingtang10 · 2024-11-19T10:05:30Z

engine/src/main/java/com/google/android/fhir/db/impl/entities/TokenIndexEntity.kt

@@ -28,7 +28,7 @@ import org.hl7.fhir.r4.model.ResourceType
 @Entity(
  indices =
    [
-      Index(value = ["resourceType", "index_name", "index_system", "index_value", "resourceUuid"]),
+      Index(value = ["index_value", "resourceType", "index_name", "resourceUuid"]),


if you look at this file https://github.com/google/android-fhir/blob/fbeeb0b4abff0ecf0b133d57326bec37ac1cc920/engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

and search for TokenIndexEntity, you'll find some of the queries generated by our search api. a lot of them have resource type, index name, and index value - so changing this index here will probably make those queries worse.

but - we probably should fix the index, to move up the index value column.

I moved index_value to be the leftmost column of the index since it should be the most distinguishing column. I also removed the index_system column: it never gets used as an index search constraint, because the generated queries wrap it in IFNULL(...), so it is evaluated row by row instead.

Given the query:

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity
    WHERE resourceType = 'Task' AND index_name = 'identifier' AND index_value = 'test-task'
  )
  AND a.resourceUuid IN (
    SELECT resourceUuid FROM TokenIndexEntity
    WHERE resourceType = 'Task' AND index_name = 'status'
      AND (
        (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')
        OR (
          (index_value = 'in-progress' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')
          OR (index_value = 'requested' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')
        )
      )
  )

The query plan generated with the previous indexes was:

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)
`--LIST SUBQUERY 2
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

With the changes:

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceUuid_resourceType (resourceUuid=? AND resourceType=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
`--LIST SUBQUERY 2
   `--MULTI-INDEX OR
      |--INDEX 1
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      |--INDEX 2
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      `--INDEX 3
         `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
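
The difference between the two index layouts can be reproduced outside the app. This is a rough sketch using Python's sqlite3 module; the simplified `TokenIndexEntity` schema, the index name `idx`, and the helper `explain` are assumptions made for illustration and are not the schema Room generates. It runs one subquery with plain equality terms and one that adds the IFNULL(index_system, '') term.

```python
import sqlite3

# Simplified stand-in for the real table (an assumption, not the Room schema).
TABLE_SQL = """
CREATE TABLE TokenIndexEntity (
    id INTEGER PRIMARY KEY,
    resourceUuid BLOB,
    resourceType TEXT,
    index_name TEXT,
    index_system TEXT,
    index_value TEXT
)
"""

# Column orders taken from the diff of TokenIndexEntity.kt in this thread.
OLD_INDEX = ("resourceType", "index_name", "index_system", "index_value", "resourceUuid")
NEW_INDEX = ("index_value", "resourceType", "index_name", "resourceUuid")

EQUALITY_QUERY = (
    "SELECT resourceUuid FROM TokenIndexEntity "
    "WHERE resourceType = 'Task' AND index_name = 'identifier' "
    "AND index_value = 'test-task'"
)
IFNULL_QUERY = (
    "SELECT resourceUuid FROM TokenIndexEntity "
    "WHERE resourceType = 'Task' AND index_name = 'status' "
    "AND index_value = 'ready' "
    "AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status'"
)


def explain(index_columns, query):
    """Create a fresh database with a single index and return the query plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(TABLE_SQL)
    conn.execute(
        "CREATE INDEX idx ON TokenIndexEntity (" + ", ".join(index_columns) + ")"
    )
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    conn.close()
    return "\n".join(row[3] for row in rows)


old_equality = explain(OLD_INDEX, EQUALITY_QUERY)
new_equality = explain(NEW_INDEX, EQUALITY_QUERY)
old_ifnull = explain(OLD_INDEX, IFNULL_QUERY)
new_ifnull = explain(NEW_INDEX, IFNULL_QUERY)

for label, text in [
    ("old index, equality", old_equality),
    ("new index, equality", new_equality),
    ("old index, IFNULL", old_ifnull),
    ("new index, IFNULL", new_ifnull),
]:
    print(label + ": " + text)
```

In this sketch the old layout stops at index_name because the next index column, index_system, has no usable constraint, while the new layout can search on all three equality terms. The trade-off is that once index_system is dropped from the index, the IFNULL query is no longer covered and has to read the table rows, which matches the non-covering searches under MULTI-INDEX OR in the plan above.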

LZRS · 2024-11-19T13:21:18Z

Hi @LZRS thanks for this draft PR - this is really important!

Some comments on the draft PR: it would be much easier to review smaller changes because this touches so many tables - maybe we can have 1 pr to change the indices and another one to change the queries.

secondly, can you use query planner in android studio to check if the indices are being hit? and do you have actual numbers with regards to time taken for these queries?

Okay, I will do that and share some of the related query plans
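
For the timing numbers requested above, a rough desktop approximation can be sketched with Python's sqlite3 module before measuring on a device. Everything here is an assumption made for illustration: the simplified schema, the synthetic rows (every row shares the same resourceType and index_name, which is the least favourable case for the old column order), and the helper names `build_db` and `time_lookups`. The resulting numbers say nothing definitive about on-device performance.

```python
import sqlite3
import time

OLD_INDEX = ("resourceType", "index_name", "index_system", "index_value", "resourceUuid")
NEW_INDEX = ("index_value", "resourceType", "index_name", "resourceUuid")

QUERY = (
    "SELECT resourceUuid FROM TokenIndexEntity "
    "WHERE resourceType = ? AND index_name = ? AND index_value = ?"
)


def build_db(index_columns, row_count=20000):
    """Create an in-memory table with synthetic rows and a single index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TokenIndexEntity (id INTEGER PRIMARY KEY, resourceUuid TEXT, "
        "resourceType TEXT, index_name TEXT, index_system TEXT, index_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO TokenIndexEntity "
        "(resourceUuid, resourceType, index_name, index_system, index_value) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            ("uuid-%d" % i, "Task", "identifier", None, "task-%d" % i)
            for i in range(row_count)
        ),
    )
    conn.execute(
        "CREATE INDEX idx ON TokenIndexEntity (" + ", ".join(index_columns) + ")"
    )
    conn.commit()
    return conn


def time_lookups(conn, lookups=50):
    """Run point lookups; return (elapsed seconds, total rows matched)."""
    matched = 0
    start = time.perf_counter()
    for i in range(lookups):
        rows = conn.execute(QUERY, ("Task", "identifier", "task-%d" % i)).fetchall()
        matched += len(rows)
    return time.perf_counter() - start, matched


old_seconds, old_matched = time_lookups(build_db(OLD_INDEX))
new_seconds, new_matched = time_lookups(build_db(NEW_INDEX))

print("old index: %.4fs for %d rows" % (old_seconds, old_matched))
print("new index: %.4fs for %d rows" % (new_seconds, new_matched))
```

Both layouts must return the same rows; only the elapsed time should differ, and by how much depends heavily on the data distribution.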

LZRS force-pushed the optimize-db-index branch 4 times, most recently from 77fde5c to d9cb381 Compare November 8, 2024 18:32

Add missing resourceUuid to IndexEntity indexes

295cefb

to make them covering indexes

LZRS force-pushed the optimize-db-index branch from d9cb381 to 295cefb Compare November 12, 2024 05:50

Merge remote-tracking branch 'upstream/master' into optimize-db-index

a4e3f51

LZRS force-pushed the optimize-db-index branch 2 times, most recently from fdf02e1 to 5a83d72 Compare November 16, 2024 17:05

Refactor indexes to experiment on performance

1e7cf59

LZRS force-pushed the optimize-db-index branch from 5a83d72 to 1e7cf59 Compare November 16, 2024 18:04

Update include/revInclude to have ReferenceIndexEntity leftmost

fbeeb0b

santosh-pingle reviewed Nov 19, 2024

jingtang10 reviewed Nov 19, 2024

jingtang10 requested changes Nov 19, 2024
