Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update db covering indexes to have columns likely to be unique leftmost #2712

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

LZRS
Copy link
Collaborator

@LZRS LZRS commented Nov 5, 2024

based on https://www.sqlite.org/queryplanner.html#_multi_column_indices and https://www.sqlite.org/optoverview.html

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #[issue number]

Description
Refactor db covering indexes to have their leftmost columns to be those most likely to be unique (also mostly likely to be used)
Based on https://www.sqlite.org/queryplanner.html#_multi_column_indices

The left-most column is the primary key used for ordering the rows in the index. The second column is used to break ties in the left-most column. If there were a third column, it would be used to break ties for the first two columns. And so forth for all columns in the index.

And https://www.sqlite.org/optoverview.html

It is not necessary for every column of an index to appear in a WHERE clause term in order for that index to be used. However, there cannot be gaps in the columns of the index that are used

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Choose one: (Bug fix | Feature | Documentation | Testing | Code health | Builds | Releases | Other)

Screenshots (if applicable)

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@LZRS LZRS force-pushed the optimize-db-index branch 4 times, most recently from 77fde5c to d9cb381 Compare November 8, 2024 18:32
@LZRS LZRS force-pushed the optimize-db-index branch 2 times, most recently from fdf02e1 to 5a83d72 Compare November 16, 2024 17:05
database.execSQL(
"CREATE INDEX IF NOT EXISTS `index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid` ON `TokenIndexEntity` (`index_value`, `resourceType`, `index_name`, `resourceUuid`);",
)
database.execSQL(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we refactor the code by grouping operations logically and separating concerns into helper functions? For example, create separate functions to drop and create the indices

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sure. Do you mean grouping by the type of operation or the table? Which one would be better?

Copy link
Collaborator

@jingtang10 jingtang10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @LZRS thanks for this draft PR - this is really important!

Some comments on the draft PR: it would be much easier to review smaller changes because this touches so many tables - maybe we can have 1 pr to change the indices and another one to change the queries.

secondly, can you use query planner in android studio to check if the indices are being hit? and do you have actual numbers with regards to time taken for these queries?

@@ -28,7 +28,7 @@ import org.hl7.fhir.r4.model.ResourceType
@Entity(
indices =
[
Index(value = ["resourceType", "index_name", "index_system", "index_value", "resourceUuid"]),
Index(value = ["index_value", "resourceType", "index_name", "resourceUuid"]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you look at this file https://github.com/google/android-fhir/blob/fbeeb0b4abff0ecf0b133d57326bec37ac1cc920/engine/src/test/java/com/google/android/fhir/search/SearchTest.kt

and search for TokenIndexEntity, you'll find some of the queries generated by our search api. a lot of them have resource type, index name, and index value - so changing this index here will probably make those queries worse.

but - we probably should fix the index, to move up the index value column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the index_value to be the leftmost column within the index since it should be the most distinguishable column, and removed the index_system column since it never gets used within the index because the queries generated use the IFNULL statement that forces evaluation from the actual rows

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given query

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'identifier'
                           AND index_value = 'test-task')
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                             ((index_value = 'in-progress' AND
                               IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                              (index_value = 'requested' AND
                               IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status'))))

The query generated for previous indexes were

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)
`--LIST SUBQUERY 2
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

With the changes,

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceUuid_resourceType (resourceUuid=? AND resourceType=?)
|--LIST SUBQUERY 1
|  `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
`--LIST SUBQUERY 2
   `--MULTI-INDEX OR
      |--INDEX 1
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      |--INDEX 2
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      `--INDEX 3
         `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)

@LZRS
Copy link
Collaborator Author

LZRS commented Nov 19, 2024

Hi @LZRS thanks for this draft PR - this is really important!

Some comments on the draft PR: it would be much easier to review smaller changes because this touches so many tables - maybe we can have 1 pr to change the indices and another one to change the queries.

secondly, can you use query planner in android studio to check if the indices are being hit? and do you have actual numbers with regards to time taken for these queries?

Okay, I will do that and share some of the related query plans

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: PR under Review
Development

Successfully merging this pull request may close these issues.

3 participants