Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve Git handler and introduce caching #1689

Merged
merged 5 commits into from
Sep 20, 2024

Conversation

dominik003
Copy link
Contributor

@dominik003 dominik003 commented Aug 5, 2024

Breaking Changes

A new key .Values.valkey.password is required. Fill it with a randomly generated password.

Changes

  1. Persist Git Repository Id: This PR stores the remote repository id (i.e. project id for GitLab and {owner}/{repo_name} for GitHub) along with the git model in the database, so we only need to fetch it once. If the git model path changes, we also need to update the repository id.
  2. Persist Job Identifier for Diagram Cache: This PR allows the job identifier corresponding to the last successful pipeline to be retrieved only once, rather than for each requested diagram. This is achieved by providing the job identifier along with the initial diagram metadata response, so that the frontend can always send the job identifier along with a requested diagram. However, this behavior is optional, so if no job identifier is provided, we will still look up the identifier for each request.
  3. Git Handler Improvements: This PR contains a number of improvements/refactorings to the git handler. For example, the project identifier has been removed from all function parameters and is now set on the object itself, unused functions have been removed, return types have been changed to properly match the actual type, documentation has been added to the abstract git handler, and more.
  4. Git Handler Caching: This PR introduces an initial caching approach for files and artifacts retrieved from git pipelines. This should reduce the number of API calls we make and thus significantly improve image load times. We use redis as a caching solution with the following keys {git path}:{git revision}:f:{file path} for files and {git path}:{git revision}:a:{job id}:{file_path} for artifacts. These keys should uniquely identify a file or artifact, and the hierarchical structure allows us to easily clear the cache for certain git models (i.e., we can remove all keys under {git path}:{git revision}:*).

Resolves #893

TODO:

  • Add fallback if valkey is not reachable
  • Mount correct directories to valkey container

@dominik003 dominik003 marked this pull request as draft August 5, 2024 15:37
@dominik003 dominik003 force-pushed the refactor-git-handler branch 3 times, most recently from d3fd5fe to 0a61036 Compare August 5, 2024 16:42
Copy link

codecov bot commented Aug 5, 2024

Codecov Report

Attention: Patch coverage is 70.28302% with 63 lines in your changes missing coverage. Please review.

Project coverage is 84.13%. Comparing base (7e18c2d) to head (9bc25bd).
Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...jects/toolmodels/modelsources/git/handler/cache.py 35.29% 33 Missing ⚠️
...cts/toolmodels/modelsources/git/handler/handler.py 83.78% 2 Missing and 4 partials ⚠️
...ollab/projects/toolmodels/modelsources/git/crud.py 61.53% 2 Missing and 3 partials ⚠️
...ects/toolmodels/modelsources/git/github/handler.py 79.16% 4 Missing and 1 partial ⚠️
...pellacollab/projects/toolmodels/diagrams/routes.py 70.00% 3 Missing ⚠️
...ects/toolmodels/modelsources/git/gitlab/handler.py 83.33% 3 Missing ⚠️
...cts/toolmodels/modelsources/git/handler/factory.py 88.88% 2 Missing and 1 partial ⚠️
...llacollab/projects/toolmodels/modelbadge/routes.py 50.00% 2 Missing ⚠️
...projects/toolmodels/modelsources/git/validation.py 0.00% 2 Missing ⚠️
...lab/projects/toolmodels/modelsources/git/routes.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1689      +/-   ##
==========================================
- Coverage   84.31%   84.13%   -0.19%     
==========================================
  Files         193      194       +1     
  Lines        6344     6441      +97     
  Branches      693      708      +15     
==========================================
+ Hits         5349     5419      +70     
- Misses        846      867      +21     
- Partials      149      155       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dominik003 dominik003 force-pushed the refactor-git-handler branch from 0a61036 to bc03a70 Compare August 12, 2024 13:19
Copy link

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

@dominik003 dominik003 force-pushed the refactor-git-handler branch from bc03a70 to 595f7ba Compare August 19, 2024 14:42
Copy link

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

@dominik003 dominik003 force-pushed the refactor-git-handler branch from 595f7ba to 0a6a9fb Compare August 19, 2024 16:36
Copy link

github-actions bot commented Aug 19, 2024

A Storybook preview is available for commit 90c9982.
➡️ View Storybook
➡️ View Chromatic build
✅ Captured 41 snapshots. No changes detected.

Copy link

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

@dominik003 dominik003 force-pushed the refactor-git-handler branch 2 times, most recently from 2fb7b36 to d6fcb0f Compare August 26, 2024 14:11
@dominik003 dominik003 force-pushed the refactor-git-handler branch 3 times, most recently from ed83167 to 4532c46 Compare September 9, 2024 12:42
Copy link

github-actions bot commented Sep 9, 2024

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

Copy link

github-actions bot commented Sep 9, 2024

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

Copy link

github-actions bot commented Sep 9, 2024

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

@dominik003 dominik003 force-pushed the refactor-git-handler branch 3 times, most recently from 3f67356 to 8caa705 Compare September 9, 2024 13:50
- key: redis-config
path: redis.conf
containers:
- name: {{ .Release.Name }}-backend-redis

Check warning

Code scanning / SonarCloud

Storage limits should be enforced

<!--SONAR_ISSUE_KEY:AZHW_sqmuI64PkU6s7jd-->Specify a storage limit for this container. <p>See more on <a href="https://sonarcloud.io/project/issues?id=DSD-DBS_capella-collab-manager&issues=AZHW_sqmuI64PkU6s7jd&open=AZHW_sqmuI64PkU6s7jd&pullRequest=1689">SonarCloud</a></p>
@dominik003 dominik003 force-pushed the refactor-git-handler branch 2 times, most recently from b32b916 to bbc3f52 Compare September 9, 2024 15:06
MoritzWeber0 added a commit that referenced this pull request Sep 13, 2024
This commit requests changes on the PR #1689. In particular, the following has
changed:

- Replace cache key from path and revision with git_model_id (database id). The
  unique database ID of the Git model in the Collaboration Manager has to be
  part of the cache key. For safety reasons, I'd not share a cache between
  projects even though they link to the same repository. This doesn't really
  matter currently since we make new requests against the Git API every time,
  but we can't assure that for the future.
  - An attack could look like:
  - Project 1 links Git repository "https://example.com" to a model.
  - Project 2 links the same Git repository to a model with a different token.
  - The token of project 1 expires. Project 2 keeps updating the cache with a
    valid token.
  - Users of project 1 can still access the cache for project 2 without a valid
    token.
- Since a git_model only has one revision, it's safe to drop the revision for
  artifacts. The revision is only relevant for files from the repository. In
  those cases, the value was passed as parameter every time. Therefore, I've
  removed it from the Cache object.
- For license reasons, I switched from Redis to Valkey.
- Use a Kubernetes secret for the valkey password.
- For REST compatibility, I've changed the resource name of "empty_cache" to
  "cache".
- I've reenabled the model badge error handling.
- Added a new function loadDiagramCacheMetadata in
  model-diagram-dialog.component.ts to reduce duplicated code.
- Added the job id to the diagram cache code snippet.
- Changed the styling of the "Clear cache" button.
Copy link

The generated OpenAPI client is not up to date with the latest changes in the OpenAPI specification.
Please run make openapi locally and commit the changes.

dominik003 pushed a commit that referenced this pull request Sep 16, 2024
This commit requests changes on the PR #1689. In particular, the following has
changed:

- Replace cache key from path and revision with git_model_id (database id). The
  unique database ID of the Git model in the Collaboration Manager has to be
  part of the cache key. For safety reasons, I'd not share a cache between
  projects even though they link to the same repository. This doesn't really
  matter currently since we make new requests against the Git API every time,
  but we can't assure that for the future.
  - An attack could look like:
  - Project 1 links Git repository "https://example.com" to a model.
  - Project 2 links the same Git repository to a model with a different token.
  - The token of project 1 expires. Project 2 keeps updating the cache with a
    valid token.
  - Users of project 1 can still access the cache for project 2 without a valid
    token.
- Since a git_model only has one revision, it's safe to drop the revision for
  artifacts. The revision is only relevant for files from the repository. In
  those cases, the value was passed as parameter every time. Therefore, I've
  removed it from the Cache object.
- For license reasons, I switched from Redis to Valkey.
- Use a Kubernetes secret for the valkey password.
- For REST compatibility, I've changed the resource name of "empty_cache" to
  "cache".
- I've reenabled the model badge error handling.
- Added a new function loadDiagramCacheMetadata in
  model-diagram-dialog.component.ts to reduce duplicated code.
- Added the job id to the diagram cache code snippet.
- Changed the styling of the "Clear cache" button.
@dominik003 dominik003 force-pushed the refactor-git-handler branch 6 times, most recently from 8b89310 to 0e57c16 Compare September 16, 2024 15:10
@dominik003
Copy link
Contributor Author

@MoritzWeber0 I went over your comments and implemented them in my last commit. I mentioned open issues in my replies to your comments, but the main open issue is how to handle cache clearing (see my comment above), and what resource we should allocate to the valkey pod.
Another open question is how to properly set the cache TTL, which is currently set to 3600, i.e. 1 hour, which is probably too low. Theoretically, we could set it to infinite, since valkey should take care of removing keys when the cache is full, but personally, I would say that it makes sense to have a TTL (even if it is set quite high).

dominik003 and others added 4 commits September 20, 2024 15:08
This commit requests changes on the PR #1689. In particular, the following has
changed:

- Replace cache key from path and revision with git_model_id (database id). The
  unique database ID of the Git model in the Collaboration Manager has to be
  part of the cache key. For safety reasons, I'd not share a cache between
  projects even though they link to the same repository. This doesn't really
  matter currently since we make new requests against the Git API every time,
  but we can't assure that for the future.
  - An attack could look like:
  - Project 1 links Git repository "https://example.com" to a model.
  - Project 2 links the same Git repository to a model with a different token.
  - The token of project 1 expires. Project 2 keeps updating the cache with a
    valid token.
  - Users of project 1 can still access the cache for project 2 without a valid
    token.
- Since a git_model only has one revision, it's safe to drop the revision for
  artifacts. The revision is only relevant for files from the repository. In
  those cases, the value was passed as parameter every time. Therefore, I've
  removed it from the Cache object.
- For license reasons, I switched from Redis to Valkey.
- Use a Kubernetes secret for the valkey password.
- For REST compatibility, I've changed the resource name of "empty_cache" to
  "cache".
- I've reenabled the model badge error handling.
- Added a new function loadDiagramCacheMetadata in
  model-diagram-dialog.component.ts to reduce duplicated code.
- Added the job id to the diagram cache code snippet.
- Changed the styling of the "Clear cache" button.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

- Add vertical autoscaler to track resource usage of valkey
- Increase TTL for valkey data
- Clear cache when git model is updated
- Mount data and config properly to valkey
- Fallback to traditional loading if saving or loading in valkey fails
Copy link

sonarcloud bot commented Sep 20, 2024

Copy link

This report was generated by comparing 90c9982 with 7e18c2d.
If you would like to check difference, please check here.

change detected

ArtifactName: reg

item count
pass 242
change 16
new 4
delete 0
📝 Report

Differences

Model Components_Model Diagram Cache_Combined_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Combined_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Error_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Error_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Loaded_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Loaded_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Not Loaded_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Diagram Not Loaded_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Loading Without Scroll_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Loading Without Scroll_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Loading_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_Loading_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_No Diagram For Filter_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_No Diagram For Filter_mobile.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_No Diagram_desktop.png

actual Actual
expected Expected
difference Difference

Model Components_Model Diagram Cache_No Diagram_mobile.png

actual Actual
expected Expected
difference Difference

New Items

Model Components_Model Diagram Code Block_Code Block With Job ID_desktop.png

NewItem

Model Components_Model Diagram Code Block_Code Block With Job ID_mobile.png

NewItem

Model Components_Model Diagram Code Block_Code Block With Token_desktop.png

NewItem

Model Components_Model Diagram Code Block_Code Block With Token_mobile.png

NewItem

Copy link
Member

@MoritzWeber0 MoritzWeber0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate all the work that went into this PR. It not only makes loading the diagram cache and model badges faster, but also lays the foundation for future caching in the application. Thanks for that! 🥳

@MoritzWeber0 MoritzWeber0 merged commit 6828e3a into main Sep 20, 2024
31 of 32 checks passed
@MoritzWeber0 MoritzWeber0 deleted the refactor-git-handler branch September 20, 2024 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache diagram cache from Github
2 participants