PCR optimize instructor's course query #504

AaDalal · 2023-08-23T07:08:31Z

NOTE: This is not currently working! It is a useful baseline, but only a starting point; there is still debugging to be done.
Use the USE_NEW_QUERIES toggle in review.views to switch between the new (not working) queries and the old (working but slow) queries. After flipping the toggle, run the server and visit a URL like http://127.0.0.1:8000/api/review/instructor/14021 to see what each set of queries returns.
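
For anyone debugging, here is a minimal sketch of how a module-level toggle like this can select between two query paths, plus a helper that runs both for the same instructor so the outputs can be diffed. Only the name USE_NEW_QUERIES comes from this PR; the functions below are hypothetical stand-ins, not the actual code in review.views.

```python
# Hypothetical sketch: every name except USE_NEW_QUERIES is illustrative.
USE_NEW_QUERIES = False  # flip to True to exercise the new (not yet working) path


def old_instructor_query(instructor_id):
    """Stand-in for the existing slow-but-correct query path."""
    return {"instructor_id": instructor_id, "source": "old"}


def new_instructor_query(instructor_id):
    """Stand-in for the optimized query path that is still being debugged."""
    return {"instructor_id": instructor_id, "source": "new"}


def instructor_reviews_data(instructor_id, use_new=None):
    """Pick a query path; `use_new` overrides the module-level toggle."""
    if use_new is None:
        use_new = USE_NEW_QUERIES
    query = new_instructor_query if use_new else old_instructor_query
    return query(instructor_id)


def compare_paths(instructor_id):
    """Run both paths for one instructor and report whether they agree."""
    old = instructor_reviews_data(instructor_id, use_new=False)
    new = instructor_reviews_data(instructor_id, use_new=True)
    return old, new, old == new
```

Calling `compare_paths(14021)` returns both results and a flag for whether they match, which avoids restarting the server between toggle flips.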

This PR aims to optimize the query (in postgres) that aggregates reviews for an instructor's courses.

rohangpta · 2023-08-23T22:32:19Z

backend/review/views.py

@@ -372,6 +373,8 @@ def check_instructor_id(instructor_id):
    "difficulty",
 ]

+# TODO: remove before merging into prod


Commenting here as reminder

rohangpta · 2023-08-23T22:35:27Z

backend/review/views.py

@@ -446,6 +490,7 @@ def instructor_reviews(request, instructor_id):
    courses_res = dict()
    max_sem = dict()
    for r in courses.values():
+        print(r)


rohangpta · 2023-08-23T22:40:30Z

Is the idea behind this PR to remove some of the python logic that gets max sem and port over to SQL?

Why do we see these performance issues to begin with? We are caching pages for 1 month as per backend/review/urls.py and requests after that should be quite free. Should we try to also warm the cache somehow?

Have you spoken to your friend about scraping our data? I worry we're seeing some interference here
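
To make the first question above concrete, here is a rough sketch of what porting a "max semester per course" computation from Python into SQL could look like. It uses the standard-library sqlite3 module with a made-up `review` table and made-up rows, purely for illustration; it is not the PR's schema, its Postgres queries, or a claim about what the PR actually changes.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review (course_code TEXT, semester TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO review VALUES (?, ?, ?)",
    [
        ("CIS-120", "2022A", 3.0),
        ("CIS-120", "2022C", 3.5),
        ("CIS-160", "2021C", 2.5),
        ("CIS-160", "2022C", 3.0),
    ],
)


def max_sem_in_python(conn):
    """Fetch every row and track the latest semester per course in Python."""
    max_sem = {}
    for code, sem in conn.execute("SELECT course_code, semester FROM review"):
        if code not in max_sem or sem > max_sem[code]:
            max_sem[code] = sem
    return max_sem


def max_sem_in_sql(conn):
    """Let the database compute the latest semester per course via GROUP BY."""
    rows = conn.execute(
        "SELECT course_code, MAX(semester) FROM review GROUP BY course_code"
    )
    return dict(rows)
```

Both functions return the same mapping for this data; the SQL version sends one row per course back to Python instead of one row per review.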

AaDalal · 2023-09-04T04:41:02Z

> Is the idea behind this PR to remove some of the python logic that gets max sem and port over to SQL?
>
> Why do we see these performance issues to begin with? We are caching pages for 1 month as per backend/review/urls.py and requests after that should be quite free. Should we try to also warm the cache somehow?
>
> Have you spoken to your friend about scraping our data? I worry we're seeing some interference here

Yeah -- he's not doing any scraping

AaDalal · 2023-09-04T04:42:00Z

> Is the idea behind this PR to remove some of the python logic that gets max sem and port over to SQL?
>
> Why do we see these performance issues to begin with? We are caching pages for 1 month as per backend/review/urls.py and requests after that should be quite free. Should we try to also warm the cache somehow?
>
> Have you spoken to your friend about scraping our data? I worry we're seeing some interference here

No, it's changing the output query logic (e.g., you can try flipping the toggle and looking at the generated SQL queries). The ones output by the current query logic are really complex, and the new ones are faster for the Postgres query engine to execute.

AaDalal added 2 commits August 23, 2023 02:34

add option to exclude core_metrics

2616531

Optimizing core_metrics for PCR instructor course reviews

284ade8

AaDalal marked this pull request as draft August 23, 2023 07:08

rohangpta reviewed Aug 23, 2023
