Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add paginated HomePageCoursesV2 view with filtering & ordering #34173

Merged
merged 38 commits into from
Mar 20, 2024

Conversation

mariajgrimaldi
Copy link
Member

@mariajgrimaldi mariajgrimaldi commented Feb 2, 2024

Description

This PR adds a new version of the HomePageCourses API with pagination, filtering & ordering, intending to be used by the new Course Home MFE. The new API is based on HomePageCourses API with tweaks, so it gets courses with Course Overview helpers that allow filtering and ordering with query set API methods.

The scope of this implementation only considers the API endpoint used to list courses in the Course Home MFE which is the HomePageCoursesViewV2 that serves api/contentstore/v2/home/courses. Other API endpoints in V1 will not be implemented as part of this initiative.

API changes compared to V1

As I mentioned before, the HomePage Courses API version 2 is implemented based on the API version 1 with a few changes:

  • We're using a new get_course_context_v2 function to get the list of active and archived courses, and the in process course actions. That function is based on get_course_context but with pagination configurations.

  • Also, courses are not split into active/inactive but filtered using a status query parameter.

Other changes

Supporting information

For more information on this enhancement, please check:
https://openedx.atlassian.net/wiki/spaces/OEPM/pages/4032856101/Studio+Home+incremental+upgrades+-+product+approach

Testing instructions

1st, turn on the Django ORM only reads by setting: FEATURES["ENABLE_HOME_PAGE_COURSE_API_V2"] = True in your environment. Then, login as an administrator or as course staff (they should both work):

  1. You can use your browser to get your list of courses from Studio HTTP(s)://<CMS_HOST>/api/contentstore/v2/home/courses. Check that the output is the same as: HTTP(s)://<CMS_HOST>/api/contentstore/v1/home/courses
  2. Now, try filtering courses by their course ID, org or display name: /api/contentstore/v2/home/courses?search=demo
  3. Then, try ordering courses by their course ID, org or display name /api/contentstore/v2/home/courses?order=display_name , /api/contentstore/v2/home/courses?order=-display_name for alphabetic order, order=created or order=-created for creation date
    Or both 3. and 4.

Deadline

This effort is part of the Spanish consortium project, so it'd be ideal to merge this before the end of the project.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Feb 2, 2024
@openedx-webhooks
Copy link

openedx-webhooks commented Feb 2, 2024

Thanks for the pull request, @mariajgrimaldi! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@mariajgrimaldi mariajgrimaldi force-pushed the MJG/course-home-api-v2 branch 2 times, most recently from 064e3a6 to 150297d Compare February 2, 2024 17:43
@mariajgrimaldi mariajgrimaldi changed the title Mjg/course home api v2 feat: add CourseHomeCoursesV2 with filtering & ordering capabilities Feb 2, 2024
@mariajgrimaldi mariajgrimaldi changed the title feat: add CourseHomeCoursesV2 with filtering & ordering capabilities [WIP] feat: add CourseHomeCoursesV2 with filtering & ordering capabilities Feb 2, 2024
@mariajgrimaldi mariajgrimaldi changed the title [WIP] feat: add CourseHomeCoursesV2 with filtering & ordering capabilities [WIP] feat: add HomePageCoursesV2 with filtering & ordering capabilities Feb 5, 2024
@mariajgrimaldi
Copy link
Member Author

I've addressed your comments. Could you check again? Thanks!

@mariajgrimaldi mariajgrimaldi changed the title [WIP] feat: add HomePageCoursesV2 with filtering & ordering capabilities feat: add HomePageCoursesV2 with filtering & ordering capabilities Feb 6, 2024
@mariajgrimaldi mariajgrimaldi marked this pull request as ready for review February 6, 2024 13:54
@mariajgrimaldi
Copy link
Member Author

mariajgrimaldi commented Feb 6, 2024

Notes to reviewer

As I mentioned in the cover letter, this PR is heavily inspired by the HomePageCourses V1. We're using the same structure as the previous version implementation but with helpers that allow filtering and ordering without refactoring them to make them more suitable for this API-MFE architecture. So, there's room for improvement in these implementations to modernize them and make them more maintainable.

Comment on lines 423 to 456
def _accessible_courses_summary_iter_v2(request, org=None):
"""
List all courses available to the logged in user by iterating through all the courses.

Args:
request: the request object
org (string): if not None, this value will limit the courses returned. An empty
string will result in no courses, and otherwise only courses with the
specified org will be returned. The default value is None.
"""
def course_filter(course_summary):
"""
Filter out inaccessible courses for the current logged in user.

Args:
course_summary (CourseOverview): the course overview object.
"""
return has_studio_read_access(request.user, course_summary.id)

if org is not None:
courses_summary = [] if org == '' else CourseOverview.get_all_courses(orgs=[org])
else:
courses_summary = CourseOverview.get_all_courses()

search_query = request.GET.get('search')
order = request.GET.get('order')
active_only = get_bool_param(request, 'active_only', None)
archived_only = get_bool_param(request, 'archived_only', None)
courses_summary = get_courses_by_status(active_only, archived_only, courses_summary)
courses_summary = get_courses_by_search_query(search_query, courses_summary)
courses_summary = get_courses_order_by(order, courses_summary)
courses_summary = list(filter(course_filter, courses_summary))
in_process_course_actions = get_in_process_course_actions(request)
return courses_summary, in_process_course_actions
Copy link
Member Author

@mariajgrimaldi mariajgrimaldi Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes to reviewer

As I mentioned here, there's still room for improvement here. Here's a list of things I'd like to do but don't have the time to do so:

  • Use Custom Django filters instead of those get_courses_by... in the helpers _accessible_courses_summary_iter_v2 and _accessible_courses_list_from_groups_v2
  • Try refactoring get_courses_accessible_to_user_v2 so a single course getter could be used. Or at least smaller/simpler ones
  • Moving out this implementation to its own helper so the parent function it's easier to read

I'll still open this PR for review and hope to have more time to make these changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, the idea was to replicate API v1 including some changes, it was done and it works. Later, we can make appropriate improvements to the code.

Copy link

@johnvente johnvente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've tested this and is working for me as expected

@mariajgrimaldi
Copy link
Member Author

Hi there, folks @felipemontoya @feanil! This is our approach for the API that was in conversations for the Studio Home incremental upgrades. As the cover letter says, it's heavily inspired in Studio HomePage V1 included in these PRs:
#33204
#33909
But with the necessary changes for this feature to work. Let me know what you think about the approach!

Copy link
Contributor

@feanil feanil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL signature of v2 and the output format look like it's a superset of v1, does it make sense to make a whole new version instead of just enhancing v1? What's preventing us from updating v2 to do more work when more query params are provided? Is there something I'm missing about the signature or the output format?

def get_course_context_v2(request):
"""Get context of the homepage course tab from the Studio Home."""

from cms.djangoapps.contentstore.views.course import (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this import inside this function? Can we add a comment to explain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it outside the function along with the other imports and got:

cms-1  | Traceback (most recent call last):
cms-1  |   File "./manage.py", line 103, in <module>
cms-1  |     startup.run()
cms-1  |   File "/openedx/edx-platform/cms/startup.py", line 20, in run
cms-1  |     django.setup()
cms-1  |   File "/openedx/venv/lib/python3.8/site-packages/django/__init__.py", line 24, in setup
cms-1  |     apps.populate(settings.INSTALLED_APPS)
cms-1  |   File "/openedx/venv/lib/python3.8/site-packages/django/apps/registry.py", line 124, in populate
cms-1  |     app_config.ready()
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/apps.py", line 23, in ready
cms-1  |     from .signals import handlers  # pylint: disable=unused-import
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/signals/handlers.py", line 19, in <module>
cms-1  |     from cms.djangoapps.contentstore.courseware_index import (
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/courseware_index.py", line 15, in <module>
cms-1  |     from cms.djangoapps.contentstore.course_group_config import GroupConfiguration
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/course_group_config.py", line 12, in <module>
cms-1  |     from cms.djangoapps.contentstore.utils import reverse_usage_url
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/utils.py", line 88, in <module>
cms-1  |     from cms.djangoapps.contentstore.views.course import (
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/views/__init__.py", line 3, in <module>
cms-1  |     from .assets import *
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/views/assets.py", line 6, in <module>
cms-1  |     from cms.djangoapps.contentstore.asset_storage_handlers import (
cms-1  |   File "/openedx/edx-platform/cms/djangoapps/contentstore/asset_storage_handlers.py", line 35, in <module>
cms-1  |     from .utils import reverse_course_url, get_files_uploads_url, get_response_format, request_response_format_is_json
cms-1  | ImportError: cannot import name 'reverse_course_url' from partially initialized module 'cms.djangoapps.contentstore.utils' (most likely due to a circular import) (/openedx/edx-platform/cms/djangoapps/contentstore/utils.py)

I have already added the comment explaining why it needs to be done.


optimization_enabled = GlobalStaff().has_user(request.user) and ENABLE_GLOBAL_STAFF_OPTIMIZATION.is_enabled()

org = request.GET.get('org', '') if optimization_enabled else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like blank vs empty string will result in different behavior here? Can you add a comment explaining what the expected difference is between those two? Or maybe that explaination needs to go into the docstring of get_course_accessible_to_user_v2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is precisely how V1 behaves. So, I'll have to test out the behavior with those specifications. I'll make sure to answer once I know more.

Copy link
Member Author

@mariajgrimaldi mariajgrimaldi Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I researched the behavior: org= and org='' which correspond to '' and "''" respectively. So for the blank string "''" this line would be executed: https://github.com/openedx/edx-platform/pull/34173/files#diff-937adee5129f53ead6b3c7c988d70db9e9e95862f17a6f85a1eea2388d12b461R443, but for the empty string '' it will not. Unless there are organizations named '' or any other variation of the blank string -from what I see, you can actually insert rows into the course overview table with the org '' since there are no restrictions from the course table- then the result would be empty for the "''" case.
I'm not sure why this is a case considered, so I can't explain it.

Copy link
Member Author

@mariajgrimaldi mariajgrimaldi Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, these helpers (get_course_context_v2, etc) are based on the ones used by the v1 API with some modifications:

And so on. I thought it would be easier instead of modifying the behavior of the existing ones in case other structures depended on them. Now I think it's even more confusing having it this way, duplicating code in a V2 version helper.

So I'll be modifying the v1 helpers instead where I see fit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. But I broke some tests in the process. I'll fix them if this approach is approved.

@mariajgrimaldi
Copy link
Member Author

mariajgrimaldi commented Feb 15, 2024

The URL signature of v2 and the output format look like it's a superset of v1, does it make sense to make a whole new version instead of just enhancing v1? What's preventing us from updating v2 to do more work when more query params are provided? Is there something I'm missing about the signature or the output format?

@feanil: thanks for the reply. To clarify: this is our first approach, it's not a definitive solution to have a V2 API. We're open to this being a V1 enhancement if that's better. Here are some reasons why we think it's a breaking change, which means migrating to V2 and then deprecating V1. In V1, the output is:

        {
            "archived_courses": [...],
            "courses": [...],
            "in_process_course_actions": [],
        }

There's a differentiation between active/archived courses. In V2, the client should pass query params (active_only/archived_only or none to list both) to specify which courses to list, so it'd look like this:

        {
            "courses": [...], # list of active courses, archived or both
            "in_process_course_actions": [],
        }

Also, we're adding pagination in a different PR, which changes how to handle the data returned by the API. And the course serializer has more information, specifically the is_active and cms_link fields. Another change that would make sense is what I described here about in_process_course_actions, which would also change the API output, but it wasn't implemented.

As I mentioned, we're open to accommodating these changes as V1 improvements if that's better.

@mariajgrimaldi
Copy link
Member Author

mariajgrimaldi commented Feb 20, 2024

Some notes about performance

We ran simple performance tests to study the API's behavior with different stress loads and user roles. To do so, we used a tutor local nightly environment in an EC2 t2.large instance machine and Postman as the test runner. We started with a database equipped with ~42k course overviews:

There are 10 items per page
There are 41748 course overview records
There are 4175 pages
There are 1197 course overviews matching demo (either the org / display name)
There are 22067 active course overviews

Iterations: 1 per test
Consecutively

ID Request path Response time User role
Test 1 /api/contentstore/v2/home/courses?page=1 10.12s Global staff
Test 3 /api/contentstore/v2/home/courses?search=demo 637ms Global staff
Test 4 /api/contentstore/v2/home/courses?active_only=true 5.42s Global staff

Iterations: 1 per test
Consecutively

ID Request path Response time User role
Test 5 /api/contentstore/v2/home/courses?page=1 213ms Instructor
Test 7 /api/contentstore/v2/home/courses?search=demo 215ms Instructor

Iterations: 100 per test
Simultaneous

ID Request path Duration Avg. Resp. Time User role
Test 1 /api/contentstore/v2/home/courses?page=1 22m 10s 13250ms Global staff
Test 2 /api/contentstore/v2/home/courses?page=100...200 22m 6s 13211ms Global staff
Test 3 /api/contentstore/v2/home/courses?search=demo 11m 59s 7144ms Global staff
Test 4 /api/contentstore/v2/home/courses?active_only=true 13m 12s 7872ms Global staff

Iterations: 100 per test
Simultaneous

ID Request path Duration Avg. Resp. Time User role
Test 5 /api/contentstore/v2/home/courses?page=1 23s 831ms 194ms Instructor
Test 6 /api/contentstore/v2/home/courses?page=1...2 22s 801ms 181ms Instructor
Test 7 /api/contentstore/v2/home/courses?search=demo 23s 326ms 185ms Instructor

So, the global staff has access to the ~42k courses; meanwhile, the instructor can access 12. A staff user's average of ~13s is not a bad result given the test circumstances (like infrastructure details, amount of courses, simultaneous test runs, etc). We also ran a cprofile report in our environment, I'll attach some screenshots here for you to see:
image
image
Let me know if you want another type of information from this report.

@mariajgrimaldi mariajgrimaldi force-pushed the MJG/course-home-api-v2 branch from 08cd471 to 93e55c8 Compare March 19, 2024 21:31
Copy link
Member

@felipemontoya felipemontoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it again, it works nicely. Thanks for updating the default and making sure the tests cover this case thoroughly.

@felipemontoya
Copy link
Member

btw I should add that this time I also tested the course-authoring master to make sure that with the feature on, the old course-authoring home still works as usual.

@arbrandes arbrandes removed their request for review March 20, 2024 15:06
@mariajgrimaldi
Copy link
Member Author

I'll merge this now since all comments were addressed, and the PR has engineering approval. Thank you.

@mariajgrimaldi mariajgrimaldi merged commit 45178e0 into master Mar 20, 2024
67 checks passed
@mariajgrimaldi mariajgrimaldi deleted the MJG/course-home-api-v2 branch March 20, 2024 15:32
@openedx-webhooks
Copy link

@mariajgrimaldi 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

mariajgrimaldi added a commit to eduNEXT/frontend-app-authoring that referenced this pull request Sep 19, 2024
Turn on getting courses from the HomePageCourses API
which allows pagination, filtering and ordering. See openedx/edx-platform#34173
for more details on the API implementation.
KristinAoki pushed a commit to openedx/frontend-app-authoring that referenced this pull request Oct 16, 2024
* refactor!: turn on homepage course API V2 consumption by default

Turn on getting courses from the HomePageCourses API
which allows pagination, filtering and ordering. See openedx/edx-platform#34173
for more details on the API implementation.

* fix: home page initial a-z course sort

---------

Co-authored-by: Diana Olarte <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants