From 6be622e3a8563070e553bbfe61c5adf01a35176f Mon Sep 17 00:00:00 2001 From: Braden MacDonald Date: Sun, 17 Mar 2024 19:47:37 -0700 Subject: [PATCH] docs: added ADR --- .../docs/decisions/0001-meilisearch.rst | 151 ++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 openedx/core/djangoapps/content/search/docs/decisions/0001-meilisearch.rst diff --git a/openedx/core/djangoapps/content/search/docs/decisions/0001-meilisearch.rst b/openedx/core/djangoapps/content/search/docs/decisions/0001-meilisearch.rst new file mode 100644 index 000000000000..1f355409da0e --- /dev/null +++ b/openedx/core/djangoapps/content/search/docs/decisions/0001-meilisearch.rst @@ -0,0 +1,151 @@ +Studio Content Search Powered by Meilisearch +############################################ + +Status +****** + +Draft + + +Context +******* + +Existing search functionality +============================= + +The Open edX platform currently implements many different forms of search. For +example, users can search for course content, library content, forum posts, and +more. Most of the search functionality in the core platform is powered by the +Elasticsearch search engine (though other functionality developed by 2U, such as +in edx-enterprise, is powered by Algolia). + +Most uses of Elasticsearch in Open edX use +`edx-search `_ which provides a partial +abstraction over Elasticsearch. The edx-search library formerly used +`django-haystack `_ as an abstraction +layer across search engines, but "that was ripped out after the package was +abandoned upstream and it became an obstacle to upgrades and efficiently +utilizing Elasticsearch (the abstraction layer imposed significant limits)" +(thanks to Jeremy Bowman for this context). Due to these changes, the current +edx-search API is a mix of abstractions and direct usage of the Elasticsearch +API, which makes it confusing and difficult to work with. In addition, each +usage of edx-search has been implemented fairly differently. See +`State of edx-search `_ +for details (thanks to Andy Shultz). + +Other platform components use Elasticsearch more directly: + +* ``course-discovery`` and ``edx-notes-api`` do not use ``edx-search``, but are + very tied to Elasticsearch via the use of ``django-elasticsearch-dsl`` and + ``django-elasticsearch-drf``. +* ``cs_comments_service`` uses Elasticsearch via the official ruby gems. + +Problems with Elasticsearch +=========================== + +At the same time, there are many problems with the current reliance on +Elasticsearch: + +1. In 2021, the license of Elasticsearch changed from Apache 2.0 to a more + restrictive license that prohibits providing "the products to others as a + managed service". Consequently, AWS forked the search engine to create + OpenSearch and no longer offers Elasticsearch as a service. This is + problematic for many Open edX operators that use AWS and prefer to avoid + any third-party services. +2. Elasticsearch is very resource-intensive and often uses more than a gigabyte + of memory just for small search use cases. +3. Elasticsearch has poor support for multi-tenancy, which multiplies the + problem of resource usage for organizations with many small Open edX sites. +4. The existing usage of edx-search/Elasticsearch routes all search requests and + result processing through edxapp (the LMS) or other IDAs, increasing the + load on those applications. + +Need for Studio Search +====================== + +At the time of this ADR, we have a goal to implement new search functionality in +Studio, to support various course authoring workflows. + +Meilisearch +=========== + +Meilisearch ("MAY-lee search") is a new, promising search engine that offers a +compelling alternative to Elasticsearch. It is open source, feature rich, and +very fast and memory efficient (written in Rust, uses orders of magnitude less +memory than Elasticsearch for small datasets). It has a simple API with an +official python driver, and has official integrations with the popular +Instantsearch frontend library from Algolia. It has strong support for +multi-tenancy, and allows creating restricted API keys that incorporate a user's +permissions, so that search requests can be made directly from the user to +Meilisearch, rather than routing them through Django. Initial testing has shown +it to be much more developer friendly than Elasticsearch/OpenSearch. + +At the time of writing, there are only two known concerns with Meilisearch: + +1. It doesn't (yet) support High Availability via replication, although this is + planned and under development. It does have other features to support high + availability, such as very low restart time (in ms). +2. It doesn't support boolean operators in keyword search ("red AND panda"), + though it does of course support boolean operators in filters. This is a + product decision aimed at keeping the user experience simple, and is unlikely + to change. + + +Decision +******** + +1. We will implement the new Studio search functionality using Meilisearch, + as an experiment and to evaluate it more thoroughly. +2. The Studio search functionality will be disabled by default in the next + Open edX release (Redwood), so that Meilisearch will not be a requirement + for any default nor existing features. This will also allow us to evaluate it + before deciding to embrace it or replace it. +3. We will keep the Meilisearch-specific code isolated to the + new ``content/search`` Django app, so it's relatively easy to swap out later + if this experiment doesn't pan out. +4. We will not use ``edx-search`` for the new search functionality. + + +Consequences +************ + +1. Organizations that wish to try out the new Studio Search functionality in + the Redwood release will have to install and configure Meilisearch. +2. Building both the backend and frontend components of the Studio search + project will be much faster and simpler than if we used ElasticSearch, + edx-search, OpenSearch, django-haystack, etc. +3. Keyword search with boolean operators will not be supported in any of the new + search features. + + +Alternatives Considered +*********************** + +OpenSearch Only +=============== + +Moving existing search functionality to OpenSearch is a possibility, but though +it mostly addresses the licensing issue, it doesn't solve the problems of +resource usage, API complexity, edx-search API complexity, lack of Instantsearch +integration, and poor multi-tenancy. + +OpenSearch and Elasticsearch +============================ + +When OpenSearch was originally forked from Elasticsearch, it was completely API +compatible, but over time they have developed along divergent paths. Regardless +of whether ElasticSearch and OpenSearch are actually wire-compatible, recent +versions of all the official ElasticSearch clients have been made to actively +reject connections to OpenSearch, which is why you generally won't find client +libraries that work with both engines, and why there are OpenSearch forks of +everything on the client side as well as the server side. + +As there is no ready-to-use abstraction layer that would allow us to comfortably +support both, and no interest in maintaining one ourselves, this is not an +appealing option. + +Algolia +======= + +Algolia is a great search engine service, but as it is a proprietary product, it +is not suitable as a requirement for an open source platform like Open edX.