diff --git a/docs/conf.py b/docs/conf.py index 027e56808b6a..f37fc32f6160 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -58,6 +58,7 @@ 'sphinx.ext.autodoc', 'sphinx.ext.coverage', 'sphinx.ext.doctest', + 'sphinx.ext.graphviz', 'sphinx.ext.ifconfig', 'sphinx.ext.intersphinx', 'sphinx.ext.mathjax', diff --git a/scripts/user_retirement/README.rst b/scripts/user_retirement/README.rst index 20c99197ed6d..d5417f7f7e75 100644 --- a/scripts/user_retirement/README.rst +++ b/scripts/user_retirement/README.rst @@ -1,7 +1,7 @@ User Retirement Scripts ======================= -`This `_ directory contains python scripts which are migrated from the `tubular `_ respository. +`This `_ directory contains python scripts which are migrated from the `tubular `_ repository. These scripts are intended to drive the user retirement workflow which involves handling the deactivation or removal of user accounts as part of the platform's management process. These scripts could be called from any automation/CD framework. @@ -49,9 +49,9 @@ In-depth Documentation and Configuration Steps For in-depth documentation and essential configurations follow these docs -`Documentation `_ +`Documentation `_ -`Configuration Docs `_ +`Configuration Docs `_ Execute Script diff --git a/scripts/user_retirement/docs/driver_setup.rst b/scripts/user_retirement/docs/driver_setup.rst new file mode 100644 index 000000000000..c6d7b8e6f292 --- /dev/null +++ b/scripts/user_retirement/docs/driver_setup.rst @@ -0,0 +1,134 @@ +.. _driver-setup: + +############################################# +Setting Up the User Retirement Driver Scripts +############################################# + +`scripts/user_retirement `_ +is a directory of Python scripts designed to plug into various automation +tooling. It also contains a README file with details on how to run the scripts. +Included in this directory are two scripts intended to drive the user +retirement workflow.
+ +``get_learners_to_retire.py`` + Generates a list of users that are ready for immediate retirement. Users + are "ready" after a certain number of days spent in the ``PENDING`` state, + specified by the ``--cool_off_days`` argument. Produces an output intended + for consumption by Jenkins in order to spawn separate downstream builds for + each user. +``retire_one_learner.py`` + Retires the user specified by the ``--username`` argument. + +These two scripts share a required ``--config_file`` argument, which specifies +the driver configuration file for your environment (for example, production). +This configuration file is a YAML file that contains LMS auth secrets, API URLs, +and retirement pipeline stages specific to that environment. Here is an example +of a driver configuration file. + +.. code-block:: yaml + + client_id: + client_secret: + + base_urls: + lms: https://courses.example.com/ + ecommerce: https://ecommerce.example.com/ + credentials: https://credentials.example.com/ + + retirement_pipeline: + - ['RETIRING_EMAIL_LISTS', 'EMAIL_LISTS_COMPLETE', 'LMS', 'retirement_retire_mailings'] + - ['RETIRING_ENROLLMENTS', 'ENROLLMENTS_COMPLETE', 'LMS', 'retirement_unenroll'] + - ['RETIRING_LMS_MISC', 'LMS_MISC_COMPLETE', 'LMS', 'retirement_lms_retire_misc'] + - ['RETIRING_LMS', 'LMS_COMPLETE', 'LMS', 'retirement_lms_retire'] + +The ``client_id`` and ``client_secret`` keys contain the oauth credentials. +These credentials are simply copied from the output of the +``create_dot_application`` management command described in +:ref:`retirement-service-user`. + +The ``base_urls`` section in the configuration file defines the mappings of +IDA to base URLs used by the scripts to construct API URLs. Only the LMS is +mandatory here, but if any of your pipeline states contain API calls to other +services, those services must also be present in the ``base_urls`` section. 
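You can check the ``base_urls`` requirement before running any retirements. The following is a minimal Python sketch and is not part of ``scripts/user_retirement``. It assumes the YAML file has already been parsed into a plain ``dict`` (for example with PyYAML's ``yaml.safe_load``). It also assumes that pipeline IDA names such as ``LMS`` map to the lowercase ``base_urls`` keys used in the example configuration. The helper name ``check_driver_config`` is hypothetical.

```python
from typing import Any, Dict, List


def check_driver_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a parsed driver configuration.

    Hypothetical helper, not part of scripts/user_retirement. Assumes that
    pipeline IDA names map to lowercase ``base_urls`` keys (LMS -> lms).
    """
    problems: List[str] = []

    # The OAuth credentials produced by create_dot_application are required.
    for key in ("client_id", "client_secret"):
        if not config.get(key):
            problems.append(f"missing {key}")

    # The LMS base URL is always required.
    base_urls = config.get("base_urls") or {}
    if "lms" not in base_urls:
        problems.append("base_urls must contain 'lms'")

    # Every IDA called by a pipeline stage needs a base URL too.
    for stage in config.get("retirement_pipeline") or []:
        if len(stage) != 4:
            problems.append(f"stage {stage!r} must have exactly 4 items")
            continue
        start_state, _end_state, ida, _method_name = stage
        if ida.lower() not in base_urls:
            problems.append(f"{start_state}: no base URL for {ida}")

    return problems


if __name__ == "__main__":
    example = {
        "client_id": "example-id",
        "client_secret": "example-secret",
        "base_urls": {"lms": "https://courses.example.com/"},
        "retirement_pipeline": [
            ["RETIRING_ENROLLMENTS", "ENROLLMENTS_COMPLETE", "LMS", "retirement_unenroll"],
            ["RETIRING_CREDENTIALS", "CREDENTIALS_COMPLETE", "CREDENTIALS", "retire_learner"],
        ],
    }
    for problem in check_driver_config(example):
        print(problem)
```

For the sample above, the sketch reports that the ``RETIRING_CREDENTIALS`` stage has no base URL. Adding a ``credentials`` entry to ``base_urls`` clears the problem.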
+ +The ``retirement_pipeline`` section defines the steps, state names, and order +of execution for each environment. Each item is a list in the form of: + +#. Start state name +#. End state name +#. IDA to call against (currently LMS, ECOMMERCE, or CREDENTIALS) +#. Method name to call in + `edx_api.py `_ + +For example: ``['RETIRING_CREDENTIALS', 'CREDENTIALS_COMPLETE', 'CREDENTIALS', +'retire_learner']`` sets the user's state to ``RETIRING_CREDENTIALS``, calls +a pre-instantiated ``retire_learner`` method in the ``CredentialsApi``, then sets +the user's state to ``CREDENTIALS_COMPLETE``. + +******** +Examples +******** + +The following are some examples of how to use the driver scripts. + +================== +Set Up Environment +================== + +Follow this `readme `_ to set up your execution environment. + +========================= +List of Targeted Learners +========================= + +Generate a list of learners that are ready for retirement (those learners who +have selected and confirmed account deletion and have been in the ``PENDING`` +state for the number of days specified by ``--cool_off_days``). + +.. code-block:: bash + + mkdir learners_to_retire + get_learners_to_retire.py \ + --config_file=path/to/config.yml \ + --output_dir=learners_to_retire \ + --cool_off_days=5 + +===================== +Run Retirement Script +===================== + +After running these commands, the ``learners_to_retire`` directory contains +several INI files, each containing a single line in the form of ``USERNAME +=``. Iterate over these files, executing the +``retire_one_learner.py`` script for each learner with a command like the following. + +..
code-block:: bash + + retire_one_learner.py \ + --config_file=path/to/config.yml \ + --username= + + +************************************************** +Using the Driver Scripts in an Automated Framework +************************************************** + +At edX, we call the user retirement scripts from +`Jenkins `_ jobs on one of our internal Jenkins +services. The user retirement driver scripts are intended to be agnostic +about which automation framework you use, but they have only been fully tested +with Jenkins. + +For more information about how we execute these scripts at edX, see the +following wiki articles: + +* `User Retirement Jenkins Implementation `_ +* `How to: retirement Jenkins jobs development and testing `_ + +Also see the Groovy DSL files we use to seed these jobs: + +* `platform/jobs/RetirementJobs.groovy in edx/jenkins-job-dsl `_ +* `platform/jobs/RetirementJobEdxTriggers.groovy in edx/jenkins-job-dsl `_ + +.. include:: ../../../../links/links.rst + diff --git a/scripts/user_retirement/docs/implementation_overview.rst b/scripts/user_retirement/docs/implementation_overview.rst new file mode 100644 index 000000000000..37a814c1d583 --- /dev/null +++ b/scripts/user_retirement/docs/implementation_overview.rst @@ -0,0 +1,117 @@ +.. _Implmentation: + +####################### +Implementation Overview +####################### + +In the Open edX platform, the user experience is enabled by several +services, such as LMS, Studio, ecommerce, credentials, discovery, and more. +Personally Identifiable Information (PII) about a user can exist in many of +these services. As a consequence, to remove a user's PII, you must be able +to ask each service that contains PII to remove, delete, or unlink the +data for that user in that service. + +In the user retirement feature, a centralized process (the *driver* scripts) +orchestrates all of these requests. For information about how to configure the +driver scripts, see :ref:`driver-setup`.
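To make the orchestration concrete, here is a minimal Python sketch of how a driver might walk one learner through the ``retirement_pipeline`` stages described in :ref:`driver-setup`. It is not the code of ``retire_one_learner.py``, which talks to the LMS and other IDAs over HTTP. The ``apis`` mapping of IDA names to client objects, the ``update_state`` callback, and the two fake API classes are hypothetical stand-ins. The sketch also assumes that a learner can be resumed from ``PENDING`` or from a stage's end state, and that a successful run finishes in ``COMPLETE``.

```python
from typing import Callable, Dict, Sequence

# A stage is [start_state, end_state, ida_name, method_name], as in the
# retirement_pipeline section of the driver configuration file.
Stage = Sequence[str]


def run_pipeline(
    username: str,
    current_state: str,
    pipeline: Sequence[Stage],
    apis: Dict[str, object],
    update_state: Callable[[str, str], None],
) -> str:
    """Drive one learner through the remaining stages; return the final state.

    Hypothetical sketch: ``apis`` maps IDA names to client objects, and
    ``update_state`` stands in for the LMS call that records a new state.
    """
    # Resume after the last completed stage (or from the start if PENDING).
    end_states = [stage[1] for stage in pipeline]
    if current_state == "PENDING":
        next_index = 0
    elif current_state in end_states:
        next_index = end_states.index(current_state) + 1
    else:
        raise ValueError(f"cannot resume from state {current_state!r}")

    for start_state, end_state, ida, method_name in pipeline[next_index:]:
        update_state(username, start_state)
        try:
            getattr(apis[ida], method_name)(username)
        except Exception:
            # Any failure parks the learner in ERRORED for manual review.
            update_state(username, "ERRORED")
            return "ERRORED"
        update_state(username, end_state)

    update_state(username, "COMPLETE")
    return "COMPLETE"


class FakeLmsApi:
    """Records calls instead of making HTTP requests (for illustration only)."""

    def __init__(self) -> None:
        self.calls = []

    def retirement_unenroll(self, username: str) -> None:
        self.calls.append(("retirement_unenroll", username))


class FailingCredentialsApi:
    """Simulates an IDA whose retirement endpoint returns an error."""

    def retire_learner(self, username: str) -> None:
        raise RuntimeError("simulated 500 response")
```

Because each stage records its end state before the next one starts, a learner whose stage failed can later be resumed from the last recorded ``*_COMPLETE`` state. This reflects the linear, rerunnable design described in this overview.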
+ +**************************** +The User Retirement Workflow +**************************** + +The user retirement workflow is a configurable pipeline of building-block +APIs. These APIs are used to: + + * "Forget" a retired user's PII + * Prevent a retired user from logging back in + * Prevent re-use of the username or email address of a retired user + +Depending on which third parties a given Open edX instance integrates with, +the user retirement process may need to call out to external services or to +generate reports for later processing. Any such reports must subsequently be +destroyed. + +Configurability and adaptability were design goals from the beginning, so this +user retirement tooling should be able to accommodate a wide range of Open edX +sites and custom use cases. + +The workflow is designed to be linear and rerunnable, allowing recovery and +continuation in cases where a particular stage fails. Each user who has +requested retirement will be individually processed through this workflow, so +multiple users could be in the same state simultaneously. The LMS is the +authoritative source of information about the state of each user in the +retirement process, and the arbiter of state progressions, using the +``UserRetirementStatus`` model and associated APIs. The LMS also holds a +table of the states themselves (the ``RetirementState`` model), rather than +hard-coding the states. This was done because we cannot predict all the +possible states required by all members of the Open edX community. + +This example state diagram outlines the pathways users follow throughout the +workflow: + +.. digraph:: retirement_states_example + :align: center + + ranksep = "0.3"; + + node[fontname=Courier,fontsize=12,shape=box,group=main] + { rank = same INIT[style=invis] PENDING } + INIT -> PENDING; + "..."[shape=none] + PENDING -> RETIRING_ENROLLMENTS -> ENROLLMENTS_COMPLETE -> RETIRING_FORUMS -> FORUMS_COMPLETE -> "..." 
-> COMPLETE; + + node[group=""]; + RETIRING_ENROLLMENTS -> ERRORED; + RETIRING_FORUMS -> ERRORED; + PENDING -> ABORTED; + + subgraph cluster_terminal_states { + label = "Terminal States"; + labelloc = b // put label at bottom + {rank = same ERRORED COMPLETE ABORTED} + } + +Unless an error occurs within the user retirement tooling itself, a user's +retirement state should always land in one of the terminal states. At that +point, either their entry should be cleaned up from the +``UserRetirementStatus`` table or, if the state is ``ERRORED``, the +administrator needs to examine the error and resolve it. For more information, +see :ref:`recovering-from-errored`. + +******************* +The User Experience +******************* + +From the learner's perspective, the vast majority of this process is hidden. +The Account page contains a new section titled **Delete My Account**. In this +section, a learner may click the **Delete My Account** button and enter +their password to confirm their request. Subsequently, all of the learner's +browser sessions are logged out, and they become locked out of their account. + +An informational email is immediately sent to the learner to confirm the +deletion of their account. After this email is sent, the learner has a limited +amount of time (defined by the ``--cool_off_days`` argument described in +:ref:`driver-setup`) to contact the site administrators and rescind their +request. + +At this point, the learner's account has been deactivated, but *not* retired. +An entry is added to the ``UserRetirementStatus`` table, and the learner's state is set to +``PENDING``. + +By default, the **Delete My Account** section is visible and the button is +enabled, allowing account deletions to queue up. The +``FEATURES['ENABLE_ACCOUNT_DELETION']`` flag in Django settings toggles the visibility +of this section. See :ref:`django-settings`.
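The cool-off window amounts to a date filter over ``PENDING`` entries. The following minimal Python sketch illustrates the idea only. It is not how ``get_learners_to_retire.py`` or the LMS API actually performs the query. The ``RetirementEntry`` record and its ``created`` timestamp are hypothetical stand-ins for a ``UserRetirementStatus`` row. The sketch assumes a learner becomes ready once ``cool_off_days`` full days have passed since the request.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class RetirementEntry:
    """Hypothetical stand-in for a UserRetirementStatus row."""

    username: str
    current_state: str
    created: datetime  # when the learner confirmed the deletion request


def learners_ready_for_retirement(
    entries: Iterable[RetirementEntry],
    cool_off_days: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return usernames that have been PENDING for at least cool_off_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=cool_off_days)
    return [
        entry.username
        for entry in entries
        if entry.current_state == "PENDING" and entry.created <= cutoff
    ]
```

Learners who have not yet passed the cutoff stay ``PENDING``. That is the window in which an administrator can still cancel their request.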
+ +================ +Third Party Auth +================ + +Learners who registered using social authentication must first unlink their +LMS account from their third-party account. For those learners, the **Delete +My Account** button will be disabled until they do so; meanwhile, they will be +instructed to follow the procedure in this help center article: `How do I link +or unlink my edX account to a social media +account? `_. + +.. include:: ../../../../links/links.rst diff --git a/scripts/user_retirement/docs/index.rst b/scripts/user_retirement/docs/index.rst new file mode 100644 index 000000000000..383c7a6aa8e1 --- /dev/null +++ b/scripts/user_retirement/docs/index.rst @@ -0,0 +1,38 @@ +.. _Enabling User Retirement: + +#################################### +Enabling the User Retirement Feature +#################################### + +There have been many changes to privacy laws (for example, the +European Union General Data Protection Regulation, or GDPR) intended to change the way +that businesses think about and handle Personally Identifiable Information +(PII). + +As a step toward enabling Open edX to support some of the key updates in privacy +laws, edX has implemented APIs and tooling that enable Open edX instances to +retire registered users. When you implement this user retirement feature, your +Open edX instance can automatically erase PII for a given user from systems that +are internal to Open edX (for example, the LMS, forums, credentials, and other +independently deployable applications (IDAs)), as well as external systems, such +as third-party marketing services. + +This section is intended not only to instruct Open edX admins in performing +the basic setup, but also to offer some insight into the implementation of the +user retirement feature in order to help the Open edX community build +additional APIs and states that meet their special needs.
Custom code, +plugins, packages, or XBlocks in your Open edX instance might store PII, but +this feature will not automatically find and clean up that PII. You may need to +write your own custom code to retire PII that is not covered by the user +retirement feature. + +.. toctree:: + :maxdepth: 1 + + implementation_overview + service_setup + driver_setup + special_cases + +.. include:: ../../../../links/links.rst + diff --git a/scripts/user_retirement/docs/service_setup.rst b/scripts/user_retirement/docs/service_setup.rst new file mode 100644 index 000000000000..4fd59fcfd3ad --- /dev/null +++ b/scripts/user_retirement/docs/service_setup.rst @@ -0,0 +1,179 @@ +.. _Service Setup: + +##################################### +Setting Up User Retirement in the LMS +##################################### + +This section describes how to set up and configure the user retirement feature +in the Open edX LMS. + +.. _django-settings: + +*************** +Django Settings +*************** + +The following Django settings control the behavior of the user retirement +feature. Note that some of these setting values are lambda functions rather +than standard string literals. This is intentional; it is a pattern for +defining *derived* settings specific to Open edX. Read more about it in +`openedx/core/lib/derived.py +`_. + +.. list-table:: + :header-rows: 1 + + * - Setting Name + - Default + - Description + * - RETIRED_USERNAME_PREFIX + - ``'retired__user_'`` + - The prefix part of hashed usernames. Used in ``RETIRED_USERNAME_FMT``. + * - RETIRED_EMAIL_PREFIX + - ``'retired__user_'`` + - The prefix part of hashed emails. Used in ``RETIRED_EMAIL_FMT``. + * - RETIRED_EMAIL_DOMAIN + - ``'retired.invalid'`` + - The domain part of hashed emails. Used in ``RETIRED_EMAIL_FMT``.
+ * - RETIRED_USERNAME_FMT + - ``lambda settings: + settings.RETIRED_USERNAME_PREFIX + '{}'`` + - The username field for a retired user gets transformed into this format, + where ``{}`` is replaced with the hash of their username. + * - RETIRED_EMAIL_FMT + - ``lambda settings: + settings.RETIRED_EMAIL_PREFIX + '{}@' + + settings.RETIRED_EMAIL_DOMAIN`` + - The email field for a retired user gets transformed into this format, where + ``{}`` is replaced with the hash of their email. + * - RETIRED_USER_SALTS + - None + - A list of salts used for hashing usernames and emails. Only the last item in this list is used as a salt for all new retirements, but historical salts are preserved in order to guarantee that all hashed usernames and emails can still be checked. The default value **MUST** be overridden! + * - RETIREMENT_SERVICE_WORKER_USERNAME + - ``'RETIREMENT_SERVICE_USER'`` + - The username of the retirement service worker. + * - RETIREMENT_STATES + - See `lms/envs/common.py `_ + in the ``RETIREMENT_STATES`` setting + - A list that defines the name and order of states for the retirement + workflow. See `Retirement States`_ for details. + * - FEATURES['ENABLE_ACCOUNT_DELETION'] + - True + - Whether to display the "Delete My Account" section on the account settings page. + + + +================= +Retirement States +================= + +The state of each user's retirement is stored in the LMS database, and the +state list itself is also separately stored in the database. We expect the +list of states to vary over time and across different Open edX +installations, so it is the responsibility of the administrator to populate +the state list. + +The default states are defined in `lms/envs/common.py +`_ +in the ``RETIREMENT_STATES`` setting. There must be, at minimum, a ``PENDING`` +state at the beginning, and ``COMPLETE``, ``ERRORED``, and ``ABORTED`` states +at the end of the list.
Also, for every ``RETIRING_foo`` state, there must be +a corresponding ``foo_COMPLETE`` state. + +To add states, override the ``RETIREMENT_STATES`` setting. Typically, this +setting is overridden in ``lms.yml``. + +After you have defined any custom states, populate the states table with the +following management command: + +.. code-block:: bash + + $ ./manage.py lms --settings= populate_retirement_states + + All states removed and new states added. Differences: + Added: set([u'RETIRING_ENROLLMENTS', u'RETIRING_LMS', u'LMS_MISC_COMPLETE', u'RETIRING_LMS_MISC', u'ENROLLMENTS_COMPLETE', u'LMS_COMPLETE']) + Removed: set([]) + Remaining: set([u'ERRORED', u'PENDING', u'ABORTED', u'COMPLETE']) + States updated successfully. Current states: + PENDING (step 1) + RETIRING_ENROLLMENTS (step 11) + ENROLLMENTS_COMPLETE (step 21) + RETIRING_LMS_MISC (step 31) + LMS_MISC_COMPLETE (step 41) + RETIRING_LMS (step 51) + LMS_COMPLETE (step 61) + ERRORED (step 71) + ABORTED (step 81) + COMPLETE (step 91) + +In this example, some states specified in settings were already present, so +they were listed under ``Remaining`` and were not re-added. The command output +also prints the ``Current states``; this represents all the states in the +states table. The ``populate_retirement_states`` command is idempotent and +always attempts to make the states table reflect the ``RETIREMENT_STATES`` +list in settings. + +.. _retirement-service-user: + +*********************** +Retirement Service User +*********************** + +The user retirement driver scripts authenticate with the LMS and IDAs as the +retirement service user with OAuth client credentials. Therefore, to use the +driver scripts, you must create a retirement service user and generate a DOT +application and client credentials, as in the following commands. + +..
code-block:: bash + + app_name=retirement + user_name=retirement_service_worker + ./manage.py lms --settings= manage_user $user_name $user_name@example.com --staff --superuser + ./manage.py lms --settings= create_dot_application $app_name $user_name + +.. note:: + The client credentials (client ID and client secret) will be printed to the + terminal, so take this opportunity to copy them for future reference. You + will use these credentials to configure the driver scripts. For more + information, see :ref:`driver-setup`. + +The retirement service user needs permission to perform retirement tasks, and +that is done by specifying the ``RETIREMENT_SERVICE_WORKER_USERNAME`` variable +in Django settings: + +.. code-block:: python + + RETIREMENT_SERVICE_WORKER_USERNAME = 'retirement_service_worker' + +************ +Django Admin +************ + +The Django admin interface contains the following models under ``USER_API`` +that relate to user retirement. + +.. list-table:: + :widths: 15 30 55 + :header-rows: 1 + + * - Name + - URI + - Description + * - Retirement States + - ``/admin/user_api/retirementstate/`` + - Represents the table of states defined in ``RETIREMENT_STATES`` and + populated with ``populate_retirement_states``. + * - User Retirement Requests + - ``/admin/user_api/userretirementrequest/`` + - Represents the table that tracks the user IDs of every learner who + has ever requested account deletion. This table is primarily used for + internal bookkeeping, and normally isn't useful for administrators. + * - User Retirement Statuses + - ``/admin/user_api/userretirementstatus/`` + - Model for managing the retirement state for each individual learner. + +In special cases where you may need to manually intervene with the pipeline, +you can use the User Retirement Statuses management page to change the +state for an individual user. For more information about how to handle these +cases, see :ref:`handling-special-cases`. + +.. 
include:: ../../../../links/links.rst diff --git a/scripts/user_retirement/docs/special_cases.rst b/scripts/user_retirement/docs/special_cases.rst new file mode 100644 index 000000000000..ae544c3208c6 --- /dev/null +++ b/scripts/user_retirement/docs/special_cases.rst @@ -0,0 +1,86 @@ +.. _handling-special-cases: + +###################### +Handling Special Cases +###################### + +.. _recovering-from-errored: + +Recovering from ERRORED +*********************** + +If a retirement API indicates failure (4xx or 5xx status code), the driver +immediately sets the user's state to ``ERRORED``. To debug this error state, +check the ``responses`` field in the user's row in +``user_api_userretirementstatus`` (User Retirement Status) for any relevant +logging. Once the issue is resolved, you need to manually set the user's +``current_state`` to the state immediately prior to the state which should be +re-tried. You can do this using the Django admin. In this example, a user +retirement errored during forums retirement, so we manually reset their state +from ``ERRORED`` to ``ENROLLMENTS_COMPLETE``. + +.. digraph:: retirement_states_example + :align: center + + //rankdir=LR; // Rank Direction Left to Right + ranksep = "0.3"; + + edge[color=grey] + + node[fontname=Courier,fontsize=12,shape=box,group=main] + { rank = same INIT[style=invis] PENDING } + { + edge[style=bold,color=black] + INIT -> PENDING; + "..."[shape=none] + PENDING -> RETIRING_ENROLLMENTS -> ENROLLMENTS_COMPLETE -> RETIRING_FORUMS; + } + RETIRING_FORUMS -> FORUMS_COMPLETE -> "..." 
-> COMPLETE + + node[group=""]; + RETIRING_ENROLLMENTS -> ERRORED; + RETIRING_FORUMS -> ERRORED[style=bold,color=black]; + PENDING -> ABORTED; + + subgraph cluster_terminal_states { + label = "Terminal States"; + labelloc = b // put label at bottom + {rank = same ERRORED COMPLETE ABORTED} + } + + ERRORED -> ENROLLMENTS_COMPLETE[style="bold,dashed",color=black,label=" via django\nadmin"] + +Now, the user retirement driver scripts will automatically resume this user's +retirement the next time they are executed. + +Rerunning some or all states +***************************** + +If you decide you want to rerun all retirements from the beginning, set +``current_state`` to ``PENDING`` for all retirements whose ``current_state`` is +``COMPLETE``. This would be useful in the case where a new stage in the user +retirement workflow is added after running all retirements (but before the +retirement queue is cleaned up), and you want to run all the retirements +through the new stage. Alternatively, perhaps you developed a stage or API that +didn't work correctly but still indicated success, so the pipeline advanced +all users to ``COMPLETE``. Retirement APIs are designed to be idempotent, +so this should be a no-op for stages already run for a given user. + +Cancelling a retirement +*********************** + +Users who have recently requested account deletion but are still in the +``PENDING`` retirement state may request to rescind their account deletion by +emailing or otherwise contacting the administrators directly. edx-platform +offers a Django management command that administrators can invoke manually to +cancel a retirement, given the user's email address. It restores a given +user's login capabilities and removes them from all retirement queues. The +syntax is as follows: + +..
code-block:: bash + + $ ./manage.py lms --settings= cancel_user_retirement_request + +Keep in mind that this only works for users whose retirement +state has not advanced beyond ``PENDING``. Additionally, the user will need to reset +their password in order to restore access to their account.
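In the recovery procedure in :ref:`recovering-from-errored`, the state to restore is the entry immediately before the failed ``RETIRING_*`` state in the ordered state list. The following is a minimal Python sketch of that lookup. It assumes you have already identified the failed state from the ``responses`` field, and that the ``RETIREMENT_STATES`` order is available as a plain list of names. ``state_to_reset_to`` is a hypothetical helper. You still make the actual change in the Django admin.

```python
from typing import Sequence


def state_to_reset_to(states: Sequence[str], failed_state: str) -> str:
    """Return the state to put in ``current_state`` before retrying a stage.

    ``states`` is the ordered state list (as in RETIREMENT_STATES) and
    ``failed_state`` is the RETIRING_* state during which the error occurred.
    """
    if not failed_state.startswith("RETIRING_"):
        raise ValueError(f"{failed_state!r} is not a RETIRING_* state")
    index = list(states).index(failed_state)  # ValueError if not in the list
    if index == 0:
        raise ValueError(f"{failed_state!r} has no preceding state")
    return states[index - 1]
```

For the forums example above, this lookup returns ``ENROLLMENTS_COMPLETE``, which matches the manual reset shown in the diagram.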