Merge pull request juju#15544 from barrettj12/docs-check

juju#15544 juju#15406 introduced a Docs workflow which syncs CLI docs to Discourse on `push` to `develop`. No feedback was provided on PRs - so merging a PR could potentially cause this workflow to fail without warning. This PR adds a pre-merge check on PRs to ensure that the doc sync will succeed after merging. The `discourse-sync` script has been majorly refactored, and now has four commands: - `check`: ensures that each doc has an entry in the `$TOPIC_IDS` yaml file, and that the corresponding topic exists on Discourse. - `sync`: syncs the generated documentation to Discourse. - `create <doc-names> ...`: create new Discourse topics for the provided docs, and add the new topic IDs to the `$TOPIC_IDS` file. - `delete`: delete all topics in the `$TOPIC_IDS` file. The Docs workflow has been updated to match. The `check` command will now be run on `pull_request`, while `sync` will be run on `push`. If a PR adds a new Juju command, then the `check` command will find that there is no corresponding topic ID, and fail. This will warn the PR author and allow them to manually create the new topic (using the `create` command) and check in the changes.
manadart · May 2, 2023 · b27ec03 · b27ec03
2 parents e2c7b4c + edccf70
commit b27ec03
Show file tree

Hide file tree

Showing 3 changed files with 215 additions and 84 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -3,6 +3,9 @@ on:
   push:
     branches:
       - 'develop'
+  pull_request:
+    branches:
+      - 'develop'
   workflow_dispatch:
 
 jobs:
@@ -36,11 +39,22 @@ jobs:
           juju documentation --split --out=$DOCS_DIR --discourse-ids $TOPIC_IDS
         # TODO: save $DOCS_DIR as an artifact
 
+      - name: pip install requirements
+        run: |
+          python3 -m pip install -r ./scripts/discourse-sync/requirements.txt
+
+      - name: Check docs on Discourse
+        if: github.event_name == 'pull_request'
+        env:
+          DOCS_DIR: ${{ steps.gen.outputs.dir }}
+        run: |
+          python3 ./scripts/discourse-sync/main.py check
+
       - name: Sync docs to Discourse
+        if: github.event_name == 'push'
         env:
           DISCOURSE_API_USERNAME: ${{ secrets.DISCOURSE_API_USERNAME }}
           DISCOURSE_API_KEY: ${{ secrets.DISCOURSE_API_KEY }}
           DOCS_DIR: ${{ steps.gen.outputs.dir }}
         run: |
-          python3 -m pip install -r ./scripts/discourse-sync/requirements.txt
-          python3 ./scripts/discourse-sync/main.py
+          python3 ./scripts/discourse-sync/main.py sync
diff --git a/scripts/discourse-sync/README.md b/scripts/discourse-sync/README.md
@@ -1,26 +1,28 @@
 # discourse-sync
 
-This Python script is used to sync our CLI docs to Discourse using the output
-of the `juju documentation` command. It requires the following environment
-variables to be set:
+This Python script syncs Markdown docs to Discourse.
+
+## Commands
+
+| Command                  | Description                                                                                                                                                   |
+|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `check`                  | Check that all docs in `$DOCS_DIR` have a corresponding entry in the `$TOPIC_IDS` file, and that this topic exists on Discourse. To be run on a pull request. |
+| `sync`                   | Sync all docs in `$DOCS_DIR` to Discourse. To be run on each commit to the main branch.                                                                       |
+| `create <doc-names> ...` | Create new topics for the provided doc names, and update the `$TOPIC_IDS` file.                                                                               |
+| `delete`                 | Delete all topics with IDs listed in the `$TOPIC_IDS` file.                                                                                                   |
 
-| Variable name            | Description                                                                                                                                                               |
-|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `DISCOURSE_HOST`         | URL for the Discourse server to sync docs to. If not set, this defaults to `https://discourse.charmhub.io/`.                                                              |
-| `DISCOURSE_API_USERNAME` | Username to use for Discourse API requests. Must be a user with access to the API key provided in `DISCOURSE_API_KEY`. Use your own Discourse username if running locally. |
-| `DISCOURSE_API_KEY`      | [API key](https://meta.discourse.org/t/create-and-configure-an-api-key/230124) for accessing the Discourse server's API.                                                  |
-| `DOCS_DIR`               | Path to a directory containing Markdown files to sync (i.e. the argument provided to the `--out` flag of `juju documentation`).                                           |
-| `POST_IDS`               | Path to a YAML file mapping each doc name to its post ID on Discourse.                                                                                                    |
-| `CI`                     | Set to `true` if this workflow is running in CI.                                                                                                                           |                                                                                                                          |
 
-When we discover a doc with no corresponding entry in the `POST_IDS` file, one
-of two things can happen, depending on whether we are running the script inside
-a CI workflow:
-- When running in CI (i.e. `$CI == 'true'`), we can't make persistent changes
-  to the `POST_IDS` file, so log a warning for now, and exit with a nonzero
-  return status at the end.
-- When running locally (i.e. `CI` is unset), we create a new Discourse post for
-  the doc, and add the URL as a new entry in the `POST_IDS` file.
+## Configuration
+
+The script can be configured using the following environment variables:
+
+| Variable name            | Description                                                                                                                                                                |
+|--------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `DISCOURSE_HOST`         | URL for the Discourse server to sync docs to. If not set, this defaults to `https://discourse.charmhub.io/`.                                                               |
+| `DISCOURSE_API_USERNAME` | Username to use for Discourse API requests. Must be a user with access to the API key provided in `DISCOURSE_API_KEY`. Use your own Discourse username if running locally. |
+| `DISCOURSE_API_KEY`      | [API key](https://meta.discourse.org/t/create-and-configure-an-api-key/230124) for accessing the Discourse server's API.                                                   |
+| `DOCS_DIR`               | Path to a directory containing Markdown files to sync (i.e. the argument provided to the `--out` flag of `juju documentation`).                                            |
+| `TOPIC_IDS`              | Path to a YAML file mapping each doc name to its topic ID on Discourse.                                                                                                    |
 
 
 ## Suggested usage
@@ -31,6 +33,6 @@ export DISCOURSE_API_KEY=[api-key]
 export DOCS_DIR=./docs
 export TOPIC_IDS=./.github/discourse-topic-ids.yaml
 
-juju documentation --split --out=$DOCS_DIR --no-index --discourse-ids $TOPIC_IDS
-python3 ./scripts/discourse-sync/main.py
+juju documentation --split --out=$DOCS_DIR --discourse-ids=$TOPIC_IDS # --no-index 
+python3 ./scripts/discourse-sync/main.py sync
 ```
diff --git a/scripts/discourse-sync/main.py b/scripts/discourse-sync/main.py
@@ -2,99 +2,214 @@
 import sys
 import yaml
 from pydiscourse import DiscourseClient
+from pydiscourse.exceptions import DiscourseClientError
+
+
+# Get configuration from environment variables
+DISCOURSE_HOST = os.environ.get('DISCOURSE_HOST', 'https://discourse.charmhub.io/')
+DISCOURSE_API_USERNAME = os.environ.get('DISCOURSE_API_USERNAME')
+DISCOURSE_API_KEY = os.environ.get('DISCOURSE_API_KEY')
+DOCS_DIR = os.environ.get('DOCS_DIR')
+TOPIC_IDS = os.environ.get('TOPIC_IDS')
+
+client = DiscourseClient(
+    host=DISCOURSE_HOST,
+    api_username=DISCOURSE_API_USERNAME,
+    api_key=DISCOURSE_API_KEY,
+)
 
 
 def main():
-    # Get configuration from environment variables
-    DISCOURSE_HOST = os.environ.get('DISCOURSE_HOST', 'https://discourse.charmhub.io/')
-    DISCOURSE_API_USERNAME = os.environ.get('DISCOURSE_API_USERNAME')
-    DISCOURSE_API_KEY = os.environ.get('DISCOURSE_API_KEY')
-    DOCS_DIR = os.environ.get('DOCS_DIR')
-    TOPIC_IDS = os.environ.get('TOPIC_IDS')
-    CI = os.environ.get('CI')
-
-    client = DiscourseClient(
-        host=DISCOURSE_HOST,
-        api_username=DISCOURSE_API_USERNAME,
-        api_key=DISCOURSE_API_KEY,
-    )
+    if len(sys.argv) < 1:
+        sys.exit('no command provided, must be one of: check, sync, create, delete')
+
+    command = sys.argv[1]
+    if command == 'check':
+        check()
+    elif command == 'sync':
+        sync()
+    elif command == 'create':
+        create()
+    elif command == 'delete':
+        delete()
+    else:
+        exit(f'unknown command "{command}"')
+
+
+def check():
+    """
+    Check all docs in the DOCS_DIR have a corresponding entry in the TOPIC_IDS
+    file, and that the corresponding topic exists on Discourse.
+    """
+    topic_ids = get_topic_ids()
+    no_topic_id = []
+    no_discourse_topic = []
+
+    for entry in os.scandir(DOCS_DIR):
+        if not is_markdown_file(entry):
+            print(f'entry {entry.name}: not a Markdown file: skipping')
+            continue
+
+        doc_name = removesuffix(entry.name, ".md")
+
+        if doc_name not in topic_ids:
+            print(f'doc {doc_name}: no topic ID found')
+            no_topic_id.append(doc_name)
+            continue
+
+        topic_id = topic_ids[doc_name]
+        print(f'doc {doc_name} (topic #{topic_id}): checking topic on Discourse')
+        try:
+            client.topic(
+                slug='',
+                topic_id=topic_ids[doc_name],
+            )
+        except DiscourseClientError:
+            print(f'doc {doc_name} (topic #{topic_id}): not found on Discourse')
+            no_discourse_topic.append(doc_name)
+
+    if no_topic_id:
+        print(f"The following docs don't have corresponding entries in {TOPIC_IDS}.")
+        print(f"Please create new Discourse topics for them, and add the new topic IDs to {TOPIC_IDS}.")
+        for doc_name in no_topic_id:
+            print(f' - {doc_name}')
+
+    if no_discourse_topic:
+        print("The following docs don't have corresponding topics on Discourse.")
+        print(f"Please create new Discourse topics for them, and update the topic IDs in f{TOPIC_IDS}.")
+        for doc_name in no_discourse_topic:
+            print(f' - {doc_name} (topic #{topic_ids[doc_name]})')
+
+    if no_topic_id or no_discourse_topic:
+        sys.exit(1)
 
-    with open(TOPIC_IDS, 'r') as file:
-        topic_ids = yaml.safe_load(file)
-        if topic_ids is None:
-            topic_ids = {}
 
-    # Keep track of docs which don't have a matching topic ID.
-    # If any are found while running in CI, we need to fail and ask the
-    # developer to manually intervene.
-    not_found_docs = []
+def sync():
+    """
+    Sync all docs in the DOCS_DIR with their corresponding topics on Discourse.
+    """
+    topic_ids = get_topic_ids()
+    couldnt_sync = {}  # doc_name -> reason
 
     for entry in os.scandir(DOCS_DIR):
         if not is_markdown_file(entry):
-            print(f'file {entry.name}: not a Markdown file: skipping')
+            print(f'entry {entry.name}: not a Markdown file: skipping')
             continue
 
         doc_name = removesuffix(entry.name, ".md")
         content = open(entry.path, 'r').read()
 
-        if topic_ids and doc_name in topic_ids:
-            print(f'doc {doc_name} (topic #{topic_ids[doc_name]}): checking for changes')
+        if doc_name not in topic_ids:
+            couldnt_sync[doc_name] = 'no topic ID in yaml file'
+            continue
+
+        topic_id = topic_ids[doc_name]
+        print(f'doc {doc_name} (topic #{topic_id}): checking for changes')
+        try:
             # API call to get the post ID from the topic ID
             # TODO: we could save the post IDs in a separate yaml file and
             #   avoid this extra API call
             topic = client.topic(
                 slug='',
-                topic_id=topic_ids[doc_name],
+                topic_id=topic_id,
             )
-            post_id = topic['post_stream']['posts'][0]['id']
+        except DiscourseClientError:
+            couldnt_sync[doc_name] = f'no topic with ID #{topic_id} on Discourse'
+            continue
 
-            # Get current contents of post
+        post_id = topic['post_stream']['posts'][0]['id']
+        # Get current contents of post
+        try:
             post2 = client.post_by_id(
                 post_id=post_id
             )
-            current_contents = post2['raw']
-            if current_contents == content.rstrip('\n'):
-                print(f'doc {doc_name} (topic #{topic_ids[doc_name]}): already up-to-date: skipping')
-                continue
+        except DiscourseClientError as e:
+            couldnt_sync[doc_name] = f"couldn't get post for topic ID #{topic_id}: {e}"
+            continue
 
-            # Update Discourse post
-            print(f'doc {doc_name} (topic #{topic_ids[doc_name]}): updating')
+        current_contents = post2['raw']
+        if current_contents == content.rstrip('\n'):
+            print(f'doc {doc_name} (topic #{topic_ids[doc_name]}): already up-to-date: skipping')
+            continue
+
+        # Update Discourse post
+        print(f'doc {doc_name} (topic #{topic_ids[doc_name]}): updating')
+        try:
             client.update_post(
                 post_id=post_id,
                 content=content,
             )
+        except DiscourseClientError as e:
+            couldnt_sync[doc_name] = f"couldn't update post with ID #{post_id}: {e}"
+            continue
 
-        elif CI == 'true':
-            # Running in CI - we can't edit the TOPIC_IDS yaml file.
-            # Log a warning and return a non-zero exit code later
-            print(f'WARNING: no topic ID found for doc {doc_name}')
-            not_found_docs.append(doc_name)
-
-        else:
-            # Create new Discourse post
-            print(f'doc {doc_name}: no topic ID found: creating new post')
-            post = client.create_post(
-                title=post_title(doc_name),
-                category_id=22,
-                content=content,
-                tags=['olm', 'autogenerated'],
-            )
-            new_topic_id = post['topic_id']
-            print(f'doc {doc_name}: created new topic #{new_topic_id}')
-
-            # Save topic ID in yaml map for later
-            topic_ids[doc_name] = new_topic_id
-            with open(TOPIC_IDS, 'w') as file:
-                yaml.safe_dump(topic_ids, file)
-
-    if len(not_found_docs) > 0:
-        print("The following docs don't have corresponding entries in the TOPIC_IDS yaml file. Please run "
-              "scripts/discourse-sync locally to create new topics and update the topic IDs.")
-        for doc_name in not_found_docs:
-            print(f' - {doc_name}')
+    if len(couldnt_sync) > 0:
+        print("Failed to sync the following docs:")
+        for doc_name, reason in couldnt_sync.items():
+            print(f' - {doc_name}: {reason}')
         sys.exit(1)
 
 
+def create():
+    """
+    Create new Discourse topics for each doc name provided.
+    """
+    topic_ids = get_topic_ids()
+    docs = sys.argv[2:]
+
+    for doc_name in docs:
+        if doc_name in topic_ids:
+            print(f'skipping doc {doc_name}, it already has a topic ID')
+            continue
+
+        path = os.path.join(DOCS_DIR, doc_name+'.md')
+        try:
+            content = open(path, 'r').read()
+        except OSError as e:
+            print(f"couldn't open {path}: {e}")
+            continue
+
+        # Create new Discourse post
+        print(f'creating new post for doc {doc_name}')
+        post = client.create_post(
+            title=post_title(doc_name),
+            category_id=22,
+            content=content,
+            tags=['olm', 'autogenerated'],
+        )
+        new_topic_id = post['topic_id']
+        print(f'doc {doc_name}: created new topic #{new_topic_id}')
+
+        # Save topic ID in yaml map for later
+        topic_ids[doc_name] = new_topic_id
+        with open(TOPIC_IDS, 'w') as file:
+            yaml.safe_dump(topic_ids, file)
+
+
+def delete():
+    """
+    Delete all Discourse topics in the TOPIC_IDS file.
+    """
+    topic_ids = get_topic_ids()
+
+    for doc_name, topic_id in topic_ids.items():
+        print(f'deleting doc {doc_name} (topic #{topic_id})')
+        client.delete_topic(
+            topic_id=topic_id
+        )
+
+        # Update topic ID yaml map
+        del topic_ids[doc_name]
+        with open(TOPIC_IDS, 'w') as file:
+            yaml.safe_dump(topic_ids, file)
+
+
+def get_topic_ids():
+    with open(TOPIC_IDS, 'r') as file:
+        topic_ids = yaml.safe_load(file)
+        return topic_ids or {}
+
+
 def is_markdown_file(entry: os.DirEntry) -> bool:
     return entry.is_file() and entry.name.endswith(".md")