Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce pa11y runtimes #3829

Merged
merged 34 commits into from
May 15, 2024
Merged

Reduce pa11y runtimes #3829

merged 34 commits into from
May 15, 2024

Conversation

caleywoods
Copy link
Contributor

@caleywoods caleywoods commented May 2, 2024

Pull request summary

Attempts to address #3752.

We're currently running pa11y scans using the sitemap which scans all of the URLs on the site which results in slowness. These changes use a two pronged approach to reducing the pa11y runtimes.

The Jekyll plugin

The first part is a Jekyll plugin that uses Jekyll hooks (:documents, :pages, and :site currently) to output a file named pa11y_targets to the filesystem during the build process. The plugin has the following behavior:

  • When changes are made to a post or to any file in a user defined collection, those will be picked up by the :documents Jekyll hook and be included in the list of files to be scanned by pa11y. This takes care of the majority of file changes
  • If a change is made to a file in the pages/ directory, the :pages Jekyll hook captures those to be included in the list of files to be scanned by pa11y (I'm not sure why :documents doesn't get these)
  • If a layout is changed the plugin determines what files use that layout and includes all of them to be scanned by pa11y
  • If a change is made to a "site-wide" file in assets, _includes, or _sass, the plugin does a sampling of pages across the site. These pages are: the homepage, three files that use each layout from _config.yml's defaults, as well as the first, middle, and last blog archive pages (/blog/page/:num)
  • All files that are considered changed are written to the pa11y_targets file and each path is separated by a newline

There's RSpec tests to go along with this plugin.

The CircleCi config

The second part is a tweak to .circleci/config.yml when the Jekyll build step runs. This change does two things:

  • It makes the contents of the pa11y_targets file be base64 encoded and exported into an environment variable called PA11Y_TARGETS in CircleCI's $BASH_ENV area which is stateful between job steps
  • Two job steps later when pa11y is initiated, it reads the list of files to scan out of the previously stored PA11Y_TARGETS environment variable

Results

Without these changes, the full build takes around 25 minutes in CircleCI with the pa11y scan representing the bulk of that time taking around 20-21 minutes. Testing these changes showed CircleCI full builds in under 4 minutes with the pa11y scan portion being reduced to < 10 seconds in most cases. These are big gains for the common scenario of single file changes when someone is creating or editing a blog post.

This could still have some long running build times/pa11y scans depending on what files are touched. An example would be _layouts/post.html, when I checked locally, changing that particular layout resulted in pa11y scanning a little over 500 files.

This work attempts to solve the problem of overly long CI runs, in which single PRs were taking 25+ minutes, because each PR scans every page with pa11y-ci. See #3752.

This commit adds a Jekyll::Hook which keeps track of which destination files (built HTML files) should be checked with pa11y-ci for each PR. The files to check are based on which files changed in the last commit.

The design of the code lets you swap out different "differs", which are the objects responsible for getting the list of files. This design makes the code easier to test, and makes it more adaptable.

One problem I'm seeing so far is that we only check what files were changed *in the last commit*. This means that, if you're ever running CI locally, you'd have to commit your changes to have them picked up by the "differ". We should probably change this in a future commit, and have a "differ" for local development and a "differ" for CI. To summarize, the differs should differ.

We also have yet to implement the case where a stylesheet or something site-wide changes.
This commit adds site-wide pa11y checking. If a "global" folder (like _includes or _sass) sees a file change, we add a few files from several folders across the site, and include them in the list of files for pa11y to check.

This also switches file "differs" based on the environment. If CI=true, then we diff the files from the last commit. Otherwise, we assume we're developing locally and get the files currently changed.

This feels like draft code, the design feels off. There's constants set at file load/runtime, and with the new class involved, the file is a lot harder to read.
Fixes design issues in the previous commit.

This commit:

- Abstracts and simplifies the page-identification strategy, reducing
  code about special cases
- Makes it easier to test, now that file paths can be injected
- Adds minimal tests
@caleywoods caleywoods requested a review from a team as a code owner May 2, 2024 20:14
@caleywoods caleywoods force-pushed the caley/circle-ci-experiments branch from 54b7a2c to 1e8d18a Compare May 2, 2024 23:11
@caleywoods caleywoods changed the title [WIP] Testing circleci environment variables [WIP] Reduce pa11y runtimes May 14, 2024
@caleywoods caleywoods requested a review from cantsin May 15, 2024 13:19
@caleywoods caleywoods changed the title [WIP] Reduce pa11y runtimes Reduce pa11y runtimes May 15, 2024
@caleywoods
Copy link
Contributor Author

Tagging @beechnut - I know you're not on TLC crew quite yet but you did a lot of great work on this and wanted to make sure you had a chance to look at it.

_plugins/pa11y.rb Outdated Show resolved Hide resolved
@caleywoods
Copy link
Contributor Author

I also wonder if it's detrimental to never run a full pa11y scan. Should we consider adding a scheduled GitHub action to run a full scan maybe every two weeks?

_plugins/pa11y.rb Outdated Show resolved Hide resolved
@beechnut
Copy link
Contributor

@caleywoods Yes, part of the scope of work is to add a nightly full scan. We shouldn't remove full scans from the pipeline without replacing them beforehand or in the same PR.

Copy link
Member

@cantsin cantsin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@caleywoods
Copy link
Contributor Author

@caleywoods Yes, part of the scope of work is to add a nightly full scan. We shouldn't remove full scans from the pipeline without replacing them beforehand or in the same PR.

I've opened #3832 to track this and I plan on working on it this week.

@caleywoods caleywoods merged commit 56e2843 into main May 15, 2024
5 checks passed
@caleywoods caleywoods deleted the caley/circle-ci-experiments branch May 15, 2024 21:39
This was referenced May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants