This document lists principles that guided Tast's design and that should be kept in mind when considering future changes.
[TOC]
Iteratively running tests (regardless of whether one is adding or modifying tests, trying to reproduce a failure, or verifying that a system change has fixed a failing test) should be fast from the perspective of a developer waiting for the command to complete.
- Operations should not take additional time to complete due to the choice of programming language.
- Building and deploying tests and associated data must be fast. `emerge` adds ten seconds or more of overhead even when building a C++ program with no dependencies and an empty `main()` function, and it shouldn't be part of the build/deploy/run cycle in its present form.
- The test system's overhead should be minimized. Nothing should be copied to the DUT when the test hasn't changed. All communication with the DUT should happen over a single persistent SSH connection, and round trips should be minimized on the critical path; otherwise, network latency kills performance when running tests on a DUT in a different geographical location.
- Developers shouldn't need to edit tests on-device in order to iterate quickly. ChromeOS systems do not make for pleasant development environments. (They may support pleasant development environments within VMs, but that doesn't help for testing.)
- Running a test shouldn't result in code being compiled. If a test needs additional executables to be installed on the DUT, then those executables should already be present in the system image. We already have a packaging system; use it.
- Information about a test (e.g. its inclusion in a suite) should be available without needing to evaluate hundreds of scripts. Don't emulate a declarative language using an imperative interpreted language.
- Minimize the number of moving pieces when a test is run on a DUT. The framework, and tests themselves, should do everything in their power to avoid operations that might fail. Avoid runtime dependencies on external resources like databases, websites, and other network services.
- Minimize boilerplate. For example, test names shouldn't appear repeatedly in the source (e.g. directory names, control files, filenames, test implementations, `.ebuild` files). We'd frown if we saw the same lengthy string constant repeated five or more times in a C++ program. In cases where repetition is unavoidable, there should be automatic checks that the names are consistent in all locations.
- Developers shouldn't need to know the specifics of how the test system is integrated into ChromeOS. In the common case, they shouldn't need to edit `.ebuild` files when adding a test, run `cros_workon` when making changes, or set USE flags or build and deploy packages to run tests.
- A given run's output directory should be structured in a way that is easy to navigate.
- Logs must be easy to read. The default log level should include messages that describe what's happening at any given time (e.g. no radio silence while the test is running on a remote host; see issue 715865), but no non-fatal warnings or errors. A separate log file should be written with full verbose output, and it should be trivial for both machines and humans to find the overall pass/fail status of all tests and the verbose output from an individual test.
- Errors should be passed back to the top level of the test and logged there (see the error-wrapping sketch after this list). When fatal errors are reported from deep in support libraries, test results are often difficult to interpret due to the lack of context present in the errors.
- Detailed timing information should be written in a format readable by both humans and machines to make it easy to see why a test run was slow and track long-term performance trends.
- System log information generated by the DUT while tests were running should be captured. It should be easy to compare timestamps in test results to timestamps from the DUT's system logs, even in the presence of clock skew.
- The framework should focus on running tests. Tasks like allocating DUTs and scheduling tests on them, reimaging or repairing DUTs, and displaying and archiving test results belong elsewhere.
- There should be a clear separation between code that's used by tests and code that runs on developers' workstations or bots to deploy and run tests.
- Avoid magic. Code that spells out what it's doing is easier to debug than code that relies on action at a distance (e.g. overriding `__getattr__` or using `setattr` to dynamically set attributes in Python). Make code easy to trace unless there's an extremely compelling reason to do something fancy.
- Avoid making test libraries ornate. Nobody wants to puzzle their way through complicated object hierarchies while trying to debug a failing test.
- The code that supports tests must itself be thoroughly covered by unit tests.
- Make it easy to disable a broken test until it can be fixed by its owners.
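The error-wrapping sketch referenced above follows. It is a minimal Go illustration, not Tast's actual API: the helper name, service name, and socket path are made up, and it uses the standard library's `fmt.Errorf` with `%w` (Tast code may use its own errors package for the same purpose).

```go
// Sketch: a support-library helper returns errors with added context instead
// of aborting the test itself. All names here are illustrative.
package example

import (
	"context"
	"fmt"
	"net"
)

// connectToService dials a hypothetical service socket. On failure it wraps
// the underlying error so the caller's log explains what was being attempted.
func connectToService(ctx context.Context, name string) (net.Conn, error) {
	var d net.Dialer
	conn, err := d.DialContext(ctx, "unix", "/run/"+name+"/socket")
	if err != nil {
		return nil, fmt.Errorf("connecting to %s: %w", name, err)
	}
	return conn, nil
}
```

The point is that only the test's top level decides whether a failure is fatal, and by the time the error reaches it, each layer has added enough context for the result to be interpretable on its own.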
Don't overwhelm developers with choices.
- Keep logging simple. There should be one way to report test failures and one way to log informative messages. Don't permit non-fatal "warning"-level errors, as nobody does anything about them and they end up permanently cluttering logs.
- Tests should be straightforward to read. Instead of distributing work across superclasses and overridden methods with non-obvious semantics (e.g. `initialize()`, `setup()`, `warmup()`, `run_once()`, `postprocess()`), implement each test in a single function, with initialization appearing at the beginning and teardown happening at exit (as the language allows); see the sketch below.
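As a rough sketch of this single-function style (and of having exactly one way to fail and one way to log), here is what a test body can look like. It assumes Tast's Go `testing` package; the import path, test name, and scratch-directory task are illustrative, and real tests declare additional metadata such as contacts and attributes.

```go
// A sketch only: the package, names, and the task itself are made up.
package example

import (
	"context"
	"os"

	"chromiumos/tast/testing" // import path may differ by repository layout
)

func init() {
	// Metadata is declared up front so tools can list tests without executing them.
	testing.AddTest(&testing.Test{
		Func: ScratchDir,
		Desc: "Example: creates and removes a scratch directory",
		// Real tests also declare Contacts, Attr, and other fields here.
	})
}

func ScratchDir(ctx context.Context, s *testing.State) {
	// Initialization at the top of the single test function...
	dir, err := os.MkdirTemp("", "scratch")
	if err != nil {
		s.Fatal("Failed to create scratch dir: ", err) // the one way to report failure
	}
	// ...with teardown deferred immediately next to it.
	defer os.RemoveAll(dir)

	s.Log("Using scratch dir ", dir) // the one way to log progress

	// The rest of the test reads top to bottom, with no hidden phases.
}
```

The declarative registration also illustrates the earlier point about test metadata: the description and other fields are plain data, so nothing needs to be evaluated to discover which tests exist.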