Overview
Focstest lets you check your homework for Olin's Foundations of Computer Science (FoCS) course automatically.
The name is a portmanteau of "doctest" and FoCS, because the FoCS homework problems are presented similarly to Python's doctests, with OCaml REPL input and expected output:
Code a function `linear` of type

float -> float -> float -> float -> float -> float

where `linear a b x y c` takes the value `c`, which should live in the interval `[a, b]`, and rescales it so that it lives at the "corresponding" place in the interval `[x, y]`. This is called linear scaling. So if `c` is the midpoint of `[a, b]`, then `linear a b x y c` should return the midpoint of `[x, y]`; if `c` is a quarter of the way from `a` to `b`, then `linear a b x y c` is a quarter of the way from `x` to `y`; and so on. If `a` and `b` are the same, raise an exception using `failwith`. (Why?)

Sample output:

# linear 0. 1. 50. 100. 0.;;
- : float = 50.
# linear 0. 1. 50. 100. 1.;;
- : float = 100.
# linear 50. 100. 0. 1. 75.;;
- : float = 0.5
# linear 50. 100. 0. 1. 80.;;
- : float = 0.6
# linear 50. 100. 0. 1. 110.;;
Exception: Failure "value out of range".
Function stubs for each question are provided in an OCaml file:
(* Question 1 *)
let linear (a:float) (b:float) (x:float) (y:float) (c:float):float =
failwith "Not implemented"
The students complete the functions to meet the requirements and tests from the webpage:
(* Question 1 *)
let linear (a:float) (b:float) (x:float) (y:float) (c:float):float =
if c < a || c > b then failwith "value out of range" else
(* provide a better error than Division_by_zero *)
if a = b then failwith "input range of zero; a and b are equal" else
let ab_range = b -. a
and xy_range = y -. x
and relative_c = c -. a
in
relative_c /. ab_range *. xy_range +. x
Focstest parses the sample outputs from the webpage and runs your in-progress homework solutions against them, comparing the outputs and checking for errors.
It began as a script I wrote while a FoCS student, and kept developing later as a course assistant. At the time I didn't know how to write "real" programs in OCaml that interfaced with libraries and IO, so I reached for Python because I was most comfortable with it. Unfortunately, using another language to run the OCaml REPL presents some challenges that are discussed more in Running the Tests.
At a high level, there are three phases of work handled by three functions:
- Fetching the webpage: `get_html`
- Parsing the tests: `parse_html_tests`
- Running the tests: `run_test` (and the class `PrintingTestRunner`)
The interactions between the student, Focstest, FoCS website, and the OCaml interpreter are diagrammed below.
First, the webpage that contains the tests and questions must be fetched.
Each homework's url looks like https://rpucella.net/courses/focs-sp22/homeworks/1/index.html and the template OCaml files look like `homework1.ml`.
To save the user from passing the long homework url on each invocation of `focstest`, the homework number is parsed from the OCaml file and used to construct the url.
The user can specify a url explicitly in cases where the parsing fails or the naming pattern differs.
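A minimal sketch of that inference, assuming the number is taken from the filename (the names here are illustrative, not focstest's actual API):

```python
import re

# assumed URL pattern, following the example above
HOMEWORK_URL_TEMPLATE = "https://rpucella.net/courses/focs-sp22/homeworks/{}/index.html"

def infer_homework_url(filename: str) -> str:
    """Guess the homework's url from a filename like 'homework1.ml'."""
    match = re.search(r"homework(\d+)", filename)
    if match is None:
        raise ValueError(f"could not find a homework number in {filename!r}")
    return HOMEWORK_URL_TEMPLATE.format(match.group(1))
```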
To cut down on repeated network requests, the url's contents are saved to a temporary cache directory with a default maximum age of 30 minutes. If the user does not have a network connection, and a cached version of any age exists, the cached version is used.
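In code, the caching behavior described above looks roughly like this (a simplified sketch, not focstest's actual implementation; `fetch` stands in for whatever performs the network request):

```python
import time
from pathlib import Path

CACHE_MAX_AGE = 30 * 60  # default maximum cache age: 30 minutes, in seconds

def fetch_url_cached(url: str, cache_dir: Path, fetch, max_age: float = CACHE_MAX_AGE) -> str:
    """Return the contents of `url`, preferring a fresh cached copy.

    `fetch` is a callable that downloads the page; it may raise OSError
    when there is no network connection.
    """
    cache_file = cache_dir / (url.replace("/", "_") + ".html")
    if cache_file.exists():
        age = time.time() - cache_file.stat().st_mtime
        if age < max_age:
            return cache_file.read_text()  # cache hit: fresh enough
    try:
        contents = fetch(url)
    except OSError:
        if cache_file.exists():
            return cache_file.read_text()  # offline: fall back to a stale copy
        raise
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(contents)  # refresh the cache
    return contents
```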
This caching logic is implemented in the `fetch_url` function and described visually below:
Alternatively, a local HTML file can be specified and used directly.
While `doctest` needs to parse tests out of arbitrary Python docstrings, the FoCS tests are already separated into `<pre><code></code></pre>` HTML elements.
The hard work of extracting the code block elements is done with the `beautifulsoup` library.
To parse the test inputs, the code block text is passed to a multi-line regex that matches on the REPL prompt `#` and the expression terminator `;;`.
The lines of text following each parsed input are taken as the expected output of the test.
Depending on the test, the expected output may be the value and type signature of an expression, a raised exception, or printed output.
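A minimal regex along those lines (illustrative only; the real pattern may differ):

```python
import re

# one REPL interaction: a '# ' prompt, input ending in ';;', then the
# expected output, which runs until the next prompt or the end of the block
TEST_RE = re.compile(
    r"^# (?P<input>.+?;;)\s*\n(?P<output>.+?)(?=\n# |\Z)",
    re.MULTILINE | re.DOTALL,
)

def parse_tests(block: str) -> list[tuple[str, str]]:
    """Split a code block's text into (input, expected output) pairs."""
    return [(m["input"], m["output"].strip()) for m in TEST_RE.finditer(block)]
```

The `re.DOTALL` flag lets an expected output span several lines, which covers multi-line values and exception messages.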
For each test, the OCaml interpreter binary (`ocaml`) is executed and passed a statement to load the homework file (`#use "homework1.ml";;`) along with the test input, e.g. `1 + 2;;`.
After waiting for the process to finish with a suitable timeout, the output is split on the interpreter prompt (`#`) to separate the output of the REPL startup, file loading, and the test itself (e.g. `- : int = 3`).
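That flow can be sketched as follows (simplified; the function names here are hypothetical, not focstest's real ones):

```python
import subprocess

def split_repl_output(stdout: str) -> list[str]:
    """Separate toplevel output on the '# ' prompt: startup banner,
    file-loading output, then one section per evaluated expression."""
    return stdout.split("# ")

def run_ocaml_test(homework_file: str, test_input: str, timeout: float = 10.0) -> str:
    """Run one test in a fresh `ocaml` toplevel and return its output section."""
    program = f'#use "{homework_file}";;\n{test_input}\n'
    result = subprocess.run(
        ["ocaml"], input=program, capture_output=True, text=True, timeout=timeout
    )
    # sections: [startup, '#use' output, test output, trailing empty prompt]
    return split_repl_output(result.stdout)[2]
```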
The file and test outputs are scanned for exceptions or errors.
If loading the homework file returns errors, the test results may be invalid, so an `OcamlFileError` exception is raised that stops the program and alerts the user to the issue.
Errors returned from test input may mean several things:
- the test checks that an exception is raised
- the test demonstrates a syntax or type error
- there is an error in code called by the test
- the test calls an incomplete function stub that raises a "Not implemented" exception.
In some of those cases the output of the test is valid, so the parsed error is returned as a value alongside the test output, to be inspected later.
The actual loop of test running is done with a class called `BaseTestSuiteRunner`, which handles running multiple groups of tests ("suites") and tracking their results.
This class is extended by `PrintingTestRunner`, which handles printing the colored output of tests and suites to the terminal.
The current design runs the `ocaml` interpreter binary for each test, loading the homework file before running the test input.
This method works consistently, isolates any problems with parsing output to an individual test case, and meets a variety of goals:
- runs cross-platform (students commonly use Linux, Windows, and MacOS)
- detects errors from the loaded homework file
- evaluates with a timeout to catch infinite loops and long-running tests
On the other hand, using a new OCaml environment for each test has two big downsides:
- Increased run time: loading even an unimplemented homework file takes 50 to 200 milliseconds, so running 80 empty tests can take 10 seconds.
- Tests that depend on a constant or function defined in a previous test will fail.
I tried to mitigate the first problem by letting the user select which suites to run, and skipping suites when the first test raises a "Not implemented" error. Running several processes at once could further improve the speed, but hasn't been attempted.
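If it were attempted, the parallel approach could look something like this sketch (an untested idea, not current focstest behavior; `run_test` stands for any per-test callable):

```python
from concurrent.futures import ThreadPoolExecutor

def run_tests_parallel(tests, run_test, max_workers: int = 4) -> list:
    """Run independent tests concurrently. Each worker thread mostly blocks
    on its own `ocaml` subprocess, so threads are enough to overlap the
    interpreter startup cost."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `tests` in the returned results
        return list(pool.map(run_test, tests))
```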
I don't have a mitigation for the second issue, but the provided tests are rarely structured that way.
From what I've researched, there are two potential solutions that still meet most of the original goals:
Python's `asyncio.subprocess` module allows cross-platform reading/writing of a sub-process's stdout/stdin with a timeout.
This would mean that a single OCaml process could be created, with each test's output read as it's evaluated instead of at the end.
The `subprocess` library does technically allow reading and writing directly to stdin/stdout, but doing so correctly, while avoiding pipe blocking and allowing reads with a timeout, would be complicated.
As an example, the implementation of `Popen.communicate` is nontrivial, with different implementations for Windows and Unix, whereas asyncio's `Process.communicate` is pretty straightforward.
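A rough sketch of the asyncio approach, assuming the toplevel's `# ` prompt can serve as a response delimiter (illustrative only, not part of focstest):

```python
import asyncio

async def eval_in_toplevel(proc, expr: str, timeout: float = 5.0) -> str:
    """Send one expression to a long-lived interpreter process and read its
    reply up to the next '# ' prompt, with a per-test timeout."""
    proc.stdin.write((expr + "\n").encode())
    await proc.stdin.drain()
    reply = await asyncio.wait_for(proc.stdout.readuntil(b"# "), timeout)
    return reply.decode().removesuffix("# ").strip()

# Usage (sketch): one `ocaml` process shared across all tests, e.g.
#   proc = await asyncio.create_subprocess_exec(
#       "ocaml", stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE)
#   await eval_in_toplevel(proc, '#use "homework1.ml";;')
#   result = await eval_in_toplevel(proc, "1 + 2;;")
```

A timeout here would only abandon the current read rather than kill a runaway evaluation, so the long-running-test goal would need extra handling (e.g. restarting the process).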
Since originally writing Focstest, I've learned more about writing OCaml programs. There are ways to interface directly with the interpreter, but it would still involve some text parsing of the output and wouldn't allow adding timeouts from what I understand.