
Support incremental checking #9

Open
yuvipanda opened this issue Jun 22, 2018 · 5 comments

Comments

@yuvipanda
Contributor

yuvipanda commented Jun 22, 2018

Individual 'check' functions run tests against the state of the environment as it was when the user ran the check call. However, grade_notebook runs tests against the environment as it is after all the user code has been executed. This has caused a lot of subtle, hard-to-debug errors.

For example, the following code works ok:

```python
a = 5

check('tests/q1.py')

a = 10

check('tests/q2.py')
```

Assume q1.py tests that a is 5 and q2.py tests that a is 10.

However, if you add a 'grade_notebook' at the end, q1 will always fail, since a will be 10 by the time the tests run.

Tests that depend on an earlier value of a re-assigned variable will always fail. We could ask users to be careful about this, but that seems like an unnecessary restriction. Both nbgrader & okpy allow users to do incremental checking, so gradememaybe should also allow it.

okpy does this by using an object that collects grades. nbgrader does this by keeping tests directly in the notebook.

The approach I'd like to take here instead is to use AST rewriting to find all the 'check' calls and rewrite them to something else when grading!

The code above will be rewritten to:

```python
a = 5

check_results_abcdefg.append(check_abcdefg('tests/q1.py'))

a = 10

check_results_abcdefg.append(check_abcdefg('tests/q2.py'))
```

check_results_abcdefg is a list (or other object) that can collect TestResult objects generated by check_abcdefg. A randomly generated unique suffix is added to the check functions to make it harder (but not impossible!) for students to cheat by simply redefining a 'check' function. The check_results_abcdefg object and check_abcdefg function will be injected by the grading function as well. After all the code has been evaluated, check_results_abcdefg can be evaluated to find the final grade.
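A minimal sketch of how that rewrite could be done with Python's ast module. The NodeTransformer below and the fixed suffix are illustrative only, not the final implementation (in practice the suffix would be randomly generated per grading run):

```python
import ast

SUFFIX = "abcdefg"  # illustrative; a real run would generate this randomly

class CheckRewriter(ast.NodeTransformer):
    """Rewrite check(...) calls so their results are collected in a list."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "check":
            # check(...) -> check_results_<suffix>.append(check_<suffix>(...))
            return ast.Call(
                func=ast.Attribute(
                    value=ast.Name(id=f"check_results_{SUFFIX}", ctx=ast.Load()),
                    attr="append",
                    ctx=ast.Load(),
                ),
                args=[ast.Call(
                    func=ast.Name(id=f"check_{SUFFIX}", ctx=ast.Load()),
                    args=node.args,
                    keywords=node.keywords,
                )],
                keywords=[],
            )
        return node

source = "a = 5\ncheck('tests/q1.py')\n"
tree = CheckRewriter().visit(ast.parse(source))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)  # requires Python 3.9+
print(rewritten)
```

The grading function would then exec the rewritten source in a namespace where check_results_abcdefg and check_abcdefg have already been injected.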

This imposes the following constraints on test notebook authors:

  1. Check cells contain nothing but check calls (comments are allowed!). We execute code cell by cell, and this ensures that check calls aren't skipped due to errors elsewhere in the cell
  2. Check cells shouldn't be deleted by students. This should be enforced with a notebook extension outside the scope of gradememaybe.

There should be a linter that validates these.
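A sketch of what such a linter could verify for a single cell, again using the ast module (the function name is hypothetical):

```python
import ast

def is_valid_check_cell(cell_source):
    """Return True if the cell contains only check(...) calls.

    Comments and blank lines are allowed, since they don't
    produce statements in the parsed AST.
    """
    try:
        tree = ast.parse(cell_source)
    except SyntaxError:
        return False
    for stmt in tree.body:
        # Every statement must be a bare expression calling check(...)
        if not (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Name)
                and stmt.value.func.id == "check"):
            return False
    return True
```

For example, a cell containing only a comment and a check call would pass, while a cell that also assigns a variable would be flagged.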

@choldgraf
Collaborator

Have you considered using tags in a cell to correspond to tests in the test suite? E.g. grade_notebook would:

  1. Look in the ok file to see whether any of the tests are linked to a specific tag.
  2. If it finds one, also check if one and only one cell in the notebook has this tag.
  3. If so, run all the cells up to the tagged cell and then run the equivalent of "check".
  4. Repeat this for any other "tagged" tests (so multiple tests can have the same tag, but only one cell per notebook can have this tag)
  5. For any tests that don't have tags, run them after all cells in the notebook are run.
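The flow described in the steps above could be sketched roughly like this. The cell/test structure and the run_cell/run_check callbacks are hypothetical stand-ins, not an existing API:

```python
def grade_with_tags(cells, tests, run_cell, run_check):
    """Sketch of the tag-based grading flow.

    cells: list of dicts like {"source": "...", "tags": [...]}
    tests: list of dicts like {"name": "q1", "tag": "q1" or None}
    run_cell / run_check: hypothetical callbacks supplied by the grader.
    """
    results = {}
    for test in [t for t in tests if t["tag"]]:
        tagged = [i for i, c in enumerate(cells) if test["tag"] in c["tags"]]
        # One and only one cell in the notebook may carry this tag
        if len(tagged) != 1:
            raise ValueError(f"expected exactly one cell tagged {test['tag']!r}")
        # Run all cells up to and including the tagged cell, then check
        for cell in cells[: tagged[0] + 1]:
            run_cell(cell)
        results[test["name"]] = run_check(test["name"])
    # Tests without tags run after the whole notebook has executed
    for cell in cells:
        run_cell(cell)
    for test in [t for t in tests if not t["tag"]]:
        results[test["name"]] = run_check(test["name"])
    return results
```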

@yuvipanda
Contributor Author

Interesting other option, @choldgraf! Thank you for bringing it up!

I spent some time considering it, and it led me to write up a design doc (similar to what we have for r2d) https://github.com/grading/gradememaybe/blob/master/docs/design.md. It has some info on why I don't like relying on notebook metadata for critical functions. Your thoughts on that would be most welcome :)

@yuvipanda
Contributor Author

I also realized that one of the restrictions I have is unnecessarily strict:

Check cells shouldn't be deleted by students. This should be enforced with a notebook extension outside the scope of gradememaybe.

The actual restriction should be:

When grading, we will provide the grader with a list of questions to be graded. In the student assignment, there should be at least one check call for each of these questions.

This is easier to check & enforce, and does not require any external tooling for enforcing integrity. Students can be prevented from accidentally deleting check cells by marking them readonly, which is a feature built into most notebook frontends.
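This relaxed restriction could be validated with a small AST walk over the student notebook before grading. The function names here are illustrative:

```python
import ast

def find_checked_questions(cell_sources):
    """Collect the test-file argument of every check(...) call."""
    checked = set()
    for source in cell_sources:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # a broken cell can't contribute check calls
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "check"
                    and node.args
                    and isinstance(node.args[0], ast.Constant)):
                checked.add(node.args[0].value)
    return checked

def missing_checks(questions, cell_sources):
    """Questions the grader expects but that have no check call."""
    return set(questions) - find_checked_questions(cell_sources)
```

Grading would refuse to proceed (or flag the submission) if missing_checks returns a non-empty set.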

@yuvipanda
Contributor Author

yuvipanda commented Jun 24, 2018

I also wanted to provide more historical context here, to explain my thought process a bit more.

Relying on notebook metadata is the approach taken by nbgrader. It works great for many use cases, and is extremely flexible. However, this has two major disadvantages:

  1. You can't use this with plain .py files easily (AFAIK)
  2. You can't rely on the file metadata that comes back from the user, since they could have modified it to be anything! Fixing this requires building a bunch of integrity checking mechanisms - nbgrader has checksums & requires a database for this reason. This adds required state to the system, and I hate state!

I'm not entirely clear on the approach okpy takes with notebooks, but my understanding of it is:

  1. The equivalent of 'check' calls write things to stdout
  2. There is a script that parses things from stdout and makes grades.

This has a bunch of problems, primarily:

  1. String parsing is hard, complex and avoidable
  2. I'm not sure, but students could likely game this by writing to stdout themselves or by replacing the check functions with whatever they want.
  3. Writing things to stdout is not the best student experience when using notebooks - we can provide rich outputs, not just strings in monospace font.

AST rewriting (which is what I hope to implement) is actually stolen from pytest. Its cons will become more apparent once we actually implement it :)

I also <3 okpy & nbgrader, and they both have a lot of pros I have not listed here :)

@papajohn
Contributor

I thought @vipasu told me that he had addressed this issue. Is that true? If so, can we close it?
