
Support incremental checking #9

Open
yuvipanda opened this issue Jun 22, 2018 · 5 comments

Comments

@yuvipanda
Contributor

yuvipanda commented Jun 22, 2018

Individual 'check' functions run tests against the state of the environment as it was when the user ran the check call. However, grade_notebook runs tests against the environment as it is after all the user code has been executed. This has caused a lot of subtle, hard-to-debug errors.

For example, the following code works ok:

```python
a = 5

check('tests/q1.py')

a = 10

check('tests/q2.py')
```

Assume q1.py tests that a is 5 and q2.py tests that a is 10.

However, if you add a 'grade_notebook' at the end, q1 will always fail, since a will be 10 by the time the tests run.

Tests that depend on an earlier value of a re-assigned variable will always fail. We could ask users to be careful about this, but that seems like an unnecessary restriction. Both nbgrader & okpy allow users to do incremental checking, so gradememaybe should also allow it.

okpy does this by using an object that collects grades. nbgrader does this by keeping tests directly in the notebook.

The approach I'd like to take here instead is to use AST rewriting to find all the 'check' calls and rewrite them to something else when grading!

The code above will be rewritten to:

```python
a = 5

check_results_abcdefg.append(check_abcdefg('tests/q1.py'))

a = 10

check_results_abcdefg.append(check_abcdefg('tests/q2.py'))
```

check_results_abcdefg is a list (or other object) that can collect TestResult objects generated by check_abcdefg. A randomly generated unique suffix is added to the check functions to make it harder (but not impossible!) for students to cheat by simply redefining a 'check' function. The check_results_abcdefg object and check_abcdefg function will be injected by the grading function as well. After all the code has been evaluated, check_results_abcdefg can be evaluated to find the final grade.
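A minimal sketch of how that rewrite could be done with Python's ast module. The NodeTransformer below and the fixed suffix are illustrative only, not the final implementation (in practice the suffix would be randomly generated per grading run):

```python
import ast

SUFFIX = "abcdefg"  # illustrative; a real run would generate this randomly

class CheckRewriter(ast.NodeTransformer):
    """Rewrite check(...) calls so their results are collected in a list."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "check":
            # check(...) -> check_results_<suffix>.append(check_<suffix>(...))
            return ast.Call(
                func=ast.Attribute(
                    value=ast.Name(id=f"check_results_{SUFFIX}", ctx=ast.Load()),
                    attr="append",
                    ctx=ast.Load(),
                ),
                args=[ast.Call(
                    func=ast.Name(id=f"check_{SUFFIX}", ctx=ast.Load()),
                    args=node.args,
                    keywords=node.keywords,
                )],
                keywords=[],
            )
        return node

source = "a = 5\ncheck('tests/q1.py')\n"
tree = CheckRewriter().visit(ast.parse(source))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)  # requires Python 3.9+
print(rewritten)
```

The grading function would then exec the rewritten source in a namespace where check_results_abcdefg and check_abcdefg have already been injected.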

This imposes the following constraints on test notebook authors:

  1. Check cells contain nothing but check calls (comments are allowed!). We execute code cell by cell, and this ensures that check calls aren't skipped due to errors elsewhere in the cell
  2. Check cells shouldn't be deleted by students. This should be enforced with a notebook extension outside the scope of gradememaybe.

There should be a linter that validates these.
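A sketch of what such a linter could verify for a single cell, again using the ast module (the function name is hypothetical):

```python
import ast

def is_valid_check_cell(cell_source):
    """Return True if the cell contains only check(...) calls.

    Comments and blank lines are allowed, since they don't
    produce statements in the parsed AST.
    """
    try:
        tree = ast.parse(cell_source)
    except SyntaxError:
        return False
    for stmt in tree.body:
        # Every statement must be a bare expression calling check(...)
        if not (isinstance(stmt, ast.Expr)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Name)
                and stmt.value.func.id == "check"):
            return False
    return True
```

For example, a cell containing only a comment and a check call would pass, while a cell that also assigns a variable would be flagged.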

@choldgraf
Collaborator

Have you considered using tags in a cell to correspond to tests in the test suite? E.g. grade_notebook would:

  1. Look in the ok file to see whether any of the tests are linked to a specific tag.
  2. If it finds one, also check if one and only one cell in the notebook has this tag.
  3. If so, run all the cells up to the tagged cell and then run the equivalent of "check".
  4. Repeat this for any other "tagged" tests (so multiple tests can have the same tag, but only one cell per notebook can have this tag)
  5. For any tests that don't have tags, run them after all cells in the notebook are run.
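The flow described in the steps above could be sketched roughly like this. The cell/test structure and the run_cell/run_check callbacks are hypothetical stand-ins, not an existing API:

```python
def grade_with_tags(cells, tests, run_cell, run_check):
    """Sketch of the tag-based grading flow.

    cells: list of dicts like {"source": "...", "tags": [...]}
    tests: list of dicts like {"name": "q1", "tag": "q1" or None}
    run_cell / run_check: hypothetical callbacks supplied by the grader.
    """
    results = {}
    for test in [t for t in tests if t["tag"]]:
        tagged = [i for i, c in enumerate(cells) if test["tag"] in c["tags"]]
        # One and only one cell in the notebook may carry this tag
        if len(tagged) != 1:
            raise ValueError(f"expected exactly one cell tagged {test['tag']!r}")
        # Run all cells up to and including the tagged cell, then check
        for cell in cells[: tagged[0] + 1]:
            run_cell(cell)
        results[test["name"]] = run_check(test["name"])
    # Tests without tags run after the whole notebook has executed
    for cell in cells:
        run_cell(cell)
    for test in [t for t in tests if not t["tag"]]:
        results[test["name"]] = run_check(test["name"])
    return results
```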

@yuvipanda
Contributor Author

Interesting other option, @choldgraf! Thank you for bringing it up!

I spent some time considering it, and it led me to write up a design doc (similar to what we have for r2d) https://github.com/grading/gradememaybe/blob/master/docs/design.md. It has some info on why I don't like relying on notebook metadata for critical functions. Your thoughts on that would be most welcome :)

@yuvipanda
Contributor Author

I also realized that one of the restrictions I have is unnecessarily strict:

Check cells shouldn't be deleted by students. This should be enforced with a notebook extension outside the scope of gradememaybe.

The actual restriction should be:

When grading, we will provide the grader with a list of questions to be graded. In the student assignment, there should be at least one check call for each of these questions.

This is easier to check & enforce, and does not require any external tooling for enforcing integrity. Students can be prevented from accidentally deleting check cells by marking them readonly, which is a feature built into most notebook frontends.
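This relaxed restriction could be validated with a small AST walk over the student notebook before grading. The function names here are illustrative:

```python
import ast

def find_checked_questions(cell_sources):
    """Collect the test-file argument of every check(...) call."""
    checked = set()
    for source in cell_sources:
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # a broken cell can't contribute check calls
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "check"
                    and node.args
                    and isinstance(node.args[0], ast.Constant)):
                checked.add(node.args[0].value)
    return checked

def missing_checks(questions, cell_sources):
    """Questions the grader expects but that have no check call."""
    return set(questions) - find_checked_questions(cell_sources)
```

Grading would refuse to proceed (or flag the submission) if missing_checks returns a non-empty set.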

@yuvipanda
Contributor Author

yuvipanda commented Jun 24, 2018

I also wanted to provide more historical context here, to explain my thought process a bit more.

Relying on notebook metadata is the approach taken by nbgrader. It works great for many use cases, and is extremely flexible. However, this has two major disadvantages:

  1. You can't use this with plain .py files easily (AFAIK)
  2. You can't rely on the file metadata that comes back from the user, since they could have modified it to be anything! Fixing this requires building a bunch of integrity checking mechanisms - nbgrader has checksums & requires a database for this reason. This adds required state to the system, and I hate state!

I'm not entirely clear on the approach okpy takes with notebooks, but my understanding of it is:

  1. The equivalent of 'check' calls write things to stdout
  2. There is a script that parses things from stdout and makes grades.

This has a bunch of problems, primarily:

  1. String parsing is hard, complex and avoidable
  2. I'm not sure, but students could likely game this by writing to stdout themselves or by replacing the check functions with whatever they want.
  3. Writing things to stdout is not the best student experience when using notebooks - we can provide rich outputs, not just strings in monospace font.

AST rewriting (which is what I hope to implement) is actually stolen from pytest. Its cons will become more apparent once we actually implement it :)

I also <3 okpy & nbgrader, and they both have a lot of pros I have not listed here :)

@papajohn
Contributor

I thought @vipasu told me that he had addressed this issue. Is that true? If so, can we close it?
