Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #189: Update the Beam jobs wiki page with instructions for how we handle Beam jobs in the Oppia codebase. #191

Merged
merged 3 commits into from
Oct 1, 2023
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions Apache-Beam-Jobs.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
* [Common Beam errors](#common-beam-errors)
* [`'_UnwindowedValues' object is not subscriptable` error](#_unwindowedvalues-object-is-not-subscriptable-error)
* [`_namedptransform is not iterable` error](#_namedptransform-is-not-iterable-error)
* [Guidelines for writing beam jobs](#guidelines-for-writing-beam-jobs)
* [Case studies](#case-studies)
* [Case Study: `SchemaMigrationJob`](#case-study-schemamigrationjob)

Expand Down Expand Up @@ -691,6 +692,32 @@ some_values = (
| 'Get values' >> beam.Values()
)
```
## Guidelines for writing beam jobs
This section provides some general guidelines for writing Beam jobs, which would be particularly helpful for new contributors.

### Planning a job
* If your beam job includes updating the storage models, make sure to write an audit job. An audit job is similar to your beam job but doesn't make any changes to the datastore. During testing, the audit job is run before the actual beam job to prevent any unwanted changes to the datastore incase of an error. You will be able to find examples in the codebase for the same.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incase --> in case

Can you point to the examples in the codebase directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: capitalize "Beam"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* There should also be a job to verify the changes done by your job.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how to do this (maybe a worked example or a reference)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bullet point here still reads as though it's standalone. Maybe put the previous point below it so that it's clear that the previous point is an example of this point? Otherwise, it looks like the previous point is an example of the point before it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlip Do you mean to say I should interchange the points "If your beam job..." and "Consider the example..."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just mean that the actual example should be after the point that it is supposed to be an example for. If it is an example for line 703 then the line with the example should come after line 703. Does that help clarify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does. Thanks for the clarification @seanlip.


### Executing jobs
* Refer to code from similar jobs to avoid mistakes. The "Troubleshooting" section in the wiki lists out the common errors encountered while executing jobs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop "out the"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Make sure to write tests for all jobs. These tests should check the behaviour of the job for different cases of inputs. The common cases for all jobs are listed below:
* Empty input.
* Input which triggers the job to perform its intended action.
* Incorrect input which causes the job to fail.
* The tests should ensure that the job runs appropirately for all such cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"appropriately" is misspelled

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### PR guidelines
* A PR for a beam job should always have "Proof of work" in the description which should include the successful local run of the beam job on the release coordinator page and a screenshot of the storage model which is being changed. The storage models can be checked [locally](https://github.com/oppia/oppia/wiki/Debugging-datastore-locally) via dsadmin.
* A testing request for the job must be submitted once it is approved by all reviewers. Make sure to be clear in your instructions (in the doc) about what exactly needs to be done by the tester. Also, the doc link should be included in the PR description.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain or point to info on how to actually submit the request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### Job testing workflow
* As soon as a PR is approved by all reviewers, a testing doc should be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to follow on from the previous section. Might it be better to combine them, and have the following points as sub-bullets of the "testing request" main bullet point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the testing doc be created along with the initial review of the PR and be reviewed immediately as well? We can save the actual job run until the PR has been approved by others, but I don't see a reason to delay the creation of the documentation (and in fact I think it's needed for being able to review the PR properly in the first place).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. Done

* The sample job testing [template](https://docs.google.com/document/d/1Xg0MSIpYUEax8dqH39CKhgqt30f-kr8wxO2F4aSxwz4/edit) should be used as a starting point for the testing doc. Make sure to make a copy before editing the doc.
* In case of multiple jobs to be tested, they should be written in the order of testing. Generally, an audit job is run before the migration job to verify that the job runs as intended.
* The instructions should be concise to prevent any confusion for the tester. Any pre and post checks to be done should also be specified clearly.
* After the doc is complete, request a review from the main code reviewer of the related PR. The reviewer should answer the relevant questions at the beginnig of the doc after completing their review.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beginnig --> beginning

Maybe change the last sentence to: "The reviewer should fill in the approval section at the top of the doc."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* The assigned server admin will then test the PR according to the instructions in the doc and update the PR author with the results. In case of errors,the server admin would provide the relevant logs for debugging.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe include the trigger condition at the start: Once approvals have been received, the ...

Add a space after "In case of errors,"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


## Case Studies

Expand Down
Loading