Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #189: Update the Beam jobs wiki page with instructions for how we handle Beam jobs in the Oppia codebase. #191

Merged
merged 3 commits into from
Oct 1, 2023

Conversation

JayVivarekar
Copy link
Member

@JayVivarekar JayVivarekar commented Aug 27, 2023

This PR adds info to the "Apache Beam Jobs" wiki page regarding guidelines for writing beam jobs and job testing workflow.

@JayVivarekar
Copy link
Member Author

@seanlip PTAL

Copy link
Member

@seanlip seanlip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @JayVivarekar! Took a full pass and left some notes, PTAL and let me know if you have questions about any of them.

Thanks!

This section provides some general guidelines for writing Beam jobs, which would be particularly helpful for new contributors.

### Planning a job
* If your beam job includes updating the storage models, make sure to write an audit job. An audit job is similar to your beam job but doesn't make any changes to the datastore. During testing, the audit job is run before the actual beam job to prevent any unwanted changes to the datastore incase of an error. You will be able to find examples in the codebase for the same.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incase --> in case

Can you point to the examples in the codebase directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### Planning a job
* If your beam job includes updating the storage models, make sure to write an audit job. An audit job is similar to your beam job but doesn't make any changes to the datastore. During testing, the audit job is run before the actual beam job to prevent any unwanted changes to the datastore incase of an error. You will be able to find examples in the codebase for the same.
* There should also be a job to verify the changes done by your job.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how to do this (maybe a worked example or a reference)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bullet point here still reads as though it's standalone. Maybe put the previous point below it so that it's clear that the previous point is an example of this point? Otherwise, it looks like the previous point is an example of the point before it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlip Do you mean to say I should interchange the points "If your beam job..." and "Consider the example..."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I just mean that the actual example should be after the point that it is supposed to be an example for. If it is an example for line 703 then the line with the example should come after line 703. Does that help clarify?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it does. Thanks for the clarification @seanlip.

* There should also be a job to verify the changes done by your job.

### Executing jobs
* Refer to code from similar jobs to avoid mistakes. The "Troubleshooting" section in the wiki lists out the common errors encountered while executing jobs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop "out the"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Empty input.
* Input which triggers the job to perform its intended action.
* Incorrect input which causes the job to fail.
* The tests should ensure that the job runs appropirately for all such cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"appropriately" is misspelled

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

This section provides some general guidelines for writing Beam jobs, which would be particularly helpful for new contributors.

### Planning a job
* If your beam job includes updating the storage models, make sure to write an audit job. An audit job is similar to your beam job but doesn't make any changes to the datastore. During testing, the audit job is run before the actual beam job to prevent any unwanted changes to the datastore incase of an error. You will be able to find examples in the codebase for the same.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere: capitalize "Beam"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


### PR guidelines
* A PR for a beam job should always have "Proof of work" in the description which should include the successful local run of the beam job on the release coordinator page and a screenshot of the storage model which is being changed. The storage models can be checked [locally](https://github.com/oppia/oppia/wiki/Debugging-datastore-locally) via dsadmin.
* A testing request for the job must be submitted once it is approved by all reviewers. Make sure to be clear in your instructions (in the doc) about what exactly needs to be done by the tester. Also, the doc link should be included in the PR description.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain or point to info on how to actually submit the request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* A testing request for the job must be submitted once it is approved by all reviewers. Make sure to be clear in your instructions (in the doc) about what exactly needs to be done by the tester. Also, the doc link should be included in the PR description.

### Job testing workflow
* As soon as a PR is approved by all reviewers, a testing doc should be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to follow on from the previous section. Might it be better to combine them, and have the following points as sub-bullets of the "testing request" main bullet point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

* The sample job testing [template](https://docs.google.com/document/d/1Xg0MSIpYUEax8dqH39CKhgqt30f-kr8wxO2F4aSxwz4/edit) should be used as a starting point for the testing doc. Make sure to make a copy before editing the doc.
* In case of multiple jobs to be tested, they should be written in the order of testing. Generally, an audit job is run before the migration job to verify that the job runs as intended.
* The instructions should be concise to prevent any confusion for the tester. Any pre and post checks to be done should also be specified clearly.
* After the doc is complete, request a review from the main code reviewer of the related PR. The reviewer should answer the relevant questions at the beginnig of the doc after completing their review.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beginnig --> beginning

Maybe change the last sentence to: "The reviewer should fill in the approval section at the top of the doc."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* In case of multiple jobs to be tested, they should be written in the order of testing. Generally, an audit job is run before the migration job to verify that the job runs as intended.
* The instructions should be concise to prevent any confusion for the tester. Any pre and post checks to be done should also be specified clearly.
* After the doc is complete, request a review from the main code reviewer of the related PR. The reviewer should answer the relevant questions at the beginnig of the doc after completing their review.
* The assigned server admin will then test the PR according to the instructions in the doc and update the PR author with the results. In case of errors,the server admin would provide the relevant logs for debugging.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe include the trigger condition at the start: Once approvals have been received, the ...

Add a space after "In case of errors,"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* A testing request for the job must be submitted once it is approved by all reviewers. Make sure to be clear in your instructions (in the doc) about what exactly needs to be done by the tester. Also, the doc link should be included in the PR description.

### Job testing workflow
* As soon as a PR is approved by all reviewers, a testing doc should be created.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the testing doc be created along with the initial review of the PR and be reviewed immediately as well? We can save the actual job run until the PR has been approved by others, but I don't see a reason to delay the creation of the documentation (and in fact I think it's needed for being able to review the PR properly in the first place).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. Done

@JayVivarekar
Copy link
Member Author

@seanlip PTAL

Copy link
Member

@seanlip seanlip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JayVivarekar, this looks great. Just a small note but please feel free to merge once it's addressed. Thanks!


### Planning a job
* If your beam job includes updating the storage models, make sure to write an audit job. An audit job is similar to your beam job but doesn't make any changes to the datastore. During testing, the audit job is run before the actual beam job to prevent any unwanted changes to the datastore incase of an error. You will be able to find examples in the codebase for the same.
* There should also be a job to verify the changes done by your job.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bullet point here still reads as though it's standalone. Maybe put the previous point below it so that it's clear that the previous point is an example of this point? Otherwise, it looks like the previous point is an example of the point before it.

@JayVivarekar
Copy link
Member Author

@seanlip PTAL

@seanlip seanlip merged commit 6d4033a into oppia:develop Oct 1, 2023
3 checks passed
@seanlip
Copy link
Member

seanlip commented Oct 1, 2023

Looks great, I merged it. Thanks @JayVivarekar !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants