Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop retry flow for error handling (log trigger) #12026

Conversation

amirylm
Copy link
Collaborator

@amirylm amirylm commented Feb 14, 2024

AUTO-9005

Description

The idea is to stop retry ceremony after 6 rounds (first 5 retries 1 second, 1 retry 5 second) and then invoke error handler.

NOTES

  • does not address conditional triggers that should fallback after 1 round
  • In this PR we will return timeout once the retry round is 6, but in followup PRs we will make a call to err handler through executeCallback()

Changes

  • added error codes
  • return error code from DoRequest
  • return timeout upon high round retries count
  • added placeholder for next PRs to prepare args for checkErrHandler

Testing

  • Tested retryInterval (with timeout once reached)
  • Tested error codes
  • Followup PRs should add integration tests

Copy link
Contributor

I see that you haven't updated any README files. Would it make sense to do so?

Copy link
Contributor

I see that you haven't updated any CHANGELOG files. Would it make sense to do so?

@amirylm amirylm force-pushed the auto-9004-stream-err-handler branch from a3bd1f7 to 18b185f Compare February 14, 2024 14:09
@amirylm amirylm force-pushed the AUTO-9005-stop-retry-flow-for-error-handling-log-trigger branch from 3aff10d to cb19d4b Compare February 14, 2024 14:11
@amirylm amirylm marked this pull request as ready for review February 14, 2024 18:03
@amirylm amirylm requested a review from a team as a code owner February 14, 2024 18:03
Copy link
Contributor

@shileiwill shileiwill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: added some nitpicking comments.

@@ -148,6 +151,7 @@ func (c *client) singleFeedRequest(ctx context.Context, ch chan<- mercury.Mercur
c.lggr.Warnf("at block %s upkeep %s received status code %d for feed %s", sl.Time.String(), sl.UpkeepId.String(), httpResponse.StatusCode, sl.Feeds[index])
retryable = true
state = encoding.MercuryFlakyFailure
errCode = encoding.HttpToErrCode(httpResponse.StatusCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this errCode is only updated for this group of status? why not move this assignment outside of switch in L149?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the error code just in specific scenarios, so just to be sure I've preferred to do it specifically when needed.

@@ -154,26 +157,31 @@ func (c *client) multiFeedsRequest(ctx context.Context, ch chan<- mercury.Mercur
case http.StatusUnauthorized:
retryable = false
state = encoding.UpkeepNotAuthorized
errCode = encoding.HttpToErrCode(resp.StatusCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we can save some lines by moving this assignment outside of switch-case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want the error code just in specific scenarios, being explicit here seems more safe.

- typo
- using encoding.ErrCodeNil instead of 0
- rename ErrCodeMercuryError -> ErrCodeDataStreamsError
resultLen := len(streamsLookup.Feeds)
ch := make(chan mercury.MercuryData, resultLen)
if len(streamsLookup.Feeds) == 0 {
return encoding.NoPipelineError, encoding.UpkeepFailureReasonInvalidRevertDataInput, [][]byte{}, false, 0 * time.Second, fmt.Errorf("invalid revert data input: feed param key %s, time param key %s, feeds %s", streamsLookup.FeedParamKey, streamsLookup.TimeParamKey, streamsLookup.Feeds)
return encoding.NoPipelineError, encoding.UpkeepFailureReasonInvalidRevertDataInput, [][]byte{}, false, 0 * time.Second, encoding.ErrCodeNil, fmt.Errorf("invalid revert data input: feed param key %s, time param key %s, feeds %s", streamsLookup.FeedParamKey, streamsLookup.TimeParamKey, streamsLookup.Feeds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this is also a user error and we can surface this in callback.

However, I'm not sure if we want to go through all 0.2 cases, properly define it and also test all cases. So one option is to just always return 0 error code here for 0.2 to satisfy the interface, while properly handle it for 0.3

cc @anirudhwarrier

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for e.g. there are some errors like 0.2 permissioning which are done upstream and not in 0.2 module

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can decide in followup PRs if we want exclude v0.2

@@ -237,20 +238,34 @@ func (s *streams) CheckCallback(ctx context.Context, values [][]byte, lookup *me
}

func (s *streams) DoMercuryRequest(ctx context.Context, lookup *mercury.StreamsLookup, checkResults []ocr2keepers.CheckResult, i int) ([][]byte, error) {
state, reason, values, retryable, retryInterval, err := encoding.NoPipelineError, encoding.UpkeepFailureReasonInvalidRevertDataInput, [][]byte{}, false, 0*time.Second, fmt.Errorf("invalid revert data input: feed param key %s, time param key %s, feeds %s", lookup.FeedParamKey, lookup.TimeParamKey, lookup.Feeds)
state, reason, values, retryable, retryInterval, errCode, err := encoding.NoPipelineError, encoding.UpkeepFailureReasonInvalidRevertDataInput, [][]byte{}, false, 0*time.Second, encoding.ErrCodeNil, fmt.Errorf("invalid revert data input: feed param key %s, time param key %s, feeds %s", lookup.FeedParamKey, lookup.TimeParamKey, lookup.Feeds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these initial values used anywhere? I see we always overwrite them with either 0.2.DoRequest or 0.3.DoRequest, maybe simple to just declare them as vars?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could have declare some of the vars, but some has initial value and thats why they are defined like this

@amirylm amirylm merged commit 75c8094 into auto-9004-stream-err-handler Feb 15, 2024
103 checks passed
@amirylm amirylm deleted the AUTO-9005-stop-retry-flow-for-error-handling-log-trigger branch February 15, 2024 18:55
infiloop2 added a commit that referenced this pull request Feb 27, 2024
* Stop retry flow for error handling (log trigger) (#12026)

* return timeout for retry interval

* identify retryTimeout + placeholder for err handler

* err codes

* comments

* added function and tests for handling err code

* add checkErrorHandler (#12037)

* Stop retry flow for error handling (conditional trigger) (#12032)

* err codes for conditionals

* unit tests for conditional

* refactor streams error handler

* fix mercury v0.2 request handling

* fix mercury 0.3 DoRequest

* fix tests

* connect check error callback

* add todo

* add todo

* improve comments

* polish 0.2

* fix debug.go, refactor eth_call on checkCallback and checkErrorHandler, fix some minor comments (typo and test)

* small fixes

* calculate retry config for conditionals

* rename to clarify function

* cleanup pipeline execution errors

* fix unit tests for v02_request.go

* Fix bug in 0.3 request error code

* add state assertion to single feed request tests

* add test case for invalid response bytes

* add tests for more retryable errors

* add tests for retryable -> error conversion

* polish 0.2 combining multiple feeds

* add more tests for different combination of feed responses

* remove unused fields

* unit tests for v03

* fix mercury_test.go

* minor polishing

* thread control: added function that accept a context

* use threadCtrl.GoCtx to ensure the timeout is applied with the provided context

* added timeout for mercury requests (including retries)

* fix lint

* set timeout to 10s

* use GoCtx within request clients

* lint

* add more 0.3 tests

* set err code to timeout if ctx is done

* Finalize stream error codes, polish requests to return consistent nil bytes upon error, use HttpToStreamsErrCode everywhere

* add tests for checkErrorHanlder

* treat timeout as non retryable error codes

* allow empty feed request which returns error code

* update test contract with a flag for checkErr result

* generate wrappers for LogTriggeredStreamsLookup

* handling empty performData case for err handler

* test (wip)

* fixing contract

* waiting for err handler logs (wip)

* update contract and generate wrappers

* lint

* use startBlock instead of 1

* add missing arg

* check multiple responses:

- server timeout
- unauthorized
- bad req
- internal server err
- not found

* fix test

* cleanup

* set timeout in http client

* Add timeout test for streams (#12170)

* Add timeout test

* fix DoRequest() to consider ctx (client v0.3)

* make sure we timeout in time

* align thread control test

* hacky context timeout

* trying to wait for timeout with child context

* push hack

* update

* another fix

* fix

* use ctx background

* add log

* Use new ctx to implement timeout

* fix test

* add v0.2 test

---------

Co-authored-by: amirylm <[email protected]>

* Empty Commit

---------

Co-authored-by: Amir Y <[email protected]>
Co-authored-by: Lei <[email protected]>
infiloop2 added a commit that referenced this pull request Feb 27, 2024
* Stop retry flow for error handling (log trigger) (#12026)

* return timeout for retry interval

* identify retryTimeout + placeholder for err handler

* err codes

* comments

* added function and tests for handling err code

* add checkErrorHandler (#12037)

* Stop retry flow for error handling (conditional trigger) (#12032)

* err codes for conditionals

* unit tests for conditional

* refactor streams error handler

* fix mercury v0.2 request handling

* fix mercury 0.3 DoRequest

* fix tests

* connect check error callback

* add todo

* add todo

* improve comments

* polish 0.2

* fix debug.go, refactor eth_call on checkCallback and checkErrorHandler, fix some minor comments (typo and test)

* small fixes

* calculate retry config for conditionals

* rename to clarify function

* cleanup pipeline execution errors

* fix unit tests for v02_request.go

* Fix bug in 0.3 request error code

* add state assertion to single feed request tests

* add test case for invalid response bytes

* add tests for more retryable errors

* add tests for retryable -> error conversion

* polish 0.2 combining multiple feeds

* add more tests for different combination of feed responses

* remove unused fields

* unit tests for v03

* fix mercury_test.go

* minor polishing

* thread control: added function that accept a context

* use threadCtrl.GoCtx to ensure the timeout is applied with the provided context

* added timeout for mercury requests (including retries)

* fix lint

* set timeout to 10s

* use GoCtx within request clients

* lint

* add more 0.3 tests

* set err code to timeout if ctx is done

* Finalize stream error codes, polish requests to return consistent nil bytes upon error, use HttpToStreamsErrCode everywhere

* add tests for checkErrorHanlder

* treat timeout as non retryable error codes

* allow empty feed request which returns error code

* update test contract with a flag for checkErr result

* generate wrappers for LogTriggeredStreamsLookup

* handling empty performData case for err handler

* test (wip)

* fixing contract

* waiting for err handler logs (wip)

* update contract and generate wrappers

* lint

* use startBlock instead of 1

* add missing arg

* check multiple responses:

- server timeout
- unauthorized
- bad req
- internal server err
- not found

* fix test

* cleanup

* set timeout in http client

* Add timeout test for streams (#12170)

* Add timeout test

* fix DoRequest() to consider ctx (client v0.3)

* make sure we timeout in time

* align thread control test

* hacky context timeout

* trying to wait for timeout with child context

* push hack

* update

* another fix

* fix

* use ctx background

* add log

* Use new ctx to implement timeout

* fix test

* add v0.2 test

---------

Co-authored-by: amirylm <[email protected]>

* Empty Commit

---------

Co-authored-by: Amir Y <[email protected]>
Co-authored-by: Lei <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2024
* Stream fallback: error handler (#12173)

* Stop retry flow for error handling (log trigger) (#12026)

* return timeout for retry interval

* identify retryTimeout + placeholder for err handler

* err codes

* comments

* added function and tests for handling err code

* add checkErrorHandler (#12037)

* Stop retry flow for error handling (conditional trigger) (#12032)

* err codes for conditionals

* unit tests for conditional

* refactor streams error handler

* fix mercury v0.2 request handling

* fix mercury 0.3 DoRequest

* fix tests

* connect check error callback

* add todo

* add todo

* improve comments

* polish 0.2

* fix debug.go, refactor eth_call on checkCallback and checkErrorHandler, fix some minor comments (typo and test)

* small fixes

* calculate retry config for conditionals

* rename to clarify function

* cleanup pipeline execution errors

* fix unit tests for v02_request.go

* Fix bug in 0.3 request error code

* add state assertion to single feed request tests

* add test case for invalid response bytes

* add tests for more retryable errors

* add tests for retryable -> error conversion

* polish 0.2 combining multiple feeds

* add more tests for different combination of feed responses

* remove unused fields

* unit tests for v03

* fix mercury_test.go

* minor polishing

* thread control: added function that accept a context

* use threadCtrl.GoCtx to ensure the timeout is applied with the provided context

* added timeout for mercury requests (including retries)

* fix lint

* set timeout to 10s

* use GoCtx within request clients

* lint

* add more 0.3 tests

* set err code to timeout if ctx is done

* Finalize stream error codes, polish requests to return consistent nil bytes upon error, use HttpToStreamsErrCode everywhere

* add tests for checkErrorHanlder

* treat timeout as non retryable error codes

* allow empty feed request which returns error code

* update test contract with a flag for checkErr result

* generate wrappers for LogTriggeredStreamsLookup

* handling empty performData case for err handler

* test (wip)

* fixing contract

* waiting for err handler logs (wip)

* update contract and generate wrappers

* lint

* use startBlock instead of 1

* add missing arg

* check multiple responses:

- server timeout
- unauthorized
- bad req
- internal server err
- not found

* fix test

* cleanup

* set timeout in http client

* Add timeout test for streams (#12170)

* Add timeout test

* fix DoRequest() to consider ctx (client v0.3)

* make sure we timeout in time

* align thread control test

* hacky context timeout

* trying to wait for timeout with child context

* push hack

* update

* another fix

* fix

* use ctx background

* add log

* Use new ctx to implement timeout

* fix test

* add v0.2 test

---------

Co-authored-by: amirylm <[email protected]>

* Empty Commit

---------

Co-authored-by: Amir Y <[email protected]>
Co-authored-by: Lei <[email protected]>

* Empty Commit

---------

Co-authored-by: Amir Y <[email protected]>
Co-authored-by: Lei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants