Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(zetaclient): pkg/scheduler #3319

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from
Open

feat(zetaclient): pkg/scheduler #3319

wants to merge 7 commits into from

Conversation

swift1337
Copy link
Contributor

@swift1337 swift1337 commented Dec 19, 2024

pkg/scheduler provides a background task scheduler that allows for the registration, execution, and management of periodic tasks.

Tasks can be grouped, named, and configured with various options such as custom intervals, log fields, and skip conditions.

The scheduler supports dynamic interval updates and can gracefully stop tasks either individually or by group.

Features

  • Task registration
  • Task configuration (interval updates, skip invocation, etc..)
  • Task grouping
  • Block-based tickers 🔥
  • Graceful shutdown

Changes

  • Scheduler package
  • Minor improvements in ticker package
  • Add ticker.StopBlocking()
  • Metrics [WIP]

Closes #3299

Summary by CodeRabbit

  • New Features

    • Introduced a /systemtime telemetry endpoint in the Zeta client.
    • Added a scheduler for managing periodic tasks.
  • Bug Fixes

    • Improved handling of unsupported transaction versions for Solana.
    • Enhanced error handling for inbound vote message validations.
  • Documentation

    • Updated changelog to reflect new features, tests, and fixes.
  • Tests

    • Added comprehensive unit tests for the scheduler functionality, including task registration and execution scenarios.
  • Refactor

    • Revamped the TSS package and improved the initialization process of the Zeta client.
  • Style

    • Renamed existing test groups for clarity.

@swift1337 swift1337 self-assigned this Dec 19, 2024
Copy link
Contributor

coderabbitai bot commented Dec 19, 2024

📝 Walkthrough

Walkthrough

The pull request introduces a comprehensive scheduler component for the ZetaChain project, designed to manage background tasks with enhanced flexibility and control. The new pkg/scheduler package provides a robust implementation that supports task registration, dynamic interval updates, group-based task management, and sophisticated error handling. This implementation addresses the need for a centralized mechanism to handle periodic background tasks across different components of the system.

Changes

File Change Summary
changelog.md Updated to reflect new features, tests, and refactors across ZetaChain components
pkg/scheduler/chan.go Introduced blockTicker for handling Zeta block events with custom ticker implementation
pkg/scheduler/opts.go Added functional options for configuring task definitions
pkg/scheduler/scheduler.go Implemented background task scheduler with comprehensive task management
pkg/scheduler/scheduler_test.go Added comprehensive unit tests for scheduler functionality
pkg/ticker/ticker.go Updated ticker implementation with improved state management
pkg/ticker/ticker_test.go Modified tests to reflect new ticker behavior

Sequence Diagram

sequenceDiagram
    participant Scheduler
    participant Task
    participant BlockChannel
    
    Scheduler->>Task: Register with options
    Scheduler->>BlockChannel: Listen for events
    loop Task Execution
        BlockChannel-->>Scheduler: New Block Event
        Scheduler->>Task: Execute Task
        Task-->>Scheduler: Return Result
    end
    Scheduler->>Task: Stop Task
Loading

Assessment against linked issues

Objective Addressed Explanation
Implement Scheduler component [#3299]
Scheduler groups support
Test Coverage
Automatic metrics provisioning Metrics integration not explicitly demonstrated

Possibly related PRs

Suggested labels

e2e, performance, infrastructure

Suggested reviewers

  • fbac
  • kingpinXD
  • lumtis
  • skosito
  • brewmaster012

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@swift1337 swift1337 requested a review from gartnera December 19, 2024 17:38
@swift1337 swift1337 marked this pull request as ready for review December 19, 2024 17:44
@swift1337 swift1337 requested a review from a team as a code owner December 19, 2024 17:44

err := d.task(ctx)

// todo metrics (TBD)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gartnera would be great to hear your thoughts on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we add one metric then use the task name as a label.

Metrics for:

  • count of times run
  • error rate
  • duration

Copy link
Member

@gartnera gartnera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note #3317

doneChan chan struct{}

// atomic flag. `1` for RUNNING, `0` for STOPPED
status int32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use atomic.Int32{} directly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like mixing the timer based tickers and block ticker logic. Maybe a common interface would be a better idea (implementing Stop() for example) would work better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to preserve the current state for now. The exported API is compact and straightforward, which provides ample space for future improvements. Let's see how it unrolls when integrating with ObserverSigner{}

Comment on lines +42 to +46
// BlockTicker makes Definition to listen for new zeta blocks instead of using interval ticker.
// IntervalUpdater is ignored.
func BlockTicker(blocks <-chan cometbft.EventDataNewBlock) Opt {
return func(d *Definition) { d.blockChan = blocks }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a run every N blocks ticker would be useful too.

Just mod block number == 0 right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be implemented as a separate "middleware" or Opt that wraps chan with another chan, etc. Block ticker should not be aware of this.

Copy link

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 92.27468% with 18 lines in your changes missing coverage. Please review.

Project coverage is 61.93%. Comparing base (faf1f79) to head (80768d8).

Files with missing lines Patch % Lines
pkg/scheduler/chan.go 85.24% 7 Missing and 2 partials ⚠️
pkg/scheduler/scheduler.go 92.91% 6 Missing and 3 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #3319      +/-   ##
===========================================
+ Coverage    61.71%   61.93%   +0.21%     
===========================================
  Files          440      443       +3     
  Lines        31141    31362     +221     
===========================================
+ Hits         19220    19423     +203     
- Misses       11058    11071      +13     
- Partials       863      868       +5     
Files with missing lines Coverage Δ
pkg/scheduler/opts.go 100.00% <100.00%> (ø)
pkg/ticker/ticker.go 94.59% <100.00%> (+1.11%) ⬆️
pkg/scheduler/chan.go 85.24% <85.24%> (ø)
pkg/scheduler/scheduler.go 92.91% <92.91%> (ø)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
pkg/scheduler/chan.go (2)

35-42: Consider initializing doneChan in newBlockTicker
Currently, doneChan is assigned to nil and then later created in Start(). For improved clarity and consistency, consider initializing doneChan directly within newBlockTicker.


81-85: Enhance test coverage for task error cases
Lines 81–85 suggest handling and logging a task error or context error scenario. Consider adding a test case where the task function returns an error or the context is canceled to confirm robust logging and coverage.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 81-85: pkg/scheduler/chan.go#L81-L85
Added lines #L81 - L85 were not covered by tests

pkg/ticker/ticker.go (2)

59-59: Document runCompleteChan usage
runCompleteChan is central for StopBlocking. Consider a short comment or docstring to clarify that it signals the final completion of the task loop.


178-183: Safe multiple calls to Stop()
Your approach of setting the ticker’s state and stopping it is concurrency-safe. Consider logging a debug message if Stop() is called when the ticker is already stopped, so the caller is aware of the no-op.

pkg/scheduler/opts.go (1)

42-46: BlockTicker overrides interval-based scheduling
Invoking BlockTicker zeroes out the interval logic and sets block-based triggering. Ensure that calling IntervalUpdater afterwards does not cause confusion for maintainers—maybe add a note that it’s ignored once BlockTicker is set.

pkg/ticker/ticker_test.go (2)

156-158: Consider using a test helper for logger creation.

The logger creation logic is duplicated across test cases. Consider extracting it into a test helper function to improve maintainability.

+func newTestLogger(t *testing.T) zerolog.Logger {
+    return zerolog.New(zerolog.NewTestWriter(t)).With().Timestamp().Logger()
+}

161-173: Consider parameterizing the task creation.

The task creation logic could be parameterized to make it more reusable across different test cases and scenarios.

+type taskConfig struct {
+    sleepDuration time.Duration
+    logger        zerolog.Logger
+}
+
+func newTestTask(counter *int32, cfg taskConfig) Task {
+    return func(ctx context.Context, _ *Ticker) error {
+        cfg.logger.Info().Msg("Tick start")
+        atomic.AddInt32(counter, 1)
+        time.Sleep(cfg.sleepDuration)
+        cfg.logger.Info().Msg("Tick end")
+        atomic.AddInt32(counter, -1)
+        return nil
+    }
+}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faf1f79 and 80768d8.

📒 Files selected for processing (7)
  • changelog.md (1 hunks)
  • pkg/scheduler/chan.go (1 hunks)
  • pkg/scheduler/opts.go (1 hunks)
  • pkg/scheduler/scheduler.go (1 hunks)
  • pkg/scheduler/scheduler_test.go (1 hunks)
  • pkg/ticker/ticker.go (4 hunks)
  • pkg/ticker/ticker_test.go (14 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
pkg/ticker/ticker_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/ticker/ticker.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/scheduler/opts.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/scheduler/scheduler_test.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/scheduler/chan.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

pkg/scheduler/scheduler.go (1)

Pattern **/*.go: Review the Go code, point out issues relative to principles of clean code, expressiveness, and performance.

🪛 GitHub Check: codecov/patch
pkg/scheduler/chan.go

[warning] 56-57: pkg/scheduler/chan.go#L56-L57
Added lines #L56 - L57 were not covered by tests


[warning] 81-85: pkg/scheduler/chan.go#L81-L85
Added lines #L81 - L85 were not covered by tests

pkg/scheduler/scheduler.go

[warning] 123-124: pkg/scheduler/scheduler.go#L123-L124
Added lines #L123 - L124 were not covered by tests


[warning] 186-187: pkg/scheduler/scheduler.go#L186-L187
Added lines #L186 - L187 were not covered by tests


[warning] 230-231: pkg/scheduler/scheduler.go#L230-L231
Added lines #L230 - L231 were not covered by tests

🔇 Additional comments (14)
pkg/scheduler/chan.go (4)

56-57: Add a test case for duplicate starts
These lines indicate the ticker's already-running error state but are not covered by tests according to the static analysis report. A test scenario that calls Start() twice on the same blockTicker will improve coverage and confirm the correct error handling.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 56-57: pkg/scheduler/chan.go#L56-L57
Added lines #L56 - L57 were not covered by tests


54-91: Graceful handling of closed blockChan
The code logs a warning and returns if blockChan is closed. This is a good approach to end the loop gracefully. Additionally, ensure upstream logic is aware that the ticker stops if the channel closes unexpectedly.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 56-57: pkg/scheduler/chan.go#L56-L57
Added lines #L56 - L57 were not covered by tests


[warning] 81-85: pkg/scheduler/chan.go#L81-L85
Added lines #L81 - L85 were not covered by tests


94-105: Confirm concurrency correctness in Stop()
Stop() signals the loop and waits for the doneChan. This approach is correct for a blocking termination. Make sure to document that this can block until the ticker’s main loop exits.


107-117: Appropriate usage of atomic flags
Using CompareAndSwapInt32 to manage the runner state is a thread-safe approach. Confirm that no other concurrency paths modify status outside these methods to maintain data integrity.

pkg/ticker/ticker.go (2)

Line range hint 106-136: Ensure robust error handling on initial run
Inside Start(), the initial run might fail, triggering an immediate Stop(). This is correct for error containment. Just ensure that any required resource cleanup happens before returning the error.


185-191: Proper blocking in StopBlocking()
Blocking on runCompleteChan ensures we wait for the last iteration to complete. This method usage pattern needs to be highlighted to avoid usage from within the same task context.

pkg/scheduler/scheduler.go (4)

123-124: Test coverage for empty StopGroup
A scenario where StopGroup finds no definitions to stop (lines 123–124) is not covered according to the static analysis. Consider adding a test that exercises StopGroup on an empty group or a group without tasks to confirm no unwanted side effects.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 123-124: pkg/scheduler/scheduler.go#L123-L124
Added lines #L123 - L124 were not covered by tests


186-187: Improve coverage on error branch
When invoke() returns an error, lines 186–187 handle logging. Adding a negative test scenario that forces the task to fail would validate correct error logging and coverage.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 186-187: pkg/scheduler/scheduler.go#L186-L187
Added lines #L186 - L187 were not covered by tests


230-231: Test the blockTicker type within newDefinitionLogger
Lines 230–231 set "block_ticker" as the task type. Consider adding a test that registers a blockTicker-based Definition to confirm that the logger includes "task.type=block_ticker."

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 230-231: pkg/scheduler/scheduler.go#L230-L231
Added lines #L230 - L231 were not covered by tests


54-91: Graceful concurrency design
The concurrency approach with a sync.WaitGroup in StopGroup is sound, ensuring each definition stops safely in parallel. This design is likely to scale well as more tasks are added.

pkg/scheduler/opts.go (1)

14-15: Name option provides clarity
Setting a distinct name fosters easier debugging and logging. This functional option pattern is solid for maintainability and reuse.

pkg/ticker/ticker_test.go (1)

23-24: Verify test isolation for parallel execution.

The addition of t.Parallel() to all test cases could potentially cause issues if the tests share any global state or resources. Ensure that each test case is completely isolated and uses its own test fixtures.

Also applies to: 51-52, 78-79, 114-115, 147-149, 175-177, 206-208, 309-311

✅ Verification successful

Let me check if there are any shared resources between test functions that could cause race conditions.


Parallel test execution is safe

The counter variables are properly scoped within each test function and declared as local variables. Each test has its own independent counter, and there are no shared package-level variables that could cause race conditions during parallel test execution.

  • All counter variables are declared with := inside test functions, making them function-scoped
  • Each test function maintains its own isolated state
  • No shared mutable state is detected between test functions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential shared resources across parallel tests
rg -A 5 "var \w+" pkg/ticker/ticker_test.go

Length of output: 769


Script:

#!/bin/bash
# Check for test function definitions and their scopes
ast-grep --pattern 'func Test$_($_) {
  $$$
}'

Length of output: 48


Script:

#!/bin/bash
# Check for any package level variables
rg "^var" pkg/ticker/ticker_test.go

Length of output: 37


Script:

#!/bin/bash
# Check for any test helper functions that might be shared
ast-grep --pattern 'func $_(t *testing.T) {
  $$$
}'

Length of output: 95433


Script:

#!/bin/bash
# Check if counter variable is package-scoped or function-scoped
rg "counter :?=" pkg/ticker/ticker_test.go -B 2

Length of output: 283

pkg/scheduler/scheduler_test.go (1)

1-402: LGTM! Well-structured test implementation.

The scheduler test implementation is comprehensive and well-organized with:

  • Clear test case organization
  • Good coverage of scheduler functionality
  • Proper use of test fixtures and helpers
changelog.md (1)

Line range hint 1-1000: LGTM! Well-documented changelog.

The changelog is comprehensive and well-structured with:

  • Clear categorization of changes
  • Breaking changes highlighted
  • Detailed descriptions of updates

Copy link
Member

@lumtis lumtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the API looks clear to me, maybe some comments can be more explicative

NIT PR is named feat(zetaclient) but we don't do any ZetaClient change here

logger zerolog.Logger
}

// Task represents scheduler's task
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe be more precise on this one? What differentiates a task from a normal function?

}
}

// Register registers and starts new task in the background
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we explain the returned value? Definition is configuration of a Task, to me a configuration is something that would be given as an input

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Scheduler component
3 participants