Implement the job-scheduler logic #87

Closed
7 tasks done
Tracked by #349
AlexRuiz7 opened this issue Oct 2, 2024 · 9 comments · Fixed by #103 · May be fixed by wazuh/wazuh-indexer#577
Labels
level/task Task issue mvp Minimum Viable Product type/enhancement Enhancement issue

Comments

@AlexRuiz7
Member

AlexRuiz7 commented Oct 2, 2024

Description

As part of the command manager plugin development and in continuation of #65, we are going to implement the job-scheduler logic to prioritize the commands and send them to the Wazuh Server's Management API.

Plan

  • Implement the Job Runner class.
  • Implement the Job Parameter class.
  • Implement the Job Runner logic.

Functional requirements

  • The job runner sends the commands to an external function for processing. For the time being, that function can simply print the commands. Once the HTTP service implementation is complete, the two pieces can be assembled.
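
A minimal sketch of that requirement, assuming a hypothetical Command model and CommandDispatcher class (neither name comes from the plugin). The handler is injected as a Consumer, so the print placeholder can later be replaced by the HTTP client without touching the runner. Ordering by ascending priority value is also an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

class DispatchDemo {
    public static void main(String[] args) {
        // Placeholder handler: print each command until the HTTP service exists.
        CommandDispatcher dispatcher =
            new CommandDispatcher(c -> System.out.println("Would send: " + c.id + " " + c.action));
        dispatcher.dispatch(List.of(
            new Command("a", 2, "restart"),
            new Command("b", 1, "upgrade")));
    }
}

/** Hypothetical command model; field names are assumptions, not the plugin's schema. */
final class Command {
    final String id;
    final int priority;
    final String action;

    Command(String id, int priority, String action) {
        this.id = id;
        this.priority = priority;
        this.action = action;
    }
}

/** Hands each command to a pluggable handler (the "external function"). */
final class CommandDispatcher {
    private final Consumer<Command> handler;

    CommandDispatcher(Consumer<Command> handler) {
        this.handler = handler;
    }

    /** Sorts a copy by priority (lower value first, an assumption) and dispatches in that order. */
    int dispatch(List<Command> commands) {
        List<Command> ordered = new ArrayList<>(commands);
        ordered.sort(Comparator.comparingInt((Command c) -> c.priority));
        ordered.forEach(handler);
        return ordered.size();
    }
}
```

Swapping the lambda passed to the constructor is the only change needed once the HTTP service is available.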
@AlexRuiz7 AlexRuiz7 added level/task Task issue type/enhancement Enhancement issue labels Oct 2, 2024
@wazuhci wazuhci moved this to Backlog in Release 5.0.0 Oct 3, 2024
@f-galland f-galland self-assigned this Oct 10, 2024
@wazuhci wazuhci moved this from Backlog to In progress in Release 5.0.0 Oct 10, 2024
@f-galland
Member

Other plugins seem to interface with JobScheduler through its Service Provider Interface:

@f-galland
Member

f-galland commented Oct 10, 2024

It looks like the Plugin class (the main class inheriting from OpenSearch's Plugin) needs to implement JobSchedulerExtension.

@f-galland
Member

f-galland commented Oct 10, 2024

A separate class implements ScheduledJobRunner's runJob() which pushes the task to its own thread:

A javadoc in this class reads as follows:

 * The job runner class for scheduling async query.
 *
 * <p>The job runner should be a singleton class if it uses OpenSearch client or other objects
 * passed from OpenSearch. Because when registering the job runner to JobScheduler plugin,
 * OpenSearch has not invoked plugins' createComponents() method. That is saying the plugin is not
 * completely initialized, and the OpenSearch {@link org.opensearch.client.Client}, {@link
 * ClusterService} and other objects are not available to plugin and this job runner.
 *
 * <p>So we have to move this job runner initialization to {@link Plugin} createComponents() method,
 * and using singleton job runner to ensure we register a usable job runner instance to JobScheduler
 * plugin.
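
The pattern that javadoc describes can be sketched roughly as follows. This is a self-contained illustration, not the plugin's code: Client here is a hypothetical stand-in for org.opensearch.client.Client, and CommandJobRunner and its methods are made-up names. The singleton exists from class loading onwards, but its dependency is only injected later, at the point where createComponents() would run.

```java
import java.util.Objects;

class SingletonRunnerDemo {
    public static void main(String[] args) {
        CommandJobRunner runner = CommandJobRunner.getInstance();
        // In a real plugin this call would live in createComponents().
        runner.setClient(() -> "node-1");
        System.out.println(runner.runJob("42"));
    }
}

/** Hypothetical stand-in for the OpenSearch client, so the sketch compiles alone. */
interface Client {
    String nodeName();
}

/** Singleton runner: registered early, usable only after its dependencies are injected. */
final class CommandJobRunner {
    private static final CommandJobRunner INSTANCE = new CommandJobRunner();
    private volatile Client client;

    private CommandJobRunner() {}

    static CommandJobRunner getInstance() {
        return INSTANCE;
    }

    /** Late injection point, called once the client is available. */
    void setClient(Client client) {
        this.client = Objects.requireNonNull(client);
    }

    /** Fails fast if a job fires before the plugin finished initializing. */
    String runJob(String jobId) {
        Client c = client;
        if (c == null) {
            throw new IllegalStateException("job runner is not initialized yet");
        }
        return "job " + jobId + " ran on " + c.nodeName();
    }
}
```

Because the same instance is handed to the job scheduler and to createComponents(), the scheduler ends up holding a runner that becomes usable as soon as the client is set.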

@f-galland
Member

f-galland commented Oct 10, 2024

@AlexRuiz7
Member Author

That research was already performed in #65

@f-galland
Member

#65's PR only added job-scheduler to the command manager's Gradle task. The job scheduler classes are not actually used there.

@AlexRuiz7 AlexRuiz7 linked a pull request Oct 14, 2024 that will close this issue
@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Oct 14, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Oct 15, 2024
@f-galland
Member

SampleExtensionRestHandler:

  • Receives POST call parameters
  • Instantiates SampleJobParameter with the parameters from the POST call
  • Indexes the SampleJobParameter as a JSON object

SampleExtensionPlugin:

  • Exposes:
    • getJobType(): Returns a string with the job type
    • getJobIndex(): Returns the name of the index that holds the scheduled jobs' parameters
    • getJobRunner(): Returns the singleton instance of the plugin's Runner class
    • getJobParser(): Returns a ScheduledJobParser object that can parse the task's parameters
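
As a rough sketch of those four methods: the interfaces below are simplified stand-ins declared locally so the example compiles on its own. The real job-scheduler SPI types take OpenSearch-specific arguments (for instance, the real parser reads from an XContentParser rather than a Map), and the job type and index names are hypothetical.

```java
import java.util.Map;

class ExtensionDemo {
    public static void main(String[] args) {
        CommandManagerExtension ext = new CommandManagerExtension();
        ScheduledJobParameter job = ext.getJobParser().parse(Map.of("name", "command-sweep"), "doc-1");
        System.out.println(ext.getJobType() + " reads jobs from " + ext.getJobIndex());
        ext.getJobRunner().runJob(job);
    }
}

// Simplified stand-ins for the job-scheduler SPI types (assumption: signatures are reduced).
interface ScheduledJobParameter { String getName(); }
interface ScheduledJobRunner { void runJob(ScheduledJobParameter job); }
interface ScheduledJobParser { ScheduledJobParameter parse(Map<String, Object> source, String id); }
interface JobSchedulerExtension {
    String getJobType();
    String getJobIndex();
    ScheduledJobRunner getJobRunner();
    ScheduledJobParser getJobParser();
}

class CommandManagerExtension implements JobSchedulerExtension {
    // Hypothetical identifiers, chosen only for this example.
    static final String JOB_TYPE = "command_manager_scheduler_extension";
    static final String JOB_INDEX = ".scheduled-commands";

    // One runner instance for the lifetime of the extension.
    private final ScheduledJobRunner runner =
        job -> System.out.println("Running job " + job.getName());

    @Override public String getJobType() { return JOB_TYPE; }
    @Override public String getJobIndex() { return JOB_INDEX; }
    @Override public ScheduledJobRunner getJobRunner() { return runner; }

    @Override public ScheduledJobParser getJobParser() {
        // Builds a job parameter from the indexed document; falls back to the document id.
        return (source, id) -> {
            String name = String.valueOf(source.getOrDefault("name", id));
            return () -> name;
        };
    }
}
```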

SampleJobParameter:

  • Implements getters and setters for every job parameter
  • Implements toXContent(), which is used to index the job

SampleJobRunner:

  • Implements runJob() which contains the job's logic.
  • Receives a ScheduledJobParameter, which gives it access to the task's details
  • Receives a JobExecutionContext which allows it to acquire a lock during the task execution time window.
  • The task itself is wrapped inside a Runnable object that gets submitted to an OpenSearch thread pool.
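
The runJob() flow above can be approximated with plain JDK classes. In this sketch, an ExecutorService stands in for the OpenSearch thread pool and a concurrent set stands in for the lock obtained through the JobExecutionContext. Both are assumptions made to keep the example self-contained, and SketchJobRunner is a hypothetical name.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RunJobDemo {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        SketchJobRunner runner = new SketchJobRunner(pool);
        boolean ran = runner.runJob("job-1", () -> System.out.println("processing commands")).join();
        System.out.println("task executed: " + ran);
        pool.shutdown();
    }
}

final class SketchJobRunner {
    private final ExecutorService pool;
    // Stand-in for the scheduler's lock: at most one running task per job id.
    private final Set<String> heldLocks = ConcurrentHashMap.newKeySet();

    SketchJobRunner(ExecutorService pool) {
        this.pool = pool;
    }

    /**
     * Returns immediately. The future completes with true if the task ran,
     * or false if another execution of the same job still held the lock.
     */
    CompletableFuture<Boolean> runJob(String jobId, Runnable task) {
        return CompletableFuture.supplyAsync(() -> {
            if (!heldLocks.add(jobId)) {
                return false; // lock not acquired, skip this execution
            }
            try {
                task.run();
                return true;
            } finally {
                heldLocks.remove(jobId); // always release the lock
            }
        }, pool);
    }
}
```

The caller's thread is never blocked by the task itself; it only blocks if it chooses to wait on the returned future.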

It seems that the only proper way to schedule tasks with the job scheduler is to store them as documents in an index.

The evidence is that the only call to runJob() comes from the reschedule() method of the JobScheduler class. The job parameters passed to that call can in turn be traced back to the sweep() method of the JobSweeper class.
Lastly, the sweep() method seems to parse the job parameters from a provided index.

@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Oct 18, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Oct 22, 2024
@f-galland
Member

f-galland commented Oct 25, 2024

Search results pagination can be achieved by means of two distinct methods:

  1. Using SearchSourceBuilder's from() and size(), which appear to be meant for user-facing interfaces
  2. Using Scroll and other related classes.

Solution 2 seems more robust (and is suggested for larger data batches).
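
To make the difference concrete, here is an in-memory analogy of the two styles. It does not call any OpenSearch API: Paging, page() and scroll() are hypothetical names, and a plain List plays the role of the result set. With from/size the caller tracks an absolute offset on every request, while a scroll-style cursor keeps its own position and only hands out the next batch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class PagingDemo {
    public static void main(String[] args) {
        List<String> hits = List.of("cmd-1", "cmd-2", "cmd-3", "cmd-4", "cmd-5");

        // from/size style: the caller computes each offset.
        System.out.println(Paging.page(hits, 2, 2));

        // Scroll style: the cursor remembers where the previous batch ended.
        Iterator<List<String>> cursor = Paging.scroll(hits, 2);
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}

final class Paging {
    private Paging() {}

    /** from/size style: returns the slice starting at an absolute offset. */
    static <T> List<T> page(List<T> hits, int from, int size) {
        if (from < 0 || size <= 0) {
            throw new IllegalArgumentException("from must be >= 0 and size must be > 0");
        }
        if (from >= hits.size()) {
            return List.of();
        }
        return new ArrayList<>(hits.subList(from, Math.min(from + size, hits.size())));
    }

    /** Scroll style: a stateful cursor that yields consecutive batches until exhausted. */
    static <T> Iterator<List<T>> scroll(List<T> hits, int size) {
        return new Iterator<>() {
            private int position = 0;

            @Override public boolean hasNext() {
                return position < hits.size();
            }

            @Override public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                List<T> batch = page(hits, position, size);
                position += batch.size();
                return batch;
            }
        };
    }
}
```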

I'm researching how official plugins handle iterating over the search result pages without blocking execution.

We have used the provided ThreadPool for this in past tests alongside simple while loops, but there seem to be more elegant solutions:

@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Nov 5, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Nov 5, 2024
@wazuhci wazuhci moved this from In progress to Pending review in Release 5.0.0 Nov 5, 2024
@f-galland
Member

As of commit 3fc33ea, the JobSchedulerExtension has been implemented as explained below:

@wazuhci wazuhci moved this from Pending review to In progress in Release 5.0.0 Nov 11, 2024
@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Nov 11, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Nov 14, 2024
@havidarou havidarou added the mvp Minimum Viable Product label Nov 18, 2024
@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Nov 19, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Nov 21, 2024
@wazuhci wazuhci moved this from In progress to Pending review in Release 5.0.0 Nov 25, 2024
@wazuhci wazuhci moved this from Pending review to On hold in Release 5.0.0 Nov 27, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Nov 28, 2024
@wazuhci wazuhci moved this from In progress to On hold in Release 5.0.0 Dec 2, 2024
@wazuhci wazuhci moved this from On hold to In progress in Release 5.0.0 Dec 3, 2024
@wazuhci wazuhci moved this from In progress to Pending review in Release 5.0.0 Dec 3, 2024
@wazuhci wazuhci moved this from Pending review to In final review in Release 5.0.0 Dec 4, 2024
@wazuhci wazuhci moved this from In final review to Done in Release 5.0.0 Dec 4, 2024
Projects
Status: Done
3 participants