Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrated code lifecycle: Improve reinitialization when pausing build agents #9939

Open
wants to merge 29 commits into
base: develop
Choose a base branch
from

Conversation

BBesrour
Copy link
Member

@BBesrour BBesrour commented Dec 3, 2024

Checklist

General

Server

  • Important: I implemented the changes with a very good performance and prevented too many (unnecessary) and too complex database calls.
  • I strictly followed the principle of data economy for all database calls.
  • I strictly followed the server coding and design guidelines.
  • I added multiple integration tests (Spring) related to the features (with a high test coverage).
  • I added pre-authorization annotations according to the guidelines and checked the course groups for all new REST Calls (security).
  • I documented the Java code using JavaDoc style.

Changes affecting Programming Exercises

  • High priority: I tested all changes and their related features with all corresponding user types on a test server configured with the integrated lifecycle setup (LocalVC and LocalCI).

Motivation and Context

We sometimes get this error when a build agent has been running for some time

Nov 28 21:48:53 ---[3392294]: java.lang.IllegalStateException: Connection pool shut down
...
com.github.dockerjava.httpclient5.ApacheDockerHttpClient.execute(ApacheDockerHttpClient.java:9)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at java.base/java.lang.Thread.run(Thread.java:1583)

Description

We reinitialize Docker Client and ThreadPoolExecutor when pausing build agent.

Steps for Testing

Prerequisites:

  • 1 Admin
  1. Submit a few times to a programming exercise (Pls count how many submissions to verify that they all finished)
  2. Pause build agents
  3. Wait for the running build jobs to finish while making sure there are some queued jobs (you can submit a few more times after pausing the agent)
  4. Resume build agent
  5. Make sure that the build jobs run correctly (you can check this on the build overview page)

Testserver States

Note

These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.







Review Progress

Performance Review

  • I (as a reviewer) confirm that the client changes (in particular related to REST calls and UI responsiveness) are implemented with a very good performance even for very large courses with more than 2000 students.
  • I (as a reviewer) confirm that the server changes (in particular related to database calls) are implemented with a very good performance even for very large courses with more than 2000 students.

Code Review

  • Code Review 1
  • Code Review 2

Manual Tests

  • Test 1
  • Test 2

Test Coverage

Screenshots

Summary by CodeRabbit

  • New Features

    • Enhanced management of build jobs and Docker client interactions through new configuration options.
    • Added lifecycle management methods for build executor and Docker client, including pause and resume functionalities.
    • Improved logging for test result handling during build job execution.
  • Bug Fixes

    • Improved error handling and logging for Docker container management and build job execution.
  • Refactor

    • Transitioned to a configuration-based approach for managing build executors and Docker clients, improving modularity and testability.
    • Streamlined error handling and resource management in build job execution.
  • Documentation

    • Updated method documentation to reflect changes in the configuration and management logic.

@BBesrour BBesrour self-assigned this Dec 3, 2024
@BBesrour BBesrour requested a review from a team as a code owner December 3, 2024 23:16
@github-actions github-actions bot added server Pull requests that update Java code. (Added Automatically!) buildagent Pull requests that affect the corresponding module labels Dec 3, 2024
Copy link

coderabbitai bot commented Dec 3, 2024

Warning

Rate limit exceeded

@BBesrour has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 15 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 98342c5 and bbd7591.

📒 Files selected for processing (1)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (8 hunks)

Walkthrough

The changes in this pull request involve significant modifications to several classes within the build agent module. The BuildAgentConfiguration class has been updated to manage build job execution and Docker client interactions through new fields and methods. Other classes, including BuildAgentDockerService, BuildJobContainerService, BuildJobManagementService, and SharedQueueProcessingService, have been refactored to utilize BuildAgentConfiguration instead of direct Docker client instances or executor services. This restructuring enhances the modularity and management of build processes and Docker interactions.

Changes

File Change Summary
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java Added new fields and methods for managing build jobs and Docker client interactions; updated method signatures.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java Replaced DockerClient with BuildAgentConfiguration; updated methods to retrieve Docker client from configuration.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java Replaced DockerClient with BuildAgentConfiguration; updated methods to use new configuration for Docker client.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java Replaced direct executor service reference with BuildAgentConfiguration; updated job submission logic accordingly.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java Replaced ThreadPoolExecutor with BuildAgentConfiguration; updated job handling and service management methods.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java Enhanced logging and error handling for test results during build job execution.
src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java Removed mocked DockerClient bean method, simplifying test setup for Docker interactions.
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java Enhanced Docker command mocking setup for testing build agent interactions.

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • SimonEntholzer
  • krusche
  • coolchock
  • az108

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (2)

Line range hint 175-182: Restrict visibility of createDockerClient() method

The method createDockerClient() is only used within the BuildAgentConfiguration class. To follow the principle of least privilege, consider changing its access modifier from public to private.

Apply this diff to change the method's visibility:

-public DockerClient createDockerClient() {
+private DockerClient createDockerClient() {

Line range hint 183-200: Handle unexpected memory units in parseMemoryString

The parseMemoryString method does not handle cases where memory strings have unexpected units or formats (e.g., missing quotes, uppercase letters). Consider adding validation or default cases to prevent potential NumberFormatException.

Apply this diff to enhance robustness:

         else {
+            try {
                 return Long.parseLong(memoryString);
+            } catch (NumberFormatException e) {
+                throw new LocalCIException("Invalid memory string format: " + memoryString, e);
+            }
         }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)

208-244: Optimize executor service usage in stopUnresponsiveContainer

Creating a new ExecutorService every time stopUnresponsiveContainer is called may lead to resource overhead. Consider reusing a shared executor service or refactoring the method to avoid unnecessary executor creation.

Apply this diff to use a shared executor service:

+private final ExecutorService executor = Executors.newSingleThreadExecutor();

 public void stopUnresponsiveContainer(String containerId) {
-    try (ExecutorService executor = Executors.newSingleThreadExecutor()) {
         try {
             // Existing code...
         }
         finally {
-            executor.shutdown();
+            // No need to shutdown the shared executor here
         }
-    }
 }

Ensure the shared executor service is properly shut down when the application is stopped, possibly in a @PreDestroy method.


Line range hint 406-433: Handle potential resource leaks in executeDockerCommand

In the executeDockerCommand method, if the dockerClient.execStartCmd does not call onComplete, the CountDownLatch may not be decremented, leading to a potential indefinite wait in latch.await(). Consider adding a timeout or ensuring that onComplete is always called.

Apply this diff to prevent indefinite blocking:

 try {
     dockerClient.execStartCmd(execCreateCmdResponse.getId()).withDetach(detach).exec(new ResultCallback.Adapter<>() {
         @Override
         public void onNext(Frame item) {
             // Existing code...
         }
+        @Override
+        public void onError(Throwable throwable) {
+            latch.countDown();
+            super.onError(throwable);
+        }
+        @Override
+        public void close() throws IOException {
+            latch.countDown();
+            super.close();
+        }
     });
 } catch (ConflictException e) {
     throw new LocalCIException("Could not execute Docker command: " + String.join(" ", command), e);
 }

 try {
-    latch.await();
+    if (!latch.await(60, TimeUnit.SECONDS)) {
+        throw new LocalCIException("Timeout while executing Docker command: " + String.join(" ", command));
+    }
 } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     throw new LocalCIException("Interrupted while executing Docker command: " + String.join(" ", command), e);
 }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (3)

340-342: Enhance exception logging in updateLocalBuildAgentInformation

The catch block logs a generic error without specific information about the exception. Consider adding the exception to the log statement to aid in debugging.

Apply this diff to include the exception in the log:

         catch (Exception e) {
-            log.error("Error while updating build agent information for agent with address {}", memberAddress);
+            log.error("Error while updating build agent information for agent with address {}", memberAddress, e);
         }

272-275: Avoid logging sensitive or excessive information

The log statement in the catch block logs the build job and executor service details, which may contain sensitive information or clutter the logs. Review the logged data to ensure it complies with logging policies.

Apply this diff to adjust the log statement:

-    log.error("Couldn't add build job to thread pool: {}\n Concurrent Build Jobs Count: {} Active tasks in pool: {}, Concurrent Build Jobs Size: {}", buildJob,
+    log.error("Couldn't add build job to thread pool. Concurrent Build Jobs Count: {} Active tasks in pool: {}, Maximum Pool Size: {}",
             localProcessingJobs.get(), buildExecutorService.getActiveCount(), buildExecutorService.getMaximumPoolSize(), e);

578-582: Simplify condition in nodeIsAvailable() method

The condition in nodeIsAvailable() checks both localProcessingJobs and buildExecutorService active count against the maximum pool size. Since both represent the number of running tasks, consider simplifying the condition to avoid redundancy.

Apply this diff to simplify the condition:

 return localProcessingJobs.get() < buildExecutorService.getMaximumPoolSize()
-    && buildExecutorService.getActiveCount() < buildExecutorService.getMaximumPoolSize()
     && buildExecutorService.getQueue().isEmpty();
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Line range hint 193-199: Preserve interrupt status when handling InterruptedException

In the finishBuildJobExceptionally method, when catching InterruptedException, consider resetting the thread's interrupt status to ensure proper thread interruption handling.

Apply this diff to reset the interrupt status:

     if (containerId != null) {
         buildJobContainerService.stopUnresponsiveContainer(containerId);
     }
+    if (exception instanceof InterruptedException) {
+        Thread.currentThread().interrupt();
+    }
 }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (2)

325-327: Handle potential exceptions during Docker image removal

While catching NotFoundException is appropriate, consider catching other exceptions that might occur during image removal, such as ConflictException, and log them accordingly to ensure reliable cleanup.

Apply this diff to handle additional exceptions:

 try {
     buildAgentConfiguration.getDockerClient().removeImageCmd(dockerImage).exec();
 } catch (NotFoundException e) {
     log.warn("Docker image {} not found during cleanup", dockerImage);
+} catch (ConflictException e) {
+    log.warn("Conflict occurred while removing Docker image {}: {}", dockerImage, e.getMessage());
+} catch (Exception e) {
+    log.error("Unexpected error while removing Docker image {}: {}", dockerImage, e.getMessage());
 }

Line range hint 348-385: Avoid potential race conditions in disk space check

In checkUsableDiskSpaceThenCleanUp(), the method does not lock the Docker client or the image list during the cleanup process. Consider adding synchronization to prevent race conditions when multiple threads might modify the Docker images concurrently.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0f12a54 and 74d8477.

📒 Files selected for processing (5)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (8 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (8 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (10 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (4 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (9 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Line range hint 1-290: Consider adding resilience patterns

Given that this service manages distributed build jobs and Docker containers, consider implementing additional resilience patterns:

  1. Circuit breaker pattern for Docker client operations
  2. Retry mechanism for failed executor submissions
  3. Health check mechanism for the build executor service

This would help handle temporary failures and improve system stability during connection issues.

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)

208-245: Consider extracting timeout values as configurable constants

The implementation handles container stopping and killing gracefully with proper resource management. However, the timeout values (15s for stop, 10s for kill) could be made configurable.

+ private static final int CONTAINER_STOP_TIMEOUT_SECONDS = 15;
+ private static final int CONTAINER_KILL_TIMEOUT_SECONDS = 10;
+ private static final int CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS = 20;

  public void stopUnresponsiveContainer(String containerId) {
      try (ExecutorService executor = Executors.newSingleThreadExecutor()) {
          try {
              Future<Void> future = executor.submit(() -> {
-                 buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId).withTimeout(15).exec();
+                 buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId)
+                     .withTimeout(CONTAINER_STOP_TIMEOUT_SECONDS).exec();
                  return null;
              });
-             future.get(20, TimeUnit.SECONDS);
+             future.get(CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS, TimeUnit.SECONDS);
          }
          // ... existing catch blocks ...
          try {
              Future<Void> killFuture = executor.submit(() -> {
                  buildAgentConfiguration.getDockerClient().killContainerCmd(containerId).exec();
                  return null;
              });
-             killFuture.get(10, TimeUnit.SECONDS);
+             killFuture.get(CONTAINER_KILL_TIMEOUT_SECONDS, TimeUnit.SECONDS);
          }

Line range hint 445-447: Consider enhancing path validation pattern

The current regex pattern might be too restrictive. Consider allowing additional valid path characters while maintaining security.

-    if (path == null || path.contains("..") || !path.matches("[a-zA-Z0-9_*./-]+")) {
+    if (path == null || path.contains("..") || !path.matches("^[a-zA-Z0-9_\\-.*/@\\s]+$")) {
         throw new LocalCIException("Invalid path: " + path);
     }

This pattern adds support for @ and spaces while maintaining protection against path traversal.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 74d8477 and 7fa6a0e.

📒 Files selected for processing (2)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (10 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (7)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (4)

33-33: LGTM: Clean dependency injection changes

The replacement of direct ExecutorService with BuildAgentConfiguration aligns with the PR objective of managing connection lifecycles centrally.

Also applies to: 51-51


86-88: LGTM: Constructor follows dependency injection best practices

The constructor properly initializes the BuildAgentConfiguration field through constructor injection, maintaining clean dependency management.


Line range hint 170-180: LGTM: Improved exception handling and logging

The enhanced exception handling for cancelled jobs and the addition of the exception parameter to log.error improves error tracking and debugging capabilities.

Also applies to: 237-237


147-148: ⚠️ Potential issue

Check for uninitialized executor service

The direct use of buildAgentConfiguration.getBuildExecutor() without a null check could lead to a NullPointerException.

Apply this diff to add a null check:

+if (buildAgentConfiguration.getBuildExecutor() == null) {
+    throw new LocalCIException("Build executor service is not initialized.");
+}
 future = buildAgentConfiguration.getBuildExecutor().submit(buildJob);
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (3)

62-62: LGTM: Clean dependency injection of BuildAgentConfiguration

The refactoring to use BuildAgentConfiguration instead of direct DockerClient injection improves modularity and aligns with the PR's objective of managing Docker client lifecycle.

Also applies to: 80-83


400-403: LGTM: Proper Docker command execution handling

The Docker command execution methods have been properly updated to use BuildAgentConfiguration while maintaining robust error handling and resource cleanup.

Also applies to: 406-439


452-454: LGTM: Efficient container lookup implementation

The container listing and filtering implementation is clean and efficient, properly handling null cases and container name matching.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (2)

Line range hint 320-365: Consider improving error handling and cleanup logic

The current implementation has several areas that could be enhanced:

  1. Extract the cleanup logic into a separate method to improve readability and maintainability
  2. Consider using a more structured approach for handling cleanup failures
  3. Implement proper exception chaining to preserve the root cause

Here's a suggested refactor:

 finally {
-    buildJobContainerService.stopContainer(containerName);
-
-    // Delete the cloned repositories
-    deleteCloneRepo(assignmentRepositoryUri, assignmentRepoCommitHash, buildJob.id(), assignmentRepositoryPath);
-    deleteCloneRepo(testRepositoryUri, assignmentRepoCommitHash, buildJob.id(), testsRepositoryPath);
-    // do not try to delete the temp repository if it does not exist or is the same as the assignment reposity
-    if (solutionRepositoryUri != null && !Objects.equals(assignmentRepositoryUri.repositorySlug(), solutionRepositoryUri.repositorySlug())) {
-        deleteCloneRepo(solutionRepositoryUri, assignmentRepoCommitHash, buildJob.id(), solutionRepositoryPath);
-    }
-
-    for (int i = 0; i < auxiliaryRepositoriesUris.length; i++) {
-        deleteCloneRepo(auxiliaryRepositoriesUris[i], assignmentRepoCommitHash, buildJob.id(), auxiliaryRepositoriesPaths[i]);
-    }
-
-    try {
-        deleteRepoParentFolder(assignmentRepoCommitHash, assignmentRepositoryPath, testRepoCommitHash, testsRepositoryPath);
-    }
-    catch (IOException e) {
-        msg = "Could not delete " + CHECKED_OUT_REPOS_TEMP_DIR + " directory";
-        buildLogsMap.appendBuildLogEntry(buildJob.id(), msg);
-        log.error(msg, e);
-    }
+    cleanupResources(containerName, buildJob.id(), 
+        new CleanupContext(
+            assignmentRepositoryUri, testRepositoryUri, solutionRepositoryUri,
+            auxiliaryRepositoriesUris, assignmentRepositoryPath, testsRepositoryPath,
+            solutionRepositoryPath, auxiliaryRepositoriesPaths,
+            assignmentRepoCommitHash, testRepoCommitHash
+        )
+    );
 }

+ private record CleanupContext(
+     VcsRepositoryUri assignmentRepositoryUri, VcsRepositoryUri testRepositoryUri,
+     VcsRepositoryUri solutionRepositoryUri, VcsRepositoryUri[] auxiliaryRepositoriesUris,
+     Path assignmentRepositoryPath, Path testsRepositoryPath,
+     Path solutionRepositoryPath, Path[] auxiliaryRepositoriesPaths,
+     String assignmentRepoCommitHash, String testRepoCommitHash
+ ) {}
+
+ private void cleanupResources(String containerName, String buildJobId, CleanupContext ctx) {
+     List<Exception> suppressedExceptions = new ArrayList<>();
+
+     try {
+         buildJobContainerService.stopContainer(containerName);
+     } catch (Exception e) {
+         suppressedExceptions.add(e);
+         log.error("Failed to stop container {}", containerName, e);
+     }
+
+     cleanupRepositories(buildJobId, ctx, suppressedExceptions);
+
+     if (!suppressedExceptions.isEmpty()) {
+         LocalCIException ex = new LocalCIException("Cleanup failed with multiple errors");
+         suppressedExceptions.forEach(ex::addSuppressed);
+         log.warn("Cleanup completed with {} errors", suppressedExceptions.size(), ex);
+     }
+ }

Line range hint 1-24: Consider architectural improvements for better maintainability

The class has several areas that could benefit from architectural improvements:

  1. The class handles multiple concerns (build execution, Docker management, Git operations)
  2. The runScriptAndParseResults method has 12 parameters, making it hard to maintain
  3. The logging pattern is duplicated across methods

Consider:

  1. Extracting Docker-specific operations into a dedicated BuildDockerService
  2. Creating a BuildContext record to encapsulate build parameters
  3. Implementing a proper logging facade to reduce duplication

Example of a BuildContext record:

public record BuildContext(
    BuildJobQueueItem buildJob,
    String containerName,
    String containerId,
    RepositoryContext repositories,
    String assignmentRepoCommitHash,
    String testRepoCommitHash
) {
    public record RepositoryContext(
        VcsRepositoryUri assignmentUri,
        VcsRepositoryUri testUri,
        VcsRepositoryUri solutionUri,
        VcsRepositoryUri[] auxiliaryUris,
        Path assignmentPath,
        Path testsPath,
        Path solutionPath,
        Path[] auxiliaryPaths
    ) {}
}

Also applies to: 303-314

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7fa6a0e and a4d612d.

📒 Files selected for processing (1)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)

303-306: LGTM: Well-structured logging additions

The new logging statements enhance observability by tracking critical steps in the build process, following the established logging pattern.

Also applies to: 311-314

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
@BBesrour BBesrour marked this pull request as ready for review December 11, 2024 19:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)

Line range hint 147-168: Consider adding metrics for thread pool monitoring.

The implementation is solid with good error handling and clear thread naming. Consider adding metrics to monitor thread pool health (queue size, active threads, etc.).

Add metrics collection:

+    private final MeterRegistry meterRegistry;
+
     private ThreadPoolExecutor createBuildExecutor() {
         // ... existing code ...
+        // Add metrics
+        meterRegistry.gauge("build.executor.pool.size", threadPoolSize);
+        meterRegistry.gauge("build.executor.active.threads", 
+            buildExecutor, ThreadPoolExecutor::getActiveCount);
+        meterRegistry.gauge("build.executor.queue.size", 
+            buildExecutor, e -> e.getQueue().size());
         return buildExecutor;
     }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

226-263: Consider making timeouts configurable.

The implementation is solid with good error handling and resource cleanup. However, the timeout values (20s for stop, 10s for kill) should be configurable.

Extract timeout values to configuration:

+    @Value("${artemis.continuous-integration.container.stop-timeout-seconds:20}")
+    private int containerStopTimeoutSeconds;
+
+    @Value("${artemis.continuous-integration.container.kill-timeout-seconds:10}")
+    private int containerKillTimeoutSeconds;

     public void stopUnresponsiveContainer(String containerId) {
         try (ExecutorService executor = Executors.newSingleThreadExecutor()) {
             try {
-                future.get(20, TimeUnit.SECONDS);
+                future.get(containerStopTimeoutSeconds, TimeUnit.SECONDS);
                 // ... 
-                killFuture.get(10, TimeUnit.SECONDS);
+                killFuture.get(containerKillTimeoutSeconds, TimeUnit.SECONDS);
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8816eee and e54bebf.

📒 Files selected for processing (5)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (8 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (11 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (2 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (5 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
🧰 Additional context used
📓 Path-based instructions (3)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (12)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (7)

45-49: LGTM! Fields are properly encapsulated.

The new private fields provide good encapsulation for managing the core resources.


72-82: LGTM! Clean getter implementations.

The getters follow Java conventions and provide controlled access to the private fields.


217-227: LGTM! Proper resource cleanup implementation.

The method handles Docker client cleanup appropriately with proper null checks and error handling.


229-232: LGTM! Clear coordination of resource cleanup.

The method provides a clean interface for shutting down both resources in a logical order.


234-237: LGTM! Clean reinitialization implementation.

The method properly reinitializes both resources in a logical order.


203-215: ⚠️ Potential issue

Implement proper executor shutdown handling.

The current implementation doesn't properly handle incomplete shutdowns, as noted in a previous review.

Apply the previously suggested changes to ensure proper resource cleanup:

-        if (buildExecutor != null && !buildExecutor.isShutdown()) {
-            buildExecutor.shutdown();
-            try {
-                buildExecutor.awaitTermination(5, TimeUnit.SECONDS);
-            }
-            catch (InterruptedException e) {
-                log.warn("Executor termination interrupted", e);
-            }
-        }
-        buildExecutor = null;
+        if (buildExecutor != null && !buildExecutor.isShutdown()) {
+            buildExecutor.shutdown();
+            try {
+                buildExecutor.awaitTermination(5, TimeUnit.SECONDS);
+                if (!buildExecutor.isTerminated()) {
+                    log.warn("Executor did not terminate in the specified time. Forcing shutdown now.");
+                    buildExecutor.shutdownNow();
+                }
+            }
+            catch (InterruptedException e) {
+                log.warn("Executor termination interrupted", e);
+                // Preserve interrupt status
+                Thread.currentThread().interrupt();
+            }
+            buildExecutor = null;
+        }

66-70: Consider potential race conditions in initialization.

While the initialization timing is appropriate, ensure that no other components try to access these resources before they're initialized.

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

106-106: LGTM! Consistent refactoring of Docker client access.

All Docker client access has been properly centralized through BuildAgentConfiguration.

Also applies to: 123-123, 137-137, 180-180, 361-361, 413-413, 419-419, 465-465

src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (4)

272-275: LGTM! Comprehensive error logging.

The error logging provides good visibility into the thread pool state during failures.


520-521: LGTM! Thread-safe resource cleanup.

The resource cleanup is properly synchronized and occurs at the right point in the pause sequence.


555-571: LGTM! Comprehensive resume implementation.

The resume sequence properly reinitializes resources, cleans up containers, and checks availability.


578-582: LGTM! Thorough availability checking.

The method properly checks all relevant metrics to determine node availability.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 11, 2024
coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 11, 2024
SimonEntholzer
SimonEntholzer previously approved these changes Dec 11, 2024
Copy link
Contributor

@SimonEntholzer SimonEntholzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works locally, docker doesn't break down any more 😄 🚀 Some tests need to be fixed still

@github-actions github-actions bot added tests programming Pull requests that affect the corresponding module labels Dec 11, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (2)

133-134: Consider using @MockBean instead of @SpyBean for BuildAgentConfiguration

Using @MockBean may be more appropriate if you are mocking the entire BuildAgentConfiguration class rather than spying on it. This can simplify the mocking process and prevent unintended interactions with the actual bean.


221-334: Simplify mock setup to enhance maintainability

The mockBuildAgentServices method contains repetitive code for mocking various Docker client commands. Refactoring the repeated patterns into helper methods or leveraging loops can reduce code duplication and improve readability.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between df6077f and 20654d7.

📒 Files selected for processing (2)
  • src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java (0 hunks)
  • src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (5 hunks)
💤 Files with no reviewable changes (1)
  • src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java
🧰 Additional context used
📓 Path-based instructions (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)

Pattern src/test/java/**/*.java: test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)

224-336: Consider breaking down the Docker client mock setup into smaller methods

While the mock setup is comprehensive and well-structured, the method is quite long and could be more maintainable if broken down into smaller, focused helper methods.

Consider refactoring like this:

@BeforeAll
protected static void mockDockerClient() throws InterruptedException {
    DockerClient dockerClient = mock(DockerClient.class);
-    // Mock dockerClient.inspectImageCmd(String dockerImage).exec()
-    InspectImageCmd inspectImageCmd = mock(InspectImageCmd.class);
-    // ... rest of the method
+    mockImageInspection(dockerClient);
+    mockImagePulling(dockerClient);
+    mockContainerOperations(dockerClient);
+    mockExecCommands(dockerClient);
+    mockContainerListing(dockerClient);
+    mockImageListing(dockerClient);
+    mockCleanupOperations(dockerClient);
    
    dockerClientMock = dockerClient;
}

+private static void mockImageInspection(DockerClient dockerClient) {
+    InspectImageCmd inspectImageCmd = mock(InspectImageCmd.class);
+    InspectImageResponse inspectImageResponse = new InspectImageResponse();
+    doReturn(inspectImageCmd).when(dockerClient).inspectImageCmd(anyString());
+    doReturn(inspectImageResponse).when(inspectImageCmd).exec();
+}
// ... similar methods for other Docker operations
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 20654d7 and 98342c5.

📒 Files selected for processing (1)
  • src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)

Pattern src/test/java/**/*.java: test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true

🔇 Additional comments (3)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (3)

12-19: LGTM: Imports are well-organized and necessary

The new imports for Mockito matchers and Docker client classes are properly organized and all are utilized in the implementation.

Also applies to: 45-64


134-136: LGTM: BuildAgentConfiguration spy bean properly configured

The spy bean is correctly annotated and aligns with the PR's objective of managing Docker client reinitialization.


338-342: LGTM: Build agent services mock setup is well-implemented

The method is concise, properly annotated with @beforeeach, and correctly initializes the Docker client for each test.

This reverts commit bbd7591.
coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 11, 2024
SimonEntholzer
SimonEntholzer previously approved these changes Dec 11, 2024
Copy link
Contributor

@SimonEntholzer SimonEntholzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works, and tests pass

@krusche krusche changed the title Integrated code lifecycle: Reinitialize Docker client connection and ThreadPoolExecutor when pausing build agents Integrated code lifecycle: Improve reinitialization when pausing build agents Dec 12, 2024
@krusche krusche added this to the 7.7.5 milestone Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buildagent Pull requests that affect the corresponding module programming Pull requests that affect the corresponding module ready for review server Pull requests that update Java code. (Added Automatically!) tests
Projects
Status: Ready For Review
Development

Successfully merging this pull request may close these issues.

3 participants