-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrated code lifecycle
: Improve reinitialization when pausing build agents
#9939
base: develop
Are you sure you want to change the base?
Conversation
# Conflicts: # src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java
Warning Rate limit exceeded@BBesrour has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 15 minutes and 15 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThe changes in this pull request involve significant modifications to several classes within the build agent module. The Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (10)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (2)
Line range hint
175-182
: Restrict visibility ofcreateDockerClient()
methodThe method
createDockerClient()
is only used within theBuildAgentConfiguration
class. To follow the principle of least privilege, consider changing its access modifier frompublic
toprivate
.Apply this diff to change the method's visibility:
-public DockerClient createDockerClient() { +private DockerClient createDockerClient() {
Line range hint
183-200
: Handle unexpected memory units inparseMemoryString
The
parseMemoryString
method does not handle cases where memory strings have unexpected units or formats (e.g., missing quotes, uppercase letters). Consider adding validation or default cases to prevent potentialNumberFormatException
.Apply this diff to enhance robustness:
else { + try { return Long.parseLong(memoryString); + } catch (NumberFormatException e) { + throw new LocalCIException("Invalid memory string format: " + memoryString, e); + } }src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)
208-244
: Optimize executor service usage instopUnresponsiveContainer
Creating a new
ExecutorService
every timestopUnresponsiveContainer
is called may lead to resource overhead. Consider reusing a shared executor service or refactoring the method to avoid unnecessary executor creation.Apply this diff to use a shared executor service:
+private final ExecutorService executor = Executors.newSingleThreadExecutor(); public void stopUnresponsiveContainer(String containerId) { - try (ExecutorService executor = Executors.newSingleThreadExecutor()) { try { // Existing code... } finally { - executor.shutdown(); + // No need to shutdown the shared executor here } - } }Ensure the shared executor service is properly shut down when the application is stopped, possibly in a
@PreDestroy
method.
Line range hint
406-433
: Handle potential resource leaks inexecuteDockerCommand
In the
executeDockerCommand
method, if thedockerClient.execStartCmd
does not callonComplete
, theCountDownLatch
may not be decremented, leading to a potential indefinite wait inlatch.await()
. Consider adding a timeout or ensuring thatonComplete
is always called.Apply this diff to prevent indefinite blocking:
try { dockerClient.execStartCmd(execCreateCmdResponse.getId()).withDetach(detach).exec(new ResultCallback.Adapter<>() { @Override public void onNext(Frame item) { // Existing code... } + @Override + public void onError(Throwable throwable) { + latch.countDown(); + super.onError(throwable); + } + @Override + public void close() throws IOException { + latch.countDown(); + super.close(); + } }); } catch (ConflictException e) { throw new LocalCIException("Could not execute Docker command: " + String.join(" ", command), e); } try { - latch.await(); + if (!latch.await(60, TimeUnit.SECONDS)) { + throw new LocalCIException("Timeout while executing Docker command: " + String.join(" ", command)); + } } catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new LocalCIException("Interrupted while executing Docker command: " + String.join(" ", command), e); }src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (3)
340-342
: Enhance exception logging inupdateLocalBuildAgentInformation
The catch block logs a generic error without specific information about the exception. Consider adding the exception to the log statement to aid in debugging.
Apply this diff to include the exception in the log:
catch (Exception e) { - log.error("Error while updating build agent information for agent with address {}", memberAddress); + log.error("Error while updating build agent information for agent with address {}", memberAddress, e); }
272-275
: Avoid logging sensitive or excessive informationThe log statement in the catch block logs the build job and executor service details, which may contain sensitive information or clutter the logs. Review the logged data to ensure it complies with logging policies.
Apply this diff to adjust the log statement:
- log.error("Couldn't add build job to thread pool: {}\n Concurrent Build Jobs Count: {} Active tasks in pool: {}, Concurrent Build Jobs Size: {}", buildJob, + log.error("Couldn't add build job to thread pool. Concurrent Build Jobs Count: {} Active tasks in pool: {}, Maximum Pool Size: {}", localProcessingJobs.get(), buildExecutorService.getActiveCount(), buildExecutorService.getMaximumPoolSize(), e);
578-582
: Simplify condition innodeIsAvailable()
methodThe condition in
nodeIsAvailable()
checks bothlocalProcessingJobs
andbuildExecutorService
active count against the maximum pool size. Since both represent the number of running tasks, consider simplifying the condition to avoid redundancy.Apply this diff to simplify the condition:
return localProcessingJobs.get() < buildExecutorService.getMaximumPoolSize() - && buildExecutorService.getActiveCount() < buildExecutorService.getMaximumPoolSize() && buildExecutorService.getQueue().isEmpty();
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)
Line range hint
193-199
: Preserve interrupt status when handlingInterruptedException
In the
finishBuildJobExceptionally
method, when catchingInterruptedException
, consider resetting the thread's interrupt status to ensure proper thread interruption handling.Apply this diff to reset the interrupt status:
if (containerId != null) { buildJobContainerService.stopUnresponsiveContainer(containerId); } + if (exception instanceof InterruptedException) { + Thread.currentThread().interrupt(); + } }src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (2)
325-327
: Handle potential exceptions during Docker image removalWhile catching
NotFoundException
is appropriate, consider catching other exceptions that might occur during image removal, such asConflictException
, and log them accordingly to ensure reliable cleanup.Apply this diff to handle additional exceptions:
try { buildAgentConfiguration.getDockerClient().removeImageCmd(dockerImage).exec(); } catch (NotFoundException e) { log.warn("Docker image {} not found during cleanup", dockerImage); +} catch (ConflictException e) { + log.warn("Conflict occurred while removing Docker image {}: {}", dockerImage, e.getMessage()); +} catch (Exception e) { + log.error("Unexpected error while removing Docker image {}: {}", dockerImage, e.getMessage()); }
Line range hint
348-385
: Avoid potential race conditions in disk space checkIn
checkUsableDiskSpaceThenCleanUp()
, the method does not lock the Docker client or the image list during the cleanup process. Consider adding synchronization to prevent race conditions when multiple threads might modify the Docker images concurrently.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java
(8 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java
(8 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java
(10 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
(4 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java
(9 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)
Line range hint
1-290
: Consider adding resilience patternsGiven that this service manages distributed build jobs and Docker containers, consider implementing additional resilience patterns:
- Circuit breaker pattern for Docker client operations
- Retry mechanism for failed executor submissions
- Health check mechanism for the build executor service
This would help handle temporary failures and improve system stability during connection issues.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)
208-245
: Consider extracting timeout values as configurable constantsThe implementation handles container stopping and killing gracefully with proper resource management. However, the timeout values (15s for stop, 10s for kill) could be made configurable.
+ private static final int CONTAINER_STOP_TIMEOUT_SECONDS = 15; + private static final int CONTAINER_KILL_TIMEOUT_SECONDS = 10; + private static final int CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS = 20; public void stopUnresponsiveContainer(String containerId) { try (ExecutorService executor = Executors.newSingleThreadExecutor()) { try { Future<Void> future = executor.submit(() -> { - buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId).withTimeout(15).exec(); + buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId) + .withTimeout(CONTAINER_STOP_TIMEOUT_SECONDS).exec(); return null; }); - future.get(20, TimeUnit.SECONDS); + future.get(CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS, TimeUnit.SECONDS); } // ... existing catch blocks ... try { Future<Void> killFuture = executor.submit(() -> { buildAgentConfiguration.getDockerClient().killContainerCmd(containerId).exec(); return null; }); - killFuture.get(10, TimeUnit.SECONDS); + killFuture.get(CONTAINER_KILL_TIMEOUT_SECONDS, TimeUnit.SECONDS); }
Line range hint
445-447
: Consider enhancing path validation patternThe current regex pattern might be too restrictive. Consider allowing additional valid path characters while maintaining security.
- if (path == null || path.contains("..") || !path.matches("[a-zA-Z0-9_*./-]+")) { + if (path == null || path.contains("..") || !path.matches("^[a-zA-Z0-9_\\-.*/@\\s]+$")) { throw new LocalCIException("Invalid path: " + path); }This pattern adds support for
@
and spaces while maintaining protection against path traversal.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java
(10 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
🔇 Additional comments (7)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (4)
33-33
: LGTM: Clean dependency injection changes
The replacement of direct ExecutorService with BuildAgentConfiguration aligns with the PR objective of managing connection lifecycles centrally.
Also applies to: 51-51
86-88
: LGTM: Constructor follows dependency injection best practices
The constructor properly initializes the BuildAgentConfiguration field through constructor injection, maintaining clean dependency management.
Line range hint 170-180
: LGTM: Improved exception handling and logging
The enhanced exception handling for cancelled jobs and the addition of the exception parameter to log.error improves error tracking and debugging capabilities.
Also applies to: 237-237
147-148
:
Check for uninitialized executor service
The direct use of buildAgentConfiguration.getBuildExecutor()
without a null check could lead to a NullPointerException.
Apply this diff to add a null check:
+if (buildAgentConfiguration.getBuildExecutor() == null) {
+ throw new LocalCIException("Build executor service is not initialized.");
+}
future = buildAgentConfiguration.getBuildExecutor().submit(buildJob);
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (3)
62-62
: LGTM: Clean dependency injection of BuildAgentConfiguration
The refactoring to use BuildAgentConfiguration instead of direct DockerClient injection improves modularity and aligns with the PR's objective of managing Docker client lifecycle.
Also applies to: 80-83
400-403
: LGTM: Proper Docker command execution handling
The Docker command execution methods have been properly updated to use BuildAgentConfiguration while maintaining robust error handling and resource cleanup.
Also applies to: 406-439
452-454
: LGTM: Efficient container lookup implementation
The container listing and filtering implementation is clean and efficient, properly handling null cases and container name matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (2)
Line range hint
320-365
: Consider improving error handling and cleanup logicThe current implementation has several areas that could be enhanced:
- Extract the cleanup logic into a separate method to improve readability and maintainability
- Consider using a more structured approach for handling cleanup failures
- Implement proper exception chaining to preserve the root cause
Here's a suggested refactor:
finally { - buildJobContainerService.stopContainer(containerName); - - // Delete the cloned repositories - deleteCloneRepo(assignmentRepositoryUri, assignmentRepoCommitHash, buildJob.id(), assignmentRepositoryPath); - deleteCloneRepo(testRepositoryUri, assignmentRepoCommitHash, buildJob.id(), testsRepositoryPath); - // do not try to delete the temp repository if it does not exist or is the same as the assignment reposity - if (solutionRepositoryUri != null && !Objects.equals(assignmentRepositoryUri.repositorySlug(), solutionRepositoryUri.repositorySlug())) { - deleteCloneRepo(solutionRepositoryUri, assignmentRepoCommitHash, buildJob.id(), solutionRepositoryPath); - } - - for (int i = 0; i < auxiliaryRepositoriesUris.length; i++) { - deleteCloneRepo(auxiliaryRepositoriesUris[i], assignmentRepoCommitHash, buildJob.id(), auxiliaryRepositoriesPaths[i]); - } - - try { - deleteRepoParentFolder(assignmentRepoCommitHash, assignmentRepositoryPath, testRepoCommitHash, testsRepositoryPath); - } - catch (IOException e) { - msg = "Could not delete " + CHECKED_OUT_REPOS_TEMP_DIR + " directory"; - buildLogsMap.appendBuildLogEntry(buildJob.id(), msg); - log.error(msg, e); - } + cleanupResources(containerName, buildJob.id(), + new CleanupContext( + assignmentRepositoryUri, testRepositoryUri, solutionRepositoryUri, + auxiliaryRepositoriesUris, assignmentRepositoryPath, testsRepositoryPath, + solutionRepositoryPath, auxiliaryRepositoriesPaths, + assignmentRepoCommitHash, testRepoCommitHash + ) + ); } + private record CleanupContext( + VcsRepositoryUri assignmentRepositoryUri, VcsRepositoryUri testRepositoryUri, + VcsRepositoryUri solutionRepositoryUri, VcsRepositoryUri[] auxiliaryRepositoriesUris, + Path assignmentRepositoryPath, Path testsRepositoryPath, + Path solutionRepositoryPath, Path[] auxiliaryRepositoriesPaths, + String assignmentRepoCommitHash, String testRepoCommitHash + ) {} + + private void cleanupResources(String containerName, String buildJobId, CleanupContext ctx) { + List<Exception> suppressedExceptions = new ArrayList<>(); + + try { + buildJobContainerService.stopContainer(containerName); + } catch (Exception e) { + suppressedExceptions.add(e); + log.error("Failed to stop container {}", containerName, e); + } + + cleanupRepositories(buildJobId, ctx, suppressedExceptions); + + if (!suppressedExceptions.isEmpty()) { + LocalCIException ex = new LocalCIException("Cleanup failed with multiple errors"); + suppressedExceptions.forEach(ex::addSuppressed); + log.warn("Cleanup completed with {} errors", suppressedExceptions.size(), ex); + } + }
Line range hint
1-24
: Consider architectural improvements for better maintainabilityThe class has several areas that could benefit from architectural improvements:
- The class handles multiple concerns (build execution, Docker management, Git operations)
- The
runScriptAndParseResults
method has 12 parameters, making it hard to maintain- The logging pattern is duplicated across methods
Consider:
- Extracting Docker-specific operations into a dedicated
BuildDockerService
- Creating a
BuildContext
record to encapsulate build parameters- Implementing a proper logging facade to reduce duplication
Example of a
BuildContext
record:public record BuildContext( BuildJobQueueItem buildJob, String containerName, String containerId, RepositoryContext repositories, String assignmentRepoCommitHash, String testRepoCommitHash ) { public record RepositoryContext( VcsRepositoryUri assignmentUri, VcsRepositoryUri testUri, VcsRepositoryUri solutionUri, VcsRepositoryUri[] auxiliaryUris, Path assignmentPath, Path testsPath, Path solutionPath, Path[] auxiliaryPaths ) {} }Also applies to: 303-314
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
🔇 Additional comments (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)
303-306
: LGTM: Well-structured logging additions
The new logging statements enhance observability by tracking critical steps in the build process, following the established logging pattern.
Also applies to: 311-314
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)
Line range hint
147-168
: Consider adding metrics for thread pool monitoring.The implementation is solid with good error handling and clear thread naming. Consider adding metrics to monitor thread pool health (queue size, active threads, etc.).
Add metrics collection:
+ private final MeterRegistry meterRegistry; + private ThreadPoolExecutor createBuildExecutor() { // ... existing code ... + // Add metrics + meterRegistry.gauge("build.executor.pool.size", threadPoolSize); + meterRegistry.gauge("build.executor.active.threads", + buildExecutor, ThreadPoolExecutor::getActiveCount); + meterRegistry.gauge("build.executor.queue.size", + buildExecutor, e -> e.getQueue().size()); return buildExecutor; }src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)
226-263
: Consider making timeouts configurable.The implementation is solid with good error handling and resource cleanup. However, the timeout values (20s for stop, 10s for kill) should be configurable.
Extract timeout values to configuration:
+ @Value("${artemis.continuous-integration.container.stop-timeout-seconds:20}") + private int containerStopTimeoutSeconds; + + @Value("${artemis.continuous-integration.container.kill-timeout-seconds:10}") + private int containerKillTimeoutSeconds; public void stopUnresponsiveContainer(String containerId) { try (ExecutorService executor = Executors.newSingleThreadExecutor()) { try { - future.get(20, TimeUnit.SECONDS); + future.get(containerStopTimeoutSeconds, TimeUnit.SECONDS); // ... - killFuture.get(10, TimeUnit.SECONDS); + killFuture.get(containerKillTimeoutSeconds, TimeUnit.SECONDS);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java
(8 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java
(11 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java
(2 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
(5 hunks)src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java
(9 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java
- src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java
🧰 Additional context used
📓 Path-based instructions (3)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)
Pattern src/main/java/**/*.java
: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
🔇 Additional comments (12)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (7)
45-49
: LGTM! Fields are properly encapsulated.
The new private fields provide good encapsulation for managing the core resources.
72-82
: LGTM! Clean getter implementations.
The getters follow Java conventions and provide controlled access to the private fields.
217-227
: LGTM! Proper resource cleanup implementation.
The method handles Docker client cleanup appropriately with proper null checks and error handling.
229-232
: LGTM! Clear coordination of resource cleanup.
The method provides a clean interface for shutting down both resources in a logical order.
234-237
: LGTM! Clean reinitialization implementation.
The method properly reinitializes both resources in a logical order.
203-215
:
Implement proper executor shutdown handling.
The current implementation doesn't properly handle incomplete shutdowns, as noted in a previous review.
Apply the previously suggested changes to ensure proper resource cleanup:
- if (buildExecutor != null && !buildExecutor.isShutdown()) {
- buildExecutor.shutdown();
- try {
- buildExecutor.awaitTermination(5, TimeUnit.SECONDS);
- }
- catch (InterruptedException e) {
- log.warn("Executor termination interrupted", e);
- }
- }
- buildExecutor = null;
+ if (buildExecutor != null && !buildExecutor.isShutdown()) {
+ buildExecutor.shutdown();
+ try {
+ buildExecutor.awaitTermination(5, TimeUnit.SECONDS);
+ if (!buildExecutor.isTerminated()) {
+ log.warn("Executor did not terminate in the specified time. Forcing shutdown now.");
+ buildExecutor.shutdownNow();
+ }
+ }
+ catch (InterruptedException e) {
+ log.warn("Executor termination interrupted", e);
+ // Preserve interrupt status
+ Thread.currentThread().interrupt();
+ }
+ buildExecutor = null;
+ }
66-70
: Consider potential race conditions in initialization.
While the initialization timing is appropriate, ensure that no other components try to access these resources before they're initialized.
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)
106-106
: LGTM! Consistent refactoring of Docker client access.
All Docker client access has been properly centralized through BuildAgentConfiguration.
Also applies to: 123-123, 137-137, 180-180, 361-361, 413-413, 419-419, 465-465
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (4)
272-275
: LGTM! Comprehensive error logging.
The error logging provides good visibility into the thread pool state during failures.
520-521
: LGTM! Thread-safe resource cleanup.
The resource cleanup is properly synchronized and occurs at the right point in the pause sequence.
555-571
: LGTM! Comprehensive resume implementation.
The resume sequence properly reinitializes resources, cleans up containers, and checks availability.
578-582
: LGTM! Thorough availability checking.
The method properly checks all relevant metrics to determine node availability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works locally, docker doesn't break down any more 😄 🚀 Some tests need to be fixed still
20654d7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (2)
133-134
: Consider using@MockBean
instead of@SpyBean
forBuildAgentConfiguration
Using
@MockBean
may be more appropriate if you are mocking the entireBuildAgentConfiguration
class rather than spying on it. This can simplify the mocking process and prevent unintended interactions with the actual bean.
221-334
: Simplify mock setup to enhance maintainabilityThe
mockBuildAgentServices
method contains repetitive code for mocking various Docker client commands. Refactoring the repeated patterns into helper methods or leveraging loops can reduce code duplication and improve readability.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (2)
src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java
(0 hunks)src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java
(5 hunks)
💤 Files with no reviewable changes (1)
- src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java
🧰 Additional context used
📓 Path-based instructions (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)
Pattern src/test/java/**/*.java
: test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true
...est/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)
224-336
: Consider breaking down the Docker client mock setup into smaller methodsWhile the mock setup is comprehensive and well-structured, the method is quite long and could be more maintainable if broken down into smaller, focused helper methods.
Consider refactoring like this:
@BeforeAll protected static void mockDockerClient() throws InterruptedException { DockerClient dockerClient = mock(DockerClient.class); - // Mock dockerClient.inspectImageCmd(String dockerImage).exec() - InspectImageCmd inspectImageCmd = mock(InspectImageCmd.class); - // ... rest of the method + mockImageInspection(dockerClient); + mockImagePulling(dockerClient); + mockContainerOperations(dockerClient); + mockExecCommands(dockerClient); + mockContainerListing(dockerClient); + mockImageListing(dockerClient); + mockCleanupOperations(dockerClient); dockerClientMock = dockerClient; } +private static void mockImageInspection(DockerClient dockerClient) { + InspectImageCmd inspectImageCmd = mock(InspectImageCmd.class); + InspectImageResponse inspectImageResponse = new InspectImageResponse(); + doReturn(inspectImageCmd).when(dockerClient).inspectImageCmd(anyString()); + doReturn(inspectImageResponse).when(inspectImageCmd).exec(); +} // ... similar methods for other Docker operations
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java
(5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)
Pattern src/test/java/**/*.java
: test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true
🔇 Additional comments (3)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (3)
12-19
: LGTM: Imports are well-organized and necessary
The new imports for Mockito matchers and Docker client classes are properly organized and all are utilized in the implementation.
Also applies to: 45-64
134-136
: LGTM: BuildAgentConfiguration spy bean properly configured
The spy bean is correctly annotated and aligns with the PR's objective of managing Docker client reinitialization.
338-342
: LGTM: Build agent services mock setup is well-implemented
The method is concise, properly annotated with @beforeeach, and correctly initializes the Docker client for each test.
This reverts commit bbd7591.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
works, and tests pass
06ab517
Integrated code lifecycle
: Reinitialize Docker client connection and ThreadPoolExecutor when pausing build agentsIntegrated code lifecycle
: Improve reinitialization when pausing build agents
Checklist
General
Server
Changes affecting Programming Exercises
Motivation and Context
We sometimes get this error when a build agent has been running for some time
Description
We reinitialize Docker Client and ThreadPoolExecutor when pausing build agent.
Steps for Testing
Prerequisites:
Testserver States
Note
These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.
Review Progress
Performance Review
Code Review
Manual Tests
Test Coverage
Screenshots
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Documentation