
WIP: Extend DB for controls table and a mapper table to map controls… #42

Open · wants to merge 10 commits into base: main

Conversation

leon-schmid (Member) commented Nov 20, 2024

This PR adds the following:

  • New tables "controls" and "controls_workflows"
  • New tests for the DB tables and the necessary adjustments for their use
  • A new Docker service for a testing database
  • Some refactoring, since we only support PostgreSQL databases
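The intended table layout can be illustrated with plain SQL. The following sqlite3 sketch only illustrates the structure and the cascade behavior; the project itself targets PostgreSQL via SQLModel, and the exact column names here are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite needs this for CASCADE to fire

conn.executescript("""
CREATE TABLE controls (
    control_id INTEGER NOT NULL,
    user_id TEXT NOT NULL,
    control_name TEXT NOT NULL,
    control_description TEXT,
    PRIMARY KEY (control_id, user_id)
);
CREATE TABLE controls_workflows (
    control_id INTEGER NOT NULL,
    user_id TEXT NOT NULL,
    workflow_id TEXT NOT NULL,
    PRIMARY KEY (control_id, user_id, workflow_id),
    FOREIGN KEY (control_id, user_id)
        REFERENCES controls (control_id, user_id) ON DELETE CASCADE
);
""")

conn.execute("INSERT INTO controls VALUES (1, 'user123', 'Test Control', 'A test control')")
conn.execute("INSERT INTO controls_workflows VALUES (1, 'user123', 'workflow123')")

# Deleting the control cascades to the mapping table.
conn.execute("DELETE FROM controls WHERE control_id = 1 AND user_id = 'user123'")
remaining = conn.execute("SELECT COUNT(*) FROM controls_workflows").fetchone()[0]
print(remaining)  # → 0
```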

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new schemas for controls and workflows, enhancing the database structure.
    • Added a new PostgreSQL service for unit testing in the Docker configuration.
    • Implemented new relationships in the user schema to link controls and workflows.
  • Bug Fixes

    • Simplified database URL handling, improving configuration management.
  • Tests

    • Added comprehensive tests for control and workflow mappings, including cascade delete functionality.
  • Documentation

    • Updated environment variable examples to include test database configurations.


coderabbitai bot commented Nov 20, 2024

Walkthrough

The pull request introduces significant changes across multiple files, primarily focusing on enhancements to database configuration and schema management within the Admyral project. Key modifications include the addition of new environment variables for test databases, updates to the handling of database URLs, and the introduction of new schemas and relationships for controls and workflows. Additionally, a new PostgreSQL service configuration for unit testing is added in the Docker setup, along with various test functions to validate the new functionalities.

Changes

File Change Summary
admyral/config/config.py Added constants ENV_ADMYRAL_TEST_DATABASE_URL and ADMYRAL_TEST_DATABASE_URL, updated ADMYRAL_DATABASE_URL, and modified GlobalConfig to include test_database_url. Simplified database URL handling by removing error checks.
admyral/db/admyral_store.py Updated AdmyralStore constructor to accept database_url, added clean_up_controls_data method, and modified create_store method to accept an optional database_url.
admyral/db/alembic/database_manager.py Changed DatabaseManager constructor to accept database_url, removed self.config, and simplified database handling methods by eliminating checks for database types.
admyral/db/schemas/__init__.py Added new schemas: ControlSchema and ControlsWorkflowsMappingSchema.
admyral/db/schemas/auth_schemas.py Added controls relationship in UserSchema linking to ControlSchema.
admyral/db/schemas/control_schemas.py Introduced ControlSchema and ControlsWorkflowsMappingSchema classes with defined relationships and methods for model conversion.
admyral/db/schemas/workflow_control_results_schemas.py Updated workflow relationship in WorkflowControlResultsSchema to use back_populates="control_results".
admyral/db/schemas/workflow_schemas.py Replaced controls relationship with control_workflow_mappings and control_results, linking to new schemas.
admyral/models/__init__.py Added imports for Control and ControlsWorkflowsMapping, updated __all__ list.
admyral/models/control.py Introduced Control and ControlsWorkflowsMapping classes with attributes for data validation.
docker-compose.dev.yml Added new service postgresql-unit-tests for unit testing, with specific environment variables and port mappings.
tests/conftest.py Updated store fixture to require database_url, added cleanup call for controls data.
tests/db/test_controls_workflows_mapping.py Introduced multiple test functions to validate control and workflow mappings, including setup and cascading delete tests.
.env.example Added new environment variable ADMYRAL_TEST_DATABASE_URL for testing purposes.
admyral/db/alembic/versions/3967e08c0051_add_controls_and_controls_workflows_.py Added migration script to create controls and controls_workflows tables with necessary constraints and relationships.

Possibly related PRs

  • New actions indicating success/failure of a control #41: The changes in the main PR regarding the addition of new constants and modifications to database URL handling may relate to the new actions introduced in this PR, as both involve enhancements to control management and database interactions within the Admyral project.

Suggested reviewers

  • danielgrittner

Poem

🐇 In the garden of code, we hop and play,
New schemas and tests brighten the day.
With controls and workflows, our paths intertwine,
In the world of Admyral, all functions align!
So let’s raise a cheer for the changes we see,
A hop, skip, and jump to a better DB! 🌼




vercel bot commented Nov 20, 2024

The latest updates on your projects.

admyral-docs: ✅ Ready, updated Nov 27, 2024 7:35pm (UTC)

# primary keys
control_id: int = Field(
    primary_key=True,
    index=True,

No need to build an index on a PK; primary key columns are already indexed implicitly.

Comment on lines 77 to 78
)
op.drop_column("workflow_controls_results", "control_id")

Why is this here? I can't see any removal in the diff

Comment on lines 77 to 78
# foreign keys
user_id: str = Field(sa_type=TEXT(), index=True)

Let's make user_id part of the primary key, so we can easily swap it for an org_id/workspace_id later.
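The composite-key idea can be sketched in plain SQL. This sqlite3 snippet only illustrates the behavior the suggestion is after; the real schema is SQLModel on PostgreSQL, and the column set is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE controls (
    control_id INTEGER NOT NULL,
    user_id TEXT NOT NULL,
    control_name TEXT NOT NULL,
    PRIMARY KEY (control_id, user_id)  -- user_id is part of the PK
)""")

# The same control_id may exist for two different users ...
conn.execute("INSERT INTO controls VALUES (1, 'alice', 'c')")
conn.execute("INSERT INTO controls VALUES (1, 'bob', 'c')")

# ... but not twice for the same user.
try:
    conn.execute("INSERT INTO controls VALUES (1, 'alice', 'dup')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # → False
```

Swapping user_id for an org_id/workspace_id later then only changes one PK column rather than a separate unique constraint.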

    ),
    sa.PrimaryKeyConstraint("control_id", "user_id", "workflow_id"),
)
op.drop_column("workflow_controls_results", "control_id")

Why do we need this? The workflow_controls_results table does not contain a control_id column (see here)

from sqlalchemy import text

# Use SQLite for testing
DATABASE_URL = "sqlite:///:memory:"

The unit tests already expect our Postgres DB to be running. I would suggest using Postgres instead of SQLite. We should then also generalize the session fixture so the same one is used across tests. See tests/workers/conftest.py for the fixture used by the other tests.

Comment on lines 114 to 137
# Create a control
control = ControlSchema(
    control_name="Test Control",
    control_description="A test control",
    user_id="user123",
)
session.add(control)

# Create a workflow
workflow = WorkflowSchema(
    workflow_id="workflow123",
    workflow_name="Test Workflow",
    user_id="user123",
    workflow_dag={},
    is_active=True,
)
session.add(workflow)
session.commit()

# Create the mapping
mapping = ControlsWorkflowsMappingSchema(
    control_id=control.control_id, workflow_id=workflow.workflow_id
)
session.add(mapping)

Maybe let's combine the test_create_control_workflow_mapping and test_cascade_delete_workflow tests; this is code duplication and there's no need to test it twice, imo.


As an alternative, we could factor this out in a helper function for setting up the scenario.
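A rough sketch of that helper pattern, using stdlib sqlite3 so the example is self-contained (the table layout and the helper name are hypothetical stand-ins for the project's actual schemas and fixtures):

```python
import sqlite3

def make_conn():
    # In the real tests this would be the shared Postgres session fixture;
    # sqlite is used here only so the sketch runs standalone.
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE controls (control_id INTEGER PRIMARY KEY, user_id TEXT);
    CREATE TABLE workflows (workflow_id TEXT PRIMARY KEY, user_id TEXT);
    CREATE TABLE controls_workflows (
        control_id INTEGER REFERENCES controls ON DELETE CASCADE,
        workflow_id TEXT REFERENCES workflows ON DELETE CASCADE
    );
    """)
    return conn

def setup_control_workflow_mapping(conn, control_id=1, workflow_id="workflow123"):
    """Shared scenario setup: one control, one workflow, and their mapping."""
    conn.execute("INSERT INTO controls VALUES (?, 'user123')", (control_id,))
    conn.execute("INSERT INTO workflows VALUES (?, 'user123')", (workflow_id,))
    conn.execute("INSERT INTO controls_workflows VALUES (?, ?)", (control_id, workflow_id))
    return control_id, workflow_id

# "test_create_control_workflow_mapping" shrinks to a single call ...
conn = make_conn()
setup_control_workflow_mapping(conn)
mappings = conn.execute("SELECT COUNT(*) FROM controls_workflows").fetchone()[0]

# ... and "test_cascade_delete_workflow" reuses the same helper.
conn2 = make_conn()
_, wf_id = setup_control_workflow_mapping(conn2)
conn2.execute("DELETE FROM workflows WHERE workflow_id = ?", (wf_id,))
after_delete = conn2.execute("SELECT COUNT(*) FROM controls_workflows").fetchone()[0]
print(mappings, after_delete)  # → 1 0
```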


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

🧹 Outside diff range and nitpick comments (12)
admyral/models/control.py (2)

1-2: Consider adding validation imports for robust data handling.

Since this file defines data models that will interact with a database, consider importing additional Pydantic validation tools:

-from pydantic import BaseModel
+from pydantic import BaseModel, Field, validator, constr

This would allow you to:

  • Add length constraints on strings
  • Validate control_id as positive
  • Add custom validation logic if needed

1-12: Consider architectural improvements for better maintainability.

Given that this is part of a larger database extension:

  1. Consider creating a base class for common validation logic:
class AdmyralBaseModel(BaseModel):
    """Base class for all Admyral models with common validation logic."""
    class Config:
        validate_assignment = True
        extra = "forbid"
  2. Document the relationship between Pydantic models and SQLModel schemas:
"""
This module defines Pydantic models for data validation.
These models are paired with SQLModel schemas in admyral/db/schemas/control_schemas.py
for database operations.

Note: These models should be kept in sync with their corresponding SQLModel schemas.
"""

Would you like me to create a GitHub issue to track these architectural improvements?

tests/conftest.py (1)

Line range hint 9-22: Well-structured fixture with appropriate scope and cleanup.

The session-scoped fixture with proper event loop handling and sequential cleanup is a good design. Consider adding error handling around the cleanup operations to ensure they execute even if tests fail.

Consider wrapping the cleanup in a try-finally block:

@pytest.fixture(scope="session", autouse=True)
async def store(event_loop):
    store = await AdmyralStore.create_store(database_url=CONFIG.test_database_url)
-    yield store
-    await store.clean_up_workflow_data_of(TEST_USER_ID)
-    await store.clean_up_controls_data()
+    try:
+        yield store
+    finally:
+        await store.clean_up_workflow_data_of(TEST_USER_ID)
+        await store.clean_up_controls_data()
admyral/db/schemas/workflow_control_results_schemas.py (1)

35-35: Document the relationship purpose

Consider adding a docstring to explain the relationship's purpose and the one-to-many nature between workflows and their control results.

-    workflow: "WorkflowSchema" = Relationship(back_populates="control_results")
+    # Parent workflow that owns these control results (one-to-many).
+    workflow: "WorkflowSchema" = Relationship(back_populates="control_results")
docker-compose.dev.yml (2)

30-35: Consider optimizing health check for faster test feedback.

The current health check configuration might be too conservative for a test database. Consider reducing the intervals for faster feedback during testing.

     healthcheck:
       test: ["CMD-SHELL", "pg_isready", "-d", "admyral"]
-      interval: 30s
-      timeout: 60s
-      retries: 5
-      start_period: 80s
+      interval: 10s
+      timeout: 20s
+      retries: 3
+      start_period: 30s

23-29: Add resource constraints for test database.

To prevent the test database from consuming excessive resources during CI/CD pipelines or local development, consider adding memory and CPU limits.

   postgresql-unit-tests:
     container_name: postgresql-unit-tests
+    deploy:
+      resources:
+        limits:
+          memory: 512M
+          cpus: '0.5'
     environment:
admyral/db/alembic/versions/8fbcc7f42a0b_add_controls_and_controls_workflows_.py (1)

1-85: Consider additional architectural improvements

While the basic schema is sound, consider these architectural suggestions:

  1. Add appropriate CHECK constraints for data validation (e.g., non-empty control_name)
  2. Consider adding a status column to track control lifecycle
  3. Consider adding metadata columns for audit purposes (e.g., last_modified_by)

These suggestions would make the schema more robust and future-proof. Would you like me to provide specific implementation details for any of these improvements?

admyral/db/alembic/database_manager.py (2)

Line range hint 13-13: Address critical TODOs

Several TODO comments need attention, particularly those related to:

  1. Filtering of alembic_version table (security implications?)
  2. Database drop functionality (important for testing)
  3. Database migration safety (backup requirements)

Would you like me to help create GitHub issues to track these TODOs and propose implementations for them?

Also applies to: 73-75, 89-93


Line range hint 26-126: Consider splitting database management and migration concerns

The DatabaseManager class currently handles both database administration (creation, existence checks) and schema migrations. Consider splitting these into separate classes following the Single Responsibility Principle:

  • DatabaseAdministrator: For database creation/deletion
  • MigrationManager: For handling schema migrations

This separation would:

  1. Improve testability
  2. Make the code more maintainable
  3. Allow for better error handling specific to each concern
  4. Make it easier to modify one aspect without affecting the other

Would you like me to propose a detailed design for this separation?
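A minimal skeleton of how that split might look; class and method names here are assumptions for illustration, not the project's code:

```python
class DatabaseAdministrator:
    """Owns database-level operations: existence checks, creation, deletion."""

    def __init__(self, database_url: str) -> None:
        self.database_url = database_url

    def database_exists(self) -> bool:
        # Would query the server catalog (e.g. pg_database) in a real implementation.
        raise NotImplementedError

    def create_database(self) -> None:
        raise NotImplementedError


class MigrationManager:
    """Owns schema migrations, e.g. driving alembic upgrade/downgrade."""

    def __init__(self, database_url: str) -> None:
        self.database_url = database_url

    def upgrade_to_head(self) -> None:
        # Would invoke alembic's command API in a real implementation.
        raise NotImplementedError
```

Each class can then be tested and error-handled independently, as the review suggests.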

tests/db/test_controls_workflows_mapping.py (1)

17-19: Consider improving the control ID generation strategy.

The current implementation uses a fixed range (1-999999) which could potentially conflict with real control IDs in production. Consider:

  1. Using a dedicated range for test control IDs
  2. Adding validation for negative numbers
 def generate_random_control_id() -> int:
-    return random.randint(1, 999999)
+    TEST_CONTROL_ID_MIN = 1_000_000  # Dedicated range for test IDs
+    TEST_CONTROL_ID_MAX = 1_999_999
+    control_id = random.randint(TEST_CONTROL_ID_MIN, TEST_CONTROL_ID_MAX)
+    if control_id <= 0:
+        raise ValueError("Control ID must be positive")
+    return control_id
admyral/config/config.py (1)

Line range hint 118-154: Consider environment-specific configuration management

The current approach of maintaining separate production and test database URLs might benefit from a more structured environment-based configuration system. This would be particularly valuable as the project grows and potentially needs to support more environments (staging, development, etc.).

Consider implementing an environment-based configuration system:

  1. Define an Environment enum (PRODUCTION, TEST, STAGING, etc.)
  2. Create environment-specific configuration classes
  3. Use factory pattern to load appropriate configuration based on ADMYRAL_ENV

This would make it easier to:

  • Manage environment-specific settings
  • Validate configurations per environment
  • Add new environments in the future
  • Test configuration loading logic
admyral/db/schemas/control_schemas.py (1)

64-64: Remove redundant UniqueConstraint on primary key columns.

The UniqueConstraint on ("user_id", "control_id") is redundant because these fields are already defined as primary keys in ControlSchema. Primary key columns are inherently unique, so this constraint is unnecessary.

Apply this diff to remove the redundant unique constraint:

 __table_args__ = (
-    UniqueConstraint("user_id", "control_id", name="unique_control_id"),
     ForeignKeyConstraint(
         ["user_id"],
         ["User.id"],
         ondelete="CASCADE",
     ),
 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between de332e2 and 8c52142.

📒 Files selected for processing (14)
  • admyral/config/config.py (2 hunks)
  • admyral/db/admyral_store.py (5 hunks)
  • admyral/db/alembic/database_manager.py (2 hunks)
  • admyral/db/alembic/versions/8fbcc7f42a0b_add_controls_and_controls_workflows_.py (1 hunks)
  • admyral/db/schemas/__init__.py (2 hunks)
  • admyral/db/schemas/auth_schemas.py (2 hunks)
  • admyral/db/schemas/control_schemas.py (1 hunks)
  • admyral/db/schemas/workflow_control_results_schemas.py (1 hunks)
  • admyral/db/schemas/workflow_schemas.py (2 hunks)
  • admyral/models/__init__.py (2 hunks)
  • admyral/models/control.py (1 hunks)
  • docker-compose.dev.yml (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/db/test_controls_workflows_mapping.py (1 hunks)
🔇 Additional comments (24)
admyral/models/control.py (1)

10-12: 🛠️ Refactor suggestion

Add validation and documentation for the mapping model.

The mapping model needs:

  1. Documentation explaining its purpose
  2. Validation for workflow_id format
  3. Potential unique constraint consideration

Consider this implementation:

+"""
+Maps the relationship between security controls and workflows.
+
+This mapping table establishes many-to-many relationships between
+controls and workflows, allowing a single control to be associated
+with multiple workflows and vice versa.
+
+Attributes:
+    control_id: Reference to the Control entity
+    workflow_id: Reference to the Workflow entity
+"""
 class ControlsWorkflowsMapping(BaseModel):
-    control_id: int
-    workflow_id: str
+    control_id: int = Field(gt=0, description="Reference to the Control entity")
+    workflow_id: str = Field(
+        regex="^[a-zA-Z0-9_-]+$",
+        min_length=1,
+        max_length=255,
+        description="Reference to the Workflow entity"
+    )
+
+    class Config:
+        schema_extra = {
+            "example": {
+                "control_id": 1,
+                "workflow_id": "workflow_123"
+            }
+        }

Let's verify the workflow_id format in existing code:

tests/conftest.py (3)

3-3: LGTM: Import statement is correctly placed and necessary.

The import of CONFIG is required for accessing the test database configuration.


21-21: Verify controls cleanup implementation.

The addition of clean_up_controls_data() is good for test isolation, but let's verify its implementation.

✅ Verification successful

Based on the results, I can generate the final response as we have sufficient information about the implementation:

Controls cleanup implementation is properly handled

The cleanup implementation in admyral/db/admyral_store.py follows good practices:

  • Uses transaction handling via async context manager
  • Cleans up all related tables in a logical order
  • Commits the transaction after all deletions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the controls cleanup implementation

# Check the implementation of clean_up_controls_data
ast-grep --pattern 'clean_up_controls_data($$$) {
  $$$
}'

# Look for transaction handling in the cleanup
rg -A 10 "clean_up_controls_data" 

# Check for related schema definitions
ast-grep --pattern 'class $_(Schema|Table)($$$) {
  $$$
}'

Length of output: 1688


18-18: Verify test database configuration setup.

The store creation now depends on CONFIG.test_database_url. Let's ensure this configuration is properly set up.

✅ Verification successful

Test database configuration is properly set up

The verification shows that the test database configuration is correctly implemented:

  • The ADMYRAL_TEST_DATABASE_URL environment variable is properly defined in admyral/config/config.py
  • A dedicated PostgreSQL test database service (postgresql-unit-tests) is configured in Docker, running on port 5433
  • The configuration follows the project's transition to PostgreSQL support
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the test database configuration setup

# Check if CONFIG.test_database_url is defined in the config file
rg -A 5 "test_database_url" 

# Check for any existing test database configuration in environment files
fd -e .env -e .env.example -e .env.test --exec cat {} \; 

# Look for Docker test database service configuration
fd -g "*docker*" --exec grep -l "test.*database" {} \;

Length of output: 956


Script:

#!/bin/bash
# Let's check for environment variable definitions and their usage

# Check for environment variable definitions
rg "ADMYRAL_TEST_DATABASE_URL"

# Check for any test database configuration in Docker files
fd -e yml -e yaml -g "*docker*" --exec cat {} \;

# Look for any test database setup in test files
rg -g "test_*.py" "database.*setup|database.*config"

Length of output: 10391

admyral/db/schemas/__init__.py (2)

22-25: LGTM! Clean and well-structured import statement.

The new imports follow the established pattern and are logically grouped with other schema imports.


43-44: LGTM! Proper exposure of new schemas.

The new schemas are correctly added to the __all__ list, maintaining alphabetical order and following the module's public interface pattern.

Let's verify the schema definitions exist:

✅ Verification successful

Schema definitions are properly implemented and imported

The verification confirms that:

  • Both ControlSchema and ControlsWorkflowsMappingSchema are properly defined in admyral/db/schemas/control_schemas.py
  • The schemas are correctly imported by other modules (auth_schemas.py and workflow_schemas.py)
  • The schemas follow the project's pattern by inheriting from BaseSchema with table=True
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the control schema definitions exist and are properly defined

# Check for the existence and content of control_schemas.py
ast-grep --pattern 'class ControlSchema:
  $$$
'

ast-grep --pattern 'class ControlsWorkflowsMappingSchema:
  $$$
'

# Verify no circular imports exist
rg -l "from admyral.db.schemas import.*ControlSchema" | grep -v "__init__.py"

Length of output: 196


Script:

#!/bin/bash
# Let's try a different approach to locate these schema definitions

# Search for files containing these schema classes
fd -e py . | xargs rg "class (ControlSchema|ControlsWorkflowsMappingSchema)"

# Look for any imports of these schemas in the codebase
rg "from.*import.*(ControlSchema|ControlsWorkflowsMappingSchema)"

# Check content of potential schema files
fd "control.*schema.*\.py" -I -i

Length of output: 774

docker-compose.dev.yml (1)

23-43: LGTM! Well-structured test database configuration.

The new PostgreSQL service for unit tests is well-configured with:

  • Proper isolation from production (separate ports, volumes)
  • Consistent environment variable handling
  • Appropriate health checks
admyral/db/alembic/versions/8fbcc7f42a0b_add_controls_and_controls_workflows_.py (1)

80-85: Verify referenced table dependencies

The migration references the User and workflows tables. Let's verify these dependencies exist.

admyral/models/__init__.py (2)

66-66: LGTM! Import statement follows project conventions.

The new import for control-related models is properly placed and follows the established import pattern.


132-133: Verify readiness for public API exposure.

The new control-related models are being added to the public API. Since this is marked as WIP, please ensure that:

  1. The model interfaces are stable and well-documented
  2. All necessary validation rules are in place
  3. The models properly handle all edge cases

Let's verify the model definitions:

✅ Verification successful

Models are well-defined and ready for public API exposure

The Control-related models are properly implemented with:

  • Clear and simple interfaces in admyral/models/control.py
  • Comprehensive database schemas with proper constraints and relationships
  • Extensive test coverage validating core functionality
  • Proper cascade delete behavior and relationship handling
  • Working integration with the workflow system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the completeness of Control models
# Expected: Find model definitions with proper documentation and validation

# Check for model class definitions and their docstrings
ast-grep --pattern 'class Control:
    $$$
' 

# Look for validation rules
rg -A 5 'validator|Field|constraint' admyral/models/control.py

Length of output: 111


Script:

#!/bin/bash
# Let's try a different approach to find the model definitions

# First, let's find the relevant model files
fd "control.*\.py$"

# Then let's examine their content
fd "control.*\.py$" -x cat {}

# Also check for any test files that might give us insights into the model usage
fd "test.*control.*\.py$" -x cat {}

Length of output: 21993

admyral/db/schemas/workflow_schemas.py (3)

17-17: LGTM! Clean import organization

The new imports are well-organized and follow good practices, particularly the use of TYPE_CHECKING for avoiding circular imports.

Also applies to: 25-25


71-78: Verify impact on existing code

The relationship changes (replacing direct controls with control_workflow_mappings, control_results, and controls) might impact existing code that relies on these relationships.

✅ Verification successful

No impact on existing code from relationship changes

The verification shows that the existing usage of workflow.controls in graph_conversion.py refers to workflow_dag.controls, which is a different attribute from the SQLModel relationship being modified. The relationship changes in workflow_schemas.py are safe as there are no direct usages of these SQLModel relationships in queries or joins across the codebase.

  • The only match found in admyral/editor/graph_conversion.py is accessing workflow.workflow_dag.controls, not the SQLModel relationship
  • No SQL joins or queries were found that rely on these relationships
  • The new relationship structure properly maintains the connections through ControlsWorkflowsMappingSchema
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing usage of workflow.controls
rg -l "workflow.*\.controls" --type py

# Search for potential ORM queries using these relationships
rg -l "\.join.*controls.*workflow" --type py

Length of output: 122


Script:

#!/bin/bash
# Let's check the actual usage context in graph_conversion.py
rg -A 5 -B 5 "workflow.*\.controls" admyral/editor/graph_conversion.py

# Let's also check for any other potential relationships between workflows and controls
ast-grep --pattern 'workflow.$_: list[$_] = Relationship($$$)'

# Check for any direct SQL or ORM queries involving workflows and controls
rg -A 5 "from.*workflow.*import|select.*workflow.*control|where.*workflow.*control" --type py

Length of output: 16209


77-78: Verify the corresponding relationship in ControlSchema

The relationship looks correct, but we should verify that ControlSchema has a matching workflows relationship with the same link_model.

✅ Verification successful

The relationship is correctly defined in both schemas

The workflows relationship in ControlSchema is properly set up with back_populates="controls" and link_model=ControlsWorkflowsMappingSchema, which matches exactly with the corresponding relationship in WorkflowSchema. The many-to-many relationship between workflows and controls is correctly implemented on both sides.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the corresponding relationship in ControlSchema
ast-grep --pattern 'class ControlSchema {
  $$$
  workflows: list["WorkflowSchema"] = Relationship(
    back_populates="controls",
    link_model=ControlsWorkflowsMappingSchema
  )
  $$$
}'

Length of output: 191


Script:

#!/bin/bash
# Let's first find the ControlSchema definition
rg "class ControlSchema" -A 10

Length of output: 746


Script:

#!/bin/bash
# Let's check the entire ControlSchema file for any relationship definitions
cat admyral/db/schemas/control_schemas.py

Length of output: 2918

admyral/db/schemas/auth_schemas.py (2)

16-16: LGTM: Import statement follows project conventions

The import statement is properly placed with other schema imports and follows the project's import style.


57-59: Relationship configuration looks good, verify migration strategy

The relationship configuration follows the established pattern and correctly implements cascading delete for user-owned controls. However, since this is a WIP PR adding a new table relationship, please ensure:

  1. A migration plan exists for handling existing users
  2. The corresponding Control table is properly set up with the reciprocal relationship

Let's verify the migration and control table setup:

tests/db/test_controls_workflows_mapping.py (3)

72-72: Verify database and session configuration.

Based on previous review comments:

  1. Ensure tests are using PostgreSQL instead of SQLite
  2. Consider using the shared session fixture from tests/workers/conftest.py

Let's verify the current configuration:

✅ Verification successful

Based on the shell script results, I can now generate the final response:

Database and session configuration is properly set up

The test is correctly configured:

  • Using PostgreSQL via CONFIG.test_database_url in the session-scoped store fixture
  • Properly utilizing AdmyralStore and its async session maker
  • Following the project's database testing patterns with proper cleanup
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check database configuration
rg -A 5 "async_session_maker|AdmyralStore" --type py

# Check session fixture usage
rg -A 5 "pytest.fixture.*session" --type py

Length of output: 19508


94-115: 🛠️ Refactor suggestion

Refactor test cases to reduce duplication and improve coverage.

Several improvements are needed:

  1. The setup code is duplicated between tests

  2. Cleanup should be consistent across all tests

  3. Manual cascade delete could be replaced with database CASCADE

  4. Missing negative test cases

  5. Add a fixture for cleanup:

@pytest.fixture(autouse=True)
async def cleanup(store: AdmyralStore):
    yield
    async with store.async_session_maker() as session:
        await store.clean_up_workflow_data_of(TEST_USER_ID)
        await store.clean_up_controls_data()
  1. Add negative test cases:
@pytest.mark.asyncio
async def test_create_control_workflow_mapping_invalid_ids(store: AdmyralStore):
    async with store.async_session_maker() as session:
        with pytest.raises(IntegrityError):
            mapping = ControlsWorkflowsMappingSchema(
                control_id=-1,  # Invalid ID
                workflow_id="invalid",
                user_id=TEST_USER_ID,
            )
            session.add(mapping)
            await session.commit()
  1. Use database CASCADE instead of manual deletion:

Let's verify if CASCADE is configured:

Also applies to: 117-176
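The cascade suggestion above could be declared at the schema level instead of in test code. As an illustration only (the table, column, and constraint names below are assumptions, not the project's actual DDL):

```sql
-- Hypothetical DDL sketch: with ON DELETE CASCADE on the mapping table's
-- foreign key, deleting a workflow removes its mapping rows automatically.
ALTER TABLE controls_workflows
    ADD CONSTRAINT fk_mapping_workflow
    FOREIGN KEY (workflow_id) REFERENCES workflows (workflow_id)
    ON DELETE CASCADE;
```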


38-50: Add validation for workflow DAG structure.

The workflow DAG is initialized as an empty dictionary without any schema validation. This could lead to invalid DAG structures in production.

Let's verify the DAG schema requirements:
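As a sketch of what such validation could look like (the adjacency-map layout used here is an assumption for illustration, not Admyral's actual DAG format), a plain dictionary check can reject unknown node references and cycles before the workflow is persisted:

```python
def validate_dag(dag: dict) -> None:
    """Validate an adjacency-map DAG (node -> list of child nodes).

    Raises ValueError if an edge points at an unknown node or if the
    graph contains a cycle.
    """
    for node, children in dag.items():
        for child in children:
            if child not in dag:
                raise ValueError(f"unknown node referenced: {child!r}")

    # Iterative three-color DFS cycle detection.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in dag}
    for start in dag:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(dag[start]))]
        color[start] = GRAY
        while stack:
            node, children = stack[-1]
            advanced = False
            for child in children:
                if color[child] == GRAY:
                    raise ValueError(f"cycle detected at {child!r}")
                if color[child] == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(dag[child])))
                    advanced = True
                    break
            if not advanced:
                color[node] = BLACK
                stack.pop()


validate_dag({})                     # an empty DAG is valid
validate_dag({"a": ["b"], "b": []})  # an acyclic chain is valid
```

An empty dictionary still passes here, so the test setup remains valid; the point is that malformed structures fail loudly instead of reaching the database.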

admyral/config/config.py (2)

118-118: LGTM!

The new environment variable constant follows the established naming convention and is appropriately placed with other database-related constants.


154-154: LGTM!

The new test_database_url attribute is well-integrated into the GlobalConfig class, maintaining consistency with other database-related configurations.

admyral/db/admyral_store.py (3)

42-43: LGTM: New schema imports follow conventions

The new schema imports for controls are properly organized with other schema imports and follow the established naming conventions.


95-98: LGTM: Backward compatible method signature update

The create_store method changes maintain backward compatibility by making database_url optional and defaulting to CONFIG.database_url. This is a good approach for supporting both new and existing code.


79-83: Verify the impact of constructor signature change

The constructor signature has changed from accepting GlobalConfig to database_url, which is a breaking change. This could affect existing code that instantiates AdmyralStore.

✅ Verification successful

Let's examine the test files to understand the usage patterns and verify the impact.


Let's try a different approach to find the usage patterns.


Let's check one more thing to see how the store is being created.


Based on the evidence from the test files, I can now provide a final response:

Constructor change is properly handled across the codebase

The change from GlobalConfig to direct database_url parameter is safe because:

  • All instantiations in tests use the create_store factory method with database_url parameter
  • The main instantiation pattern seen in conftest.py uses AdmyralStore.create_store(database_url=CONFIG.test_database_url)
  • No direct instantiations using the old GlobalConfig parameter were found in the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all direct instantiations of AdmyralStore to verify impact
# Test: Search for AdmyralStore instantiations. Expect: Instances using the new signature.

# Search for direct instantiations
rg -p "AdmyralStore\(" --type py

# Search for potential test files that might need updates
fd -e py -x grep -l "AdmyralStore" {} \; | grep "test"

Length of output: 333


Script:

#!/bin/bash
# Check test files for AdmyralStore instantiation patterns
rg -p "AdmyralStore\(" --type py -A 3 ./tests/

Length of output: 48


Script:

#!/bin/bash
# Search for AdmyralStore usage in test files with more context
for file in tests/workers/utils.py tests/workers/test_workflow_executor.py tests/conftest.py tests/compiler/test_workflow_compiler.py tests/db/test_controls_workflows_mapping.py; do
    echo "=== $file ==="
    cat "$file" | grep -A 5 -B 5 "AdmyralStore"
done

# Also check if there are any imports of GlobalConfig in the main file
rg "GlobalConfig" admyral/db/admyral_store.py

Length of output: 11675


Script:

#!/bin/bash
# Check the create_store method implementation
ast-grep --pattern 'async def create_store' -A 10 admyral/db/admyral_store.py

Length of output: 79

admyral/db/schemas/control_schemas.py (1)

67-67: Verify the correctness of the foreign key reference to User.id.

Ensure that the foreign key constraint references the correct table and column. If the actual table name is users or differs from User, this could lead to database errors.

Please confirm that "User.id" correctly references the primary key of the User table.
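For illustration only (the physical table and constraint names below are assumptions): the constraint ultimately emitted must reference the actual table name, not the Python class:

```sql
-- Hypothetical: if the User model maps to table "user", the generated
-- foreign key must reference that table name, e.g.
ALTER TABLE controls
    ADD CONSTRAINT fk_controls_user
    FOREIGN KEY (user_id) REFERENCES "user" (id);
```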

admyral/models/control.py (resolved)
admyral/db/schemas/workflow_schemas.py (resolved)
tests/db/test_controls_workflows_mapping.py (resolved)
admyral/config/config.py (resolved)
admyral/db/admyral_store.py (outdated, resolved)
admyral/db/schemas/control_schemas.py (resolved)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (8)
admyral/db/alembic/versions/1239762f7ef5_add_controls_and_controls_workflows_.py (1)

39-42: Consider adding constraints and optimizing column types.

A few suggestions to improve the table design:

  1. Consider making control_id auto-incrementing (SERIAL)
  2. Consider adding length limits to TEXT columns to prevent excessive storage usage
-        sa.Column("control_id", sa.Integer(), nullable=False),
-        sa.Column("user_id", sa.TEXT(), nullable=False),
-        sa.Column("control_name", sa.TEXT(), nullable=False),
-        sa.Column("control_description", sa.TEXT(), nullable=False),
+        sa.Column("control_id", sa.Integer(), sa.Sequence('control_id_seq'), nullable=False),
+        sa.Column("user_id", sa.String(36), nullable=False),
+        sa.Column("control_name", sa.String(255), nullable=False),
+        sa.Column("control_description", sa.String(1000), nullable=False),
admyral/db/admyral_store.py (2)

80-84: Consider making SQL statement logging configurable

The SQLAlchemy engine is created with echo=True, which logs all SQL statements. This might generate excessive logs in production.

Consider making this configurable:

-    def __init__(self, database_url: str) -> None:
+    def __init__(self, database_url: str, debug_sql: bool = False) -> None:
         self.database_url = database_url
         self.engine = create_async_engine(
-            database_url, echo=True, future=True, pool_pre_ping=True
+            database_url, echo=debug_sql, future=True, pool_pre_ping=True
         )

1085-1102: Optimize transaction management for multiple delete operations

The method performs multiple delete operations with separate commits. This can be optimized by using a single transaction.

Consider this optimization:

     async def clean_up_controls_data(self, user_id: str) -> None:
         """Delete all data from controls-related tables. Should only be used in testing environments."""
         async with self._get_async_session() as db:
-            # Delete workflow control results where the workflow belongs to the user
-            await db.exec(
-                delete(WorkflowControlResultsSchema).where(
-                    WorkflowControlResultsSchema.workflow_id.in_(
-                        sa_select(WorkflowSchema.workflow_id).where(
-                            WorkflowSchema.user_id == user_id
-                        )
-                    )
-                )
-            )
-            await db.exec(
-                delete(ControlsWorkflowsMappingSchema).where(
-                    ControlsWorkflowsMappingSchema.user_id == user_id
-                )
-            )
-            await db.exec(delete(ControlSchema).where(ControlSchema.user_id == user_id))
-            await db.commit()
+            try:
+                # Delete workflow control results where the workflow belongs to the user
+                await db.exec(
+                    delete(WorkflowControlResultsSchema).where(
+                        WorkflowControlResultsSchema.workflow_id.in_(
+                            sa_select(WorkflowSchema.workflow_id).where(
+                                WorkflowSchema.user_id == user_id
+                            )
+                        )
+                    )
+                )
+                await db.exec(
+                    delete(ControlsWorkflowsMappingSchema).where(
+                        ControlsWorkflowsMappingSchema.user_id == user_id
+                    )
+                )
+                await db.exec(delete(ControlSchema).where(ControlSchema.user_id == user_id))
+                await db.commit()
+            except Exception:
+                await db.rollback()
+                raise
tests/db/test_controls_workflows_mapping.py (5)

17-23: Add type annotations to helper functions for clarity

Consider adding type annotations to the generate_random_control_id and generate_random_workflow_id functions to improve code readability and facilitate static type checking.

Apply this diff:

 def generate_random_control_id() -> int:
     return random.randint(1, 999999)


 def generate_random_workflow_id() -> str:
     return f"workflow-{uuid.uuid4()}"

25-35: Add type annotations to create_control function

Adding type annotations to the create_control function parameters enhances code readability and helps with static analysis tools.

Apply this diff:

+from sqlmodel.ext.asyncio.session import AsyncSession

 async def create_control(session: AsyncSession, control_id: int = 1):
     control = ControlSchema(
         control_name="Test Control",
         control_description="A test control",

Ensure that AsyncSession is imported appropriately.


38-49: Add type annotations to create_workflow function

Similarly, add type annotations to the create_workflow function for clarity.

Apply this diff:

+from sqlmodel.ext.asyncio.session import AsyncSession

 async def create_workflow(session: AsyncSession, workflow_id: str = "workflow123"):
     workflow = WorkflowSchema(
         workflow_id=workflow_id,
         workflow_name=f"Test Workflow-{workflow_id}",

Ensure that AsyncSession is imported appropriately.


118-176: Leverage cascading deletes to simplify test_cascade_delete_workflow

Currently, test_cascade_delete_workflow manually deletes related entries before deleting the workflow. If the database schema is configured with proper cascading deletes, you can simplify the test by deleting the workflow directly and asserting that related entries are automatically removed. This ensures that the cascading behavior is correctly implemented and tested.

For example, you can modify the test as follows:

  • Delete the workflow first.
  • Commit the session.
  • Verify that related entries are automatically deleted.
  • Adjust assertions accordingly.

Apply this diff:

     # Delete the workflow
     await session.delete(workflow)
     await session.commit()

     # Verify cascading deletes
     mapping_result = await session.exec(select(ControlsWorkflowsMappingSchema))
     mappings = mapping_result.all()
     assert len(mappings) == 0

     results_result = await session.exec(select(WorkflowControlResultsSchema))
     results = results_result.all()
     assert len(results) == 0

Ensure that the database schema is configured with ON DELETE CASCADE for the foreign key relationships.


143-160: Refactor deletion logic into a helper function

The code blocks for deleting entries are similar. Consider refactoring the deletion of entries into a helper function to reduce code duplication and improve maintainability.

For example:

async def delete_entries(session, model, **filters):
    entries_to_delete = await session.exec(select(model).filter_by(**filters))
    for entry in entries_to_delete:
        await session.delete(entry)

Then, you can use this helper function:

await delete_entries(session, WorkflowControlResultsSchema, workflow_id=workflow.workflow_id)
await delete_entries(session, ControlsWorkflowsMappingSchema, workflow_id=workflow.workflow_id)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c52142 and 9a43b57.

📒 Files selected for processing (5)
  • admyral/db/admyral_store.py (6 hunks)
  • admyral/db/alembic/versions/1239762f7ef5_add_controls_and_controls_workflows_.py (1 hunks)
  • admyral/db/schemas/control_schemas.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/db/test_controls_workflows_mapping.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • admyral/db/schemas/control_schemas.py
  • tests/conftest.py
🔇 Additional comments (10)
admyral/db/alembic/versions/1239762f7ef5_add_controls_and_controls_workflows_.py (3)

1-20: LGTM! Migration metadata is well-structured.

The migration metadata follows best practices with proper typing hints and required imports.


80-85: LGTM! Downgrade function is properly implemented.

The tables and indexes are dropped in the correct order, and cascade deletes will handle cleanup automatically.


1-85: Verify PostgreSQL compatibility.

Since the project now exclusively supports PostgreSQL, let's verify the compatibility of all database features used in this migration.

✅ Verification successful

PostgreSQL compatibility confirmed for migration.

The migration uses standard SQLAlchemy features that are fully compatible with PostgreSQL:

  • TIMESTAMP with timezone support (standard PostgreSQL feature)
  • Standard foreign key constraints with CASCADE delete
  • Basic table and index creation operations
  • Server-default timestamps using now()

All these features are well-supported in PostgreSQL and align with the existing database schema, as evidenced by the previous migration that explicitly uses PostgreSQL dialect for timestamp handling.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any PostgreSQL version requirements or compatibility issues in project docs
rg -i "postgresql.*version|postgres.*version" --type md

# Check for any existing PostgreSQL-specific features in other migrations
fd -e py -d 1 . admyral/db/alembic/versions/ --exec rg -l "postgresql|postgres"

Length of output: 218


Script:

#!/bin/bash
# Check the content of the previous migration that uses PostgreSQL features
cat admyral/db/alembic/versions/7985f1c159a3_switch_to_timezone_aware_datetime_.py

# Let's also check if there are any database configuration or setup files
fd -e py -e sql . admyral/db/ --exec rg -l "postgresql|postgres"

# Check for any SQLAlchemy configuration that might indicate PostgreSQL requirements
rg "create_engine|Engine" admyral/db/

Length of output: 16886

admyral/db/admyral_store.py (3)

96-99: LGTM!

The changes to create_store maintain backward compatibility while adding flexibility to specify a custom database URL.


113-113: LGTM!

The DatabaseManager initialization is correctly updated to use both the engine and database URL.


1083-1084: Add environment check for test-only method

The docstring indicates this method should only be used in testing environments, but there's no programmatic enforcement of this restriction.
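One possible guard, sketched with an illustrative environment variable (`ADMYRAL_ENV` is an assumption here, not the project's actual config): check an explicit opt-in flag before running destructive test helpers.

```python
import os


def assert_test_environment() -> None:
    """Refuse to run unless an explicit test opt-in flag is set.

    ADMYRAL_ENV is a hypothetical variable used for illustration; the real
    codebase would use whatever environment marker it already defines.
    """
    if os.getenv("ADMYRAL_ENV") != "test":
        raise RuntimeError(
            "clean_up_controls_data may only be called in a test environment "
            "(set ADMYRAL_ENV=test to opt in)"
        )


os.environ["ADMYRAL_ENV"] = "test"
assert_test_environment()  # passes once the flag is set
```

Calling such a guard at the top of each cleanup method turns the docstring's convention into a runtime check.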

tests/db/test_controls_workflows_mapping.py (4)

52-69: Consider adding error handling and docstrings to setup_scenario function

Adding error handling and a docstring to setup_scenario will enhance its robustness and readability.


71-82: Test for create_control is properly implemented

The test function test_create_control accurately tests the creation of a control, and all assertions are appropriate.


83-93: Test for create_workflow is properly implemented

The test function test_create_workflow correctly tests the creation of a workflow, with appropriate assertions.


95-115: Test for control-workflow mapping is comprehensive

The test_create_control_workflow_mapping function effectively tests the mapping between controls and workflows, including verification of relationships.


ADMYRAL_TEST_DATABASE_URL = os.getenv(
ENV_ADMYRAL_TEST_DATABASE_URL,
"postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5433/admyral",
Member

The default should point to the test Postgres instance, not the actual Postgres database. Can you also adapt .env.example and set an env var for the test database as well, so that we're consistent with how we set the normal database URL.

POSTGRES_DB: admyral
image: postgres:${POSTGRES_VERSION:-16.4-bookworm}
healthcheck:
test: ["CMD-SHELL", "pg_isready", "-d", "admyral"]
Member

Minor: Let's stay consistent and use temporal instead of admyral.

admyral/db/schemas/control_schemas.py (resolved)
{
"control_id": self.control_id,
"workflow_id": self.workflow_id,
"user_id": self.user_id,
Member

we usually don't return the user_id because it is unnecessary metadata.

Member

Also, this will fail since ControlsWorkflowsMapping does not have a user_id member.

@@ -8,6 +8,7 @@
from contextlib import asynccontextmanager
from uuid import uuid4
import json
from sqlalchemy import select as sa_select
Member

You can just use select from sqlmodel. sqlmodel is a wrapper around sqlalchemy.

return f"workflow-{uuid.uuid4()}"


async def create_control(session, control_id=1):
Member

The default is not needed because we always generate a random id. Same holds for create_workflow.

Comment on lines 144 to 150
results_to_delete = await session.exec(
select(WorkflowControlResultsSchema).where(
WorkflowControlResultsSchema.workflow_id == workflow.workflow_id
)
)
for result in results_to_delete.all():
await session.delete(result)
Member

You can just do

Suggested change
results_to_delete = await session.exec(
select(WorkflowControlResultsSchema).where(
WorkflowControlResultsSchema.workflow_id == workflow.workflow_id
)
)
for result in results_to_delete.all():
await session.delete(result)
await session.exec(
delete(WorkflowControlResultsSchema).where(
WorkflowControlResultsSchema.workflow_id == workflow.workflow_id
)
)

There is no need for retrieving everything first.

Comment on lines 152 to 159
# Then delete the workflow mappings
mappings_to_delete = await session.exec(
select(ControlsWorkflowsMappingSchema).where(
ControlsWorkflowsMappingSchema.workflow_id == workflow.workflow_id
)
)
for mapping in mappings_to_delete.all():
await session.delete(mapping)
Member

maybe let's delete the workflows for the test user instead

Comment on lines 166 to 174
mapping_result = await session.exec(select(ControlsWorkflowsMappingSchema))
mappings = mapping_result.all()
assert len(mappings) == 0

results_result = await session.exec(select(WorkflowControlResultsSchema))
results = results_result.all()
assert len(results) == 0

control_result = await session.exec(select(ControlSchema))
Member

I would filter for the TEST_USER_ID


control_result = await session.exec(select(ControlSchema))
controls = control_result.all()
assert len(controls) == 1
Member

Can we also test the other direction? Delete a control, then it should remove the corresponding rows from the mapping table

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
.env.example (1)

7-7: LGTM! Consider adding clarifying comments.

The addition of a separate test database URL is a good practice. However, consider adding comments to:

  • Indicate this is specifically for PostgreSQL databases
  • Explain the purpose of the test database URL
  • Document that this supports the new test database Docker service
 ADMYRAL_DATABASE_URL="postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5432/admyral"
+# Test database URL for running unit tests with the test database Docker service
+# Note: PostgreSQL is now the only supported database
 ADMYRAL_TEST_DATABASE_URL="postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5433/admyral"
admyral/db/admyral_store.py (1)

Line range hint 1072-1079: Add input parameter validation

The method should validate that the input parameters are not None or empty strings before proceeding with the database operation.

     async def store_workflow_control_result(
         self,
         workflow_id: str,
         run_id: str,
         result: str,
     ) -> None:
+        if not workflow_id or not run_id or result is None:
+            raise ValueError("workflow_id, run_id, and result must not be empty")
         async with self._get_async_session() as db:
             await db.exec(
admyral/db/alembic/versions/3967e08c0051_add_controls_and_controls_workflows_.py (1)

13-13: Unnecessary import of sqlmodel

The sqlmodel module is imported but not used in this script. Consider removing the import statement to clean up the code.

tests/db/test_controls_workflows_mapping.py (2)

20-21: Ensure consistent ID generation for workflows

The generate_random_workflow_id function adds a 'workflow-' prefix to the UUID, whereas generate_random_control_id does not add a prefix. For consistency and to adhere to standard UUID usage, consider removing the prefix from generate_random_workflow_id.

Apply this diff to update the function:

-def generate_random_workflow_id() -> str:
-    return f"workflow-{uuid.uuid4()}"
+def generate_random_workflow_id() -> str:
+    return str(uuid.uuid4())

37-41: Adjust workflow name to include a descriptive prefix

With the removal of the 'workflow-' prefix from the workflow_id, consider adjusting the workflow_name to maintain a clear identification of test workflows.

Apply this diff to update the workflow name:

-        workflow_name=f"Test Workflow-{workflow_id}",
+        workflow_name=f"Test Workflow-workflow-{workflow_id}",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9a43b57 and f79c918.

📒 Files selected for processing (8)
  • .env.example (1 hunks)
  • admyral/config/config.py (2 hunks)
  • admyral/db/admyral_store.py (5 hunks)
  • admyral/db/alembic/database_manager.py (2 hunks)
  • admyral/db/alembic/versions/3967e08c0051_add_controls_and_controls_workflows_.py (1 hunks)
  • admyral/db/schemas/control_schemas.py (1 hunks)
  • docker-compose.dev.yml (1 hunks)
  • tests/db/test_controls_workflows_mapping.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • admyral/db/schemas/control_schemas.py
🔇 Additional comments (22)
.env.example (1)

6-7: Consider using distinct database names for production and test environments.

Both database URLs use the same database name "admyral" but different ports. This could lead to confusion or potential data corruption if the wrong port is used. Consider:

  1. Using distinct database names (e.g., "admyral_test" for testing)
  2. Using different placeholder passwords to emphasize these are separate environments
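A possible layout for .env.example reflecting both points (the passwords below are placeholders, and the `admyral_test` database name is a suggestion, not an existing convention in the repo):

```
ADMYRAL_DATABASE_URL="postgresql://postgres:<main-password>@localhost:5432/admyral"
ADMYRAL_TEST_DATABASE_URL="postgresql://postgres:<test-password>@localhost:5433/admyral_test"
```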
docker-compose.dev.yml (4)

42-43: Use consistent environment variable naming

For consistency with other environment variables in the project, we should use POSTGRES_TEST_VOLUME_PATH instead of TEST_POSTGRES_VOLUME_PATH.

Apply this diff:

    volumes:
-     - ${TEST_POSTGRES_VOLUME_PATH:-.admyral/postgres-test}:/var/lib/postgresql/data
+     - ${POSTGRES_TEST_VOLUME_PATH:-.admyral/postgres-test}:/var/lib/postgresql/data

28-28: Use consistent database naming

For consistency, we should use temporal instead of admyral as the database name, following the project's naming convention.

Apply this diff:

-     POSTGRES_DB: admyral
+     POSTGRES_DB: temporal

30-31: ⚠️ Potential issue

Fix healthcheck configuration

The healthcheck is configured to check the temporal database, but this service is configured to use the admyral database. This mismatch could lead to incorrect health status reporting.

Apply this diff:

    healthcheck:
-     test: ["CMD-SHELL", "pg_isready", "-d", "temporal"]
+     test: ["CMD-SHELL", "pg_isready", "-d", "admyral"]

Likely invalid or redundant comment.


23-44: Verify database initialization scripts

The new PostgreSQL service for unit tests might require specific initialization scripts or schema setup for the controls and workflows tables mentioned in the PR description.

Let's check for any database initialization scripts:

✅ Verification successful

Let me search for any database-related files or migrations that might contain schema information.


Let me check the Prisma schema to understand the database structure and requirements.


No initialization scripts needed for unit tests database

The PostgreSQL container for unit tests doesn't require additional initialization scripts. The database schema is managed by Prisma ORM, which automatically handles schema creation and migrations during application startup. The schema.prisma file contains all necessary table definitions, and Prisma will create these tables when the application connects to the database.

  • The POSTGRES_DB: admyral environment variable in the docker-compose file correctly sets up the initial database
  • Prisma will handle the schema creation and migrations for all required tables (users, workflows, secrets, etc.)
  • The healthcheck is configured appropriately to ensure database availability
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for SQL initialization scripts that might need to be mounted
fd -e sql -e sh --type f

Length of output: 186


Script:

#!/bin/bash
# Look for migration files or database setup files
rg -g '!node_modules' -l -i "(migration|schema|database|sql|postgres)" --type-add 'prisma:*.prisma' --type prisma

Length of output: 140


Script:

#!/bin/bash
# Check the Prisma schema content
cat web/prisma/schema.prisma

Length of output: 7107

admyral/db/alembic/database_manager.py (5)

26-28: ⚠️ Potential issue

Add URL validation and sanitization

The change from GlobalConfig to direct database_url string simplifies the interface but removes potential validation. Consider adding URL validation to prevent SQL injection and ensure proper URL format.


43-44: ⚠️ Potential issue

Use urllib for robust URL parsing

The current string manipulation approach for URL parsing is fragile and could break with complex URLs containing parameters or special characters.
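A minimal sketch of the standard-library approach (the function name and returned field set are illustrative, not the manager's actual API): `urllib.parse.urlsplit` handles userinfo, ports, query parameters, and percent-encoded characters that naive string slicing would mishandle.

```python
from urllib.parse import urlsplit


def parse_database_url(url: str) -> dict:
    """Split a database URL into its components instead of slicing strings."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = parse_database_url(
    "postgresql://postgres:s3cret@localhost:5433/admyral?sslmode=disable"
)
```

This also gives the database manager the database name from the URL itself, rather than hardcoding it.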


48-58: ⚠️ Potential issue

Improve error handling and use dynamic database name

The current implementation has several issues:

  1. Hardcoded database name instead of using the one from URL
  2. Over-broad exception handling
  3. Potential resource leaks in error cases

61-64: ⚠️ Potential issue

Add error handling and SQL injection protection

The database creation lacks error handling and uses a hardcoded database name, making it vulnerable to SQL injection.
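Because PostgreSQL does not accept bind parameters for identifiers, the database name has to be validated before it is interpolated into the statement. A stdlib sketch (function name is illustrative):

```python
import re


def quote_create_database(db_name: str) -> str:
    """Build a CREATE DATABASE statement with a validated, quoted identifier.

    Identifiers cannot be passed as bind parameters in PostgreSQL, so the
    name is checked against a strict pattern before interpolation.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"invalid database name: {db_name!r}")
    # Double-quoting preserves the exact (already validated) identifier.
    return f'CREATE DATABASE "{db_name}"'


statement = quote_create_database("admyral_test")
```

The manager can then execute the returned statement and wrap the call in explicit error handling.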


61-64: ⚠️ Potential issue

Add check for race conditions in database creation

The create_database method should check if the database exists before attempting creation. In a concurrent environment, another process might create the database between the last existence check and the creation attempt.

Consider wrapping the creation in a transaction and handling potential race conditions:

 async def create_database(self) -> None:
     engine = self._get_postgres_setup_engine()
     async with engine.connect() as conn:
-        await conn.execute(text("commit"))
-        await conn.execute(text("create database admyral"))
+        await conn.execute(text("begin"))
+        try:
+            # Check again within transaction
+            result = await conn.execute(
+                text("select exists (select 1 from pg_database where datname = :name)")
+                .bindparams(name=db_name)
+            )
+            if result.scalar():
+                await conn.execute(text("rollback"))
+                raise ValueError(f"Database {db_name} already exists")
+            
+            await conn.execute(
+                text("create database :name").bindparams(name=db_name)
+            )
+            await conn.execute(text("commit"))
+        except Exception as e:
+            await conn.execute(text("rollback"))
+            raise RuntimeError(f"Failed to create database: {e}") from e

Likely invalid or redundant comment.

admyral/config/config.py (4)

118-118: LGTM!

The new environment variable constant follows the established naming convention and is appropriately placed with other database-related constants.


154-154: LGTM!

The new field follows the class's type safety practices and properly uses the configured test database URL.


124-126: ⚠️ Potential issue

Port 5433 might cause connection issues

  1. The default port 5433 is non-standard for PostgreSQL (standard is 5432). This might cause connection issues for users with default PostgreSQL installations.
  2. The URL transformation still lacks validation as mentioned in previous reviews.

Consider:

  1. Using the standard port 5432 for the main database
  2. Adding URL validation as suggested in previous reviews
  3. Documenting the port requirement if 5433 is intentional
-    "postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5433/admyral",
+    "postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5432/admyral",
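
The URL validation mentioned above could be a small helper run at config load time. A minimal sketch using only the standard library; the function name is an assumption, not existing project code:

```python
from urllib.parse import urlsplit


def validate_database_url(url: str) -> str:
    """Reject obviously malformed PostgreSQL URLs early, at config load time."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError("database URL must specify an explicit host and port")
    if not parts.path.lstrip("/"):
        raise ValueError("database URL must include a database name")
    return url
```

Failing fast here turns a confusing connection-time error into a clear configuration error.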

127-131: ⚠️ Potential issue

Test database should use a different database name

Using the same database URL for both main and test configurations is risky:

  1. Could lead to data corruption if tests modify production data
  2. Makes it harder to run tests in parallel with the main application

Consider:

  1. Using a different database name for tests (e.g., admyral_test)
  2. Updating .env.example as suggested in previous reviews
  3. Adding URL validation as previously recommended
-    "postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5433/admyral",
+    "postgresql://postgres:your-super-secret-and-long-postgres-password@localhost:5433/admyral_test",
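
One way to keep the main and test URLs from drifting apart is to derive the test URL from the main one by swapping only the database name. A sketch; the helper name is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def with_database_name(url: str, db_name: str) -> str:
    """Return the same connection URL pointing at a different database."""
    parts = urlsplit(url)
    # SplitResult is a namedtuple, so _replace builds a copy with a new path.
    return urlunsplit(parts._replace(path=f"/{db_name}"))
```

With this, the test configuration can be defined as `with_database_name(database_url, "admyral_test")` and stays in sync with host, port, and credentials automatically.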
admyral/db/admyral_store.py (5)

42-42: LGTM!

The import of ControlSchema is correctly placed and necessary for the new controls functionality.


94-97: LGTM!

The changes to create_store maintain backward compatibility while adding flexibility through the optional database_url parameter.


111-111: LGTM!

The DatabaseManager initialization is correctly updated to use both the engine and database_url parameters.


1081-1095: Add environment check for test-only method

While the docstring indicates this method should only be used in testing environments, there's no programmatic enforcement of this restriction.


78-82: Verify the impact of constructor signature change

The constructor now takes database_url instead of GlobalConfig. While this change simplifies the dependency, it's a breaking change that needs careful verification.

✅ Verification successful

Constructor change is safely abstracted by factory method

The constructor signature change from GlobalConfig to database_url is safely handled by the create_store factory method, which:

  • Maintains backward compatibility by defaulting to CONFIG.database_url when no URL is provided
  • Is consistently used across the codebase for instantiation (no direct constructor calls found)
  • Already supports the new database_url parameter in all usage locations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all instantiations of AdmyralStore to ensure they're updated
# Test: Search for direct instantiations and factory method calls
rg -A 2 "AdmyralStore\(" 
rg -A 2 "create_store\("

Length of output: 1090


Script:

#!/bin/bash
# Let's check the create_store factory method implementation to understand how it handles the database_url
ast-grep --pattern 'async def create_store($$$) -> $_: $$$'

# Also check if there are any other references to GlobalConfig in the store
rg "GlobalConfig" admyral/db/admyral_store.py

Length of output: 688

admyral/db/alembic/versions/3967e08c0051_add_controls_and_controls_workflows_.py (2)

43-43: ⚠️ Potential issue

Verify the table name in the foreign key constraint for user_id

The foreign key constraint references ["User.id"], but table names are typically lowercase and may be pluralized. Ensure that the referenced table is correctly named User, or update it to match the actual table name (e.g., users).

Run the following script to confirm the table name:

✅ Verification successful

The foreign key constraint referencing User table is correct

From the migration files output, we can see that the table is indeed created with uppercase name User in the same codebase:

op.create_table(
    "User",
    sa.Column(
        "created_at",
        sa.TIMESTAMP(),
        server_default=sa.text("now()"),

This confirms that the foreign key constraint ["User.id"] is using the correct table name.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence of the 'User' table in the codebase.

# Search for the creation of a table named 'User'
rg "op.create_table\(['\"]User['\"]" admyral/db/alembic/

# Search for a model class named 'User'
rg "class User" admyral/db/models/

Length of output: 158


Script:

#!/bin/bash
# Let's try to find any SQL table definitions or models across the codebase
# First, let's find all alembic migration files
fd -e py . admyral/db/alembic/versions/ -x cat {} | grep -A 5 "create_table"

# Also search for any model definitions across the codebase
fd -t f -e py . -x grep -l "class.*Base"

# And search for SQLAlchemy model definitions with 'user' in filename
fd -t f -e py . -x grep -l "user" {}

Length of output: 9525


65-75: Confirm necessity of user_id in controls_workflows primary key and foreign keys

Including user_id in the primary key and foreign keys of the controls_workflows table may complicate the schema. Verify that this is required for your data model. If control_id and workflow_id uniquely identify a record, you might simplify the schema by excluding user_id.

Run the following script to analyze the usage of user_id in related tables:

✅ Verification successful

Keep user_id in the controls_workflows primary key and foreign keys

The inclusion of user_id in the primary key and foreign keys of the controls_workflows table is necessary and correctly designed based on the following evidence:

  1. The controls table uses a composite primary key of (control_id, user_id) and has a unique constraint on these fields
  2. Controls are user-specific as shown by the foreign key from controls.user_id to User.id
  3. The controls_workflows table maintains referential integrity by including both fields in its foreign key constraint to match the parent table's primary key structure

This design ensures:

  • Each control belongs to a specific user
  • The same control_id can be used by different users
  • Proper cascading deletion when a user or control is removed
  • Referential integrity across the user-control-workflow relationship chain
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Analyze the necessity of 'user_id' in the 'controls_workflows' table.

# Check how 'user_id' is used in the 'controls' table definition
rg -A 15 "op.create_table\(['\"]controls['\"]" admyral/db/alembic/

# Check how 'user_id' is used in the 'controls_workflows' table definition
rg -A 20 "op.create_table\(['\"]controls_workflows['\"]" admyral/db/alembic/

# Search for references to 'user_id' in models and schemas
rg "user_id" admyral/db/models/
rg "user_id" admyral/db/schemas/

Length of output: 1971


Script:

#!/bin/bash
# Let's check the schema definitions and relationships more thoroughly

# Check for any table definitions that reference controls or workflows
rg -A 20 "op.create_table" admyral/db/alembic/

# Check for any foreign key constraints involving controls or workflows
rg -B 2 -A 2 "ForeignKeyConstraint" admyral/db/alembic/

# Look for any unique constraints that might indicate data relationships
rg -B 2 -A 2 "UniqueConstraint" admyral/db/alembic/

# Check for any SQLAlchemy model definitions that might show the relationships
rg -A 10 "class.*Table" admyral/db/schemas/

Length of output: 42543
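
The cascade behavior described above can be demonstrated with a minimal in-memory sketch. SQLite stands in for PostgreSQL here, and the tables are reduced to their key columns only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default
conn.execute(
    "CREATE TABLE controls ("
    " control_id TEXT, user_id TEXT,"
    " PRIMARY KEY (control_id, user_id))"
)
conn.execute(
    "CREATE TABLE controls_workflows ("
    " control_id TEXT, workflow_id TEXT, user_id TEXT,"
    " PRIMARY KEY (control_id, workflow_id, user_id),"
    " FOREIGN KEY (control_id, user_id)"
    "  REFERENCES controls (control_id, user_id) ON DELETE CASCADE)"
)
# The same control_id can exist for two different users.
conn.execute("INSERT INTO controls VALUES ('c1', 'alice')")
conn.execute("INSERT INTO controls VALUES ('c1', 'bob')")
conn.execute("INSERT INTO controls_workflows VALUES ('c1', 'w1', 'alice')")
# Deleting alice's control cascades to her mapping row only.
conn.execute("DELETE FROM controls WHERE user_id = 'alice'")
mappings = conn.execute("SELECT COUNT(*) FROM controls_workflows").fetchone()[0]
controls = conn.execute("SELECT COUNT(*) FROM controls").fetchone()[0]
```

After the delete, alice's mapping row is gone while bob's control survives, which is exactly why the mapping table's foreign key must include user_id to match the composite primary key of controls.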

tests/db/test_controls_workflows_mapping.py (1)

51-68: Enhance setup_scenario with error handling and documentation

Consider adding error handling and a docstring to the setup_scenario function to improve its robustness and maintainability. This will make the function more understandable and easier to modify in the future.
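
A hedged sketch of what that could look like; the store methods and identifiers below are hypothetical stand-ins for the real API used in tests/db/test_controls_workflows_mapping.py:

```python
class StubStore:
    """Hypothetical stand-in for the real store; actual method names may differ."""

    def __init__(self) -> None:
        self.controls: dict[tuple[str, str], dict] = {}
        self.mappings: list[tuple[str, str, str]] = []

    def create_control(self, user_id: str, control_id: str) -> None:
        self.controls[(control_id, user_id)] = {"user_id": user_id}

    def map_control_to_workflow(
        self, user_id: str, control_id: str, workflow_id: str
    ) -> None:
        if (control_id, user_id) not in self.controls:
            raise KeyError(f"unknown control {control_id!r} for user {user_id!r}")
        self.mappings.append((control_id, workflow_id, user_id))


def setup_scenario(store: StubStore) -> tuple[str, str, str]:
    """Create a control and map it to a workflow for a single test user.

    Returns (user_id, control_id, workflow_id). Raises RuntimeError with
    context if any setup step fails, so test failures point at the setup
    rather than at the assertion under test.
    """
    user_id, control_id, workflow_id = "user-1", "control-1", "workflow-1"
    try:
        store.create_control(user_id, control_id)
        store.map_control_to_workflow(user_id, control_id, workflow_id)
    except Exception as e:
        raise RuntimeError("setup_scenario failed while seeding test data") from e
    return user_id, control_id, workflow_id
```

The docstring documents the returned tuple, and wrapping the seeding steps distinguishes broken fixtures from genuine test failures.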
