Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature cluster matching #1862

Merged
merged 10 commits into from
Jan 9, 2025
Merged

Feature cluster matching #1862

merged 10 commits into from
Jan 9, 2025

Conversation

CarlosQ96
Copy link
Collaborator

@CarlosQ96 CarlosQ96 commented Oct 15, 2024

Implements the new estimated matching backend.
Uses this gitcoin backend code: https://github.com/Giveth/COCM_QF_Algorithm
Issue: https://github.com/orgs/Giveth/projects/9/views/1?pane=issue&itemId=72923215&issue=Giveth%7Cgiveth-dapps-v2%7C4519

Summary by CodeRabbit

  • New Features

    • Added cluster matching functionality to track and estimate project matching amounts.
    • Introduced a new background job to synchronize estimated cluster matching data.
    • Created adapters and interfaces for cluster matching API integration.
    • Added a new method to export cluster matching donations format.
  • Database Changes

    • Added new estimated_cluster_matching table to store project matching estimates.
  • Performance Improvements

    • Implemented worker threads for efficient cluster matching data processing.
  • Technical Enhancements

    • Added error handling for cluster matching API requests.
    • Created mock adapter for testing and development environments.
    • Enhanced the estimatedMatching method to incorporate new logic for calculating matching values.
    • Introduced new error messages for cluster matching API errors.
    • Added new TypeScript interfaces for structured data handling.

@CarlosQ96 CarlosQ96 force-pushed the feature_cluster_matching branch from 184f134 to 92a31c6 Compare October 16, 2024 13:23
@CarlosQ96 CarlosQ96 force-pushed the feature_cluster_matching branch from 92a31c6 to 56857c3 Compare October 16, 2024 16:03
Copy link
Collaborator

@kkatusic kkatusic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ;) , thx @CarlosQ96

@CarlosQ96 CarlosQ96 marked this pull request as ready for review January 6, 2025 20:09
Copy link
Contributor

coderabbitai bot commented Jan 6, 2025

Walkthrough

This pull request introduces a comprehensive implementation of Cluster Matching functionality across multiple components of the application. A new database migration creates an estimated_cluster_matching table, and new adapters, entities, and services are developed to support fetching, processing, and storing estimated cluster matching data. The implementation includes a cron job for synchronization, worker threads for processing, and GraphQL resolvers to expose the data.

Changes

File Change Summary
migration/1728554628004-AddEstimatedClusterMatching.ts New migration to create estimated_cluster_matching table with indexes
src/adapters/adaptersFactory.ts Added getClusterMatchingAdapter(), clusterMatchingAdapter, and clusterMatchingMockAdapter
src/adapters/cocmAdapter/* Introduced new adapter interfaces, implementation, and mock adapter for cluster matching
src/entities/estimatedClusterMatching.ts New entity class for EstimatedClusterMatching with TypeORM and TypeGraphQL decorators
src/services/cronJobs/syncEstimatedClusterMatchingJob.ts New cron job for synchronizing estimated cluster matching data
src/workers/cocm/estimatedClusterMatchingWorker.ts Worker module for handling cluster matching operations
src/entities/project.ts Updated estimatedMatching method to include logic for fetching EstimatedClusterMatching
src/repositories/donationRepository.ts Added exportClusterMatchingDonationsFormat function for querying donations
src/utils/errorMessages.ts Introduced new error messages for cluster matching API errors

Possibly related PRs

  • Allow refresh concurrently of materialized views #1611: The changes in this PR involve creating and managing materialized views, which is related to the new estimated_cluster_matching table created in the main PR, as both involve database schema modifications and data aggregation functionalities.

Suggested Reviewers

  • mohammadranjbarz
  • jainkrati
  • aminlatifi

Poem

🐰 Cluster Matching Magic
In data's realm, a rabbit hops with glee,
Matching projects with precision's key,
Adapters dance, workers thread their way,
Synchronizing matchings day by day,
A computational bunny's delight! 🌟


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (11)
src/server/bootstrap.ts (1)

Line range hint 319-342: Consider enhancing error handling for view refresh.

While the implementation is functionally correct, consider these architectural improvements:

  1. Add explicit error handling for the view refresh operation
  2. Consider adding a retry mechanism for transient database issues
  3. Add monitoring/metrics for this critical operation

Example implementation:

 if (!isTestEnv) {
   logger.debug(
     'continueDbSetup() before refreshProjectEstimatedMatchingView() ',
     new Date(),
   );
-  await refreshProjectEstimatedMatchingView();
+  try {
+    await refreshProjectEstimatedMatchingView();
+  } catch (error) {
+    logger.error('Failed to refresh estimated matching view', error);
+    // Consider implementing retry logic here
+    throw error; // Re-throw to prevent partial initialization
+  }
   logger.debug(
     'continueDbSetup() after refreshProjectEstimatedMatchingView() ',
     new Date(),
   );
 }
src/entities/estimatedClusterMatching.ts (1)

13-19: Indexing Approach for Large-Range Numeric Fields

Indexing on matching and the combination of (projectId, qfRoundId) is beneficial if they're heavily used in queries. Ensure that the numeric data distribution won't degrade index performance and keep an eye on index maintenance overhead.

src/entities/project.ts (1)

Line range hint 507-540: Refine the estimatedMatching Method Logic

  1. Lines 526-533: There is duplicate logic for initializing matching = 0. You could unify it in a single conditional to improve readability.
  2. Consider the cost of queries if this method is frequently used in performance-intensive contexts.
  3. If matching signifies monetary amounts, think about using a safer numeric type.

Below is a possible refactor to remove the duplicate check on estimatedClusterMatching:

     const estimatedClusterMatching =
       await EstimatedClusterMatching.createQueryBuilder('matching')
         .where('matching."projectId" = :projectId', { projectId: this.id })
         .andWhere('matching."qfRoundId" = :qfRoundId', {
           qfRoundId: activeQfRound.id,
         })
         .getOne();

-    let matching: number;
-    if (!estimatedClusterMatching) matching = 0;
-
-    if (!estimatedClusterMatching) {
-      matching = 0;
-    } else {
-      matching = estimatedClusterMatching.matching;
-    }
+    const matching = estimatedClusterMatching
+      ? estimatedClusterMatching.matching
+      : 0;
src/adapters/cocmAdapter/cocmAdapterInterface.ts (2)

1-17: Move example data to test files

The example data in comments should be moved to test files where it can be properly utilized.

Consider creating a new file src/adapters/cocmAdapter/__tests__/fixtures.ts with:

export const sampleMatchingData = {
  matching_data: [
    {
      matching_amount: 83.25,
      matching_percent: 50.0,
      project_name: "Test1",
      strategy: "COCM"
    },
    // ... rest of the example data
  ]
};

28-43: Add validation constraints to EstimatedMatchingInput interface

The interface could benefit from additional validation constraints and documentation.

 export interface EstimatedMatchingInput {
   votes_data: [
     {
-      voter: string;
+      voter: string; // Ethereum address
-      payoutAddress: string;
+      payoutAddress: string; // Ethereum address
-      amountUSD: number;
+      amountUSD: number; // Must be positive
-      project_name: string;
+      project_name: string; // Non-empty string
-      score: number;
+      score: number; // Range: 0-1
     },
   ];
-  strategy: string;
+  strategy: 'COCM'; // Only COCM is supported currently
   min_donation_threshold_amount: number;
   matching_cap_amount: number;
   matching_amount: number;
   passport_threshold: number;
 }
migration/1728554628004-AddEstimatedClusterMatching.ts (1)

28-33: Add CASCADE option to drop statement

The down migration should handle foreign key constraints.

-    DROP TABLE IF EXISTS estimated_cluster_matching;
+    DROP TABLE IF EXISTS estimated_cluster_matching CASCADE;
src/adapters/cocmAdapter/cocmMockAdapter.ts (1)

10-26: Enhance mock data for better test coverage

The mock data could be more comprehensive to cover edge cases.

 return {
   matching_data: [
     {
       matching_amount: 83.25,
       matching_percent: 50.0,
       project_name: 'Test1',
       strategy: 'COCM',
     },
+    {
+      matching_amount: 0,
+      matching_percent: 0,
+      project_name: 'Test2',
+      strategy: 'COCM',
+    },
     {
-      matching_amount: 83.25,
-      matching_percent: 50.0,
+      matching_amount: 166.50,
+      matching_percent: 100.0,
       project_name: 'Test3',
       strategy: 'COCM',
     },
   ],
 };
src/workers/cocm/estimatedClusterMatchingWorker.ts (1)

1-1: Fix incorrect file path comment

The comment shows workers/auth.js but the actual file path is src/workers/cocm/estimatedClusterMatchingWorker.ts.

-// workers/auth.js
+// src/workers/cocm/estimatedClusterMatchingWorker.ts
src/adapters/adaptersFactory.ts (1)

157-166: Add environment variable validation

Following the pattern from getGivPowerSubgraphAdapter, add validation to ensure the adapter type is specified.

 export const getClusterMatchingAdapter = (): CocmAdapterInterface => {
+  if (!process.env.CLUSTER_MATCHING_ADAPTER) {
+    logger.warn('CLUSTER_MATCHING_ADAPTER not specified, using mock adapter');
+  }
   switch (process.env.CLUSTER_MATCHING_ADAPTER) {
     case 'clusterMatching':
       return clusterMatchingAdapter;
     case 'mock':
       return clusterMatchingMockAdapter;
     default:
       return clusterMatchingMockAdapter;
   }
 };
src/utils/validators/projectValidator.test.ts (1)

58-58: Track the TODO with an issue.

The TODO comment indicates a problem with the eth_getCode method that needs to be fixed. This should be tracked properly.

Would you like me to create a GitHub issue to track this TODO and the underlying eth_getCode method issue?

src/utils/errorMessages.ts (1)

22-22: Consider making the error message more specific.

The current error message "Error in the cluster matching api, check logs" is quite generic. Consider providing more context about what specifically went wrong (e.g., connection failure, invalid response, etc.).

-  CLUSTER_MATCHING_API_ERROR: 'Error in the cluster matching api, check logs',
+  CLUSTER_MATCHING_API_ERROR: 'Failed to fetch data from cluster matching API. Please check server logs for details.',

Also applies to: 216-216

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09186cd and fc69928.

📒 Files selected for processing (16)
  • migration/1728554628004-AddEstimatedClusterMatching.ts (1 hunks)
  • src/adapters/adaptersFactory.ts (2 hunks)
  • src/adapters/cocmAdapter/cocmAdapter.ts (1 hunks)
  • src/adapters/cocmAdapter/cocmAdapterInterface.ts (1 hunks)
  • src/adapters/cocmAdapter/cocmMockAdapter.ts (1 hunks)
  • src/entities/entities.ts (2 hunks)
  • src/entities/estimatedClusterMatching.ts (1 hunks)
  • src/entities/project.ts (3 hunks)
  • src/repositories/donationRepository.ts (1 hunks)
  • src/resolvers/projectResolver.allProject.test.ts (2 hunks)
  • src/server/bootstrap.ts (1 hunks)
  • src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1 hunks)
  • src/types/qfTypes.ts (1 hunks)
  • src/utils/errorMessages.ts (2 hunks)
  • src/utils/validators/projectValidator.test.ts (1 hunks)
  • src/workers/cocm/estimatedClusterMatchingWorker.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/resolvers/projectResolver.allProject.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: test
🔇 Additional comments (13)
src/server/bootstrap.ts (3)

71-72: LGTM! Imports are correctly placed and organized.

The imports are appropriately placed with other utility and service imports.


Line range hint 324-341: Well-structured error prevention and logging implementation.

The implementation shows good practices:

  1. Prevents test environment failures by checking isTestEnv
  2. Includes comprehensive debug logging for tracking execution flow
  3. Contains clear comments explaining why the test environment check is necessary

Let's verify the logging configuration:

✅ Verification successful

Logging implementation verification confirms good practices.

The codebase demonstrates consistent and well-structured logging patterns:

  • Appropriate use of logging levels (debug, error, info, fatal)
  • Comprehensive error context in error logs
  • Detailed debug information for tracing execution flow
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if debug logging is consistently used across the codebase

# Search for logger.debug usage patterns
rg -A 1 'logger\.debug\(' | grep -v 'bootstrap.ts'

# Check for any mixed logging levels that might need alignment
rg 'logger\.(debug|info|error|fatal)\('

Length of output: 68197


Line range hint 319-342: Verify dependencies on the estimated matching view.

Since this is part of a new estimated matching backend, we should verify:

  1. Which cron jobs depend on this view
  2. Impact of view refresh failures on the system
src/entities/estimatedClusterMatching.ts (4)

1-2: Proper imports from TypeGraphQL

All necessary imports are in place, and there's no issue with the usage of type-graphql and typeorm.


25-28: Well-structured Many-to-One Relationship

The relationship with the Project entity is properly defined, ensuring foreign key constraints and referential integrity.


30-32: Consistency Check for projectId Field

Make sure that all writes to this entity properly align the projectId column with the project relationship. Failure to do so might lead to orphan records or invalid references.


34-36: Good Practice for Distinguishing QF Rounds

Using a dedicated qfRoundId field is a clear and maintainable way to handle different QF round data.

src/entities/project.ts (2)

45-46: Relevant Imports from qfRoundRepository

The imported methods are appropriate for QF round calculations, and reordering them here has no apparent side effects.


53-53: New EstimatedClusterMatching Import

The inline import for EstimatedClusterMatching establishes a direct link to the new cluster matching logic. Ensure that it remains consistent with the Project entity’s usage.

src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1)

8-11: Review cron job frequency

The default cron expression '0 * * * * *' runs every minute, which might be too frequent for this operation considering it involves API calls and database operations. Consider reducing the frequency to hourly or daily based on business requirements.

-  ) as string) || '0 * * * * *';
+  ) as string) || '0 0 * * *'; // Default to once per day at midnight
✅ Verification successful

Confirmed: Reduce cron frequency for cluster matching sync

The review comment is correct. Running this job every minute is too aggressive considering:

  • It spawns worker threads and makes external API calls
  • Performs multiple database operations
  • Processes Quadratic Funding data that doesn't change frequently
  • Has significant computational overhead

Reducing the frequency to daily ('0 0 * * *') is appropriate as QF rounds operate on longer timeframes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Find the implementation of the sync operation
ast-grep --pattern 'class $_ {
  $$$
  syncEstimatedClusterMatching($$$) {
    $$$
  }
  $$$
}'

# Look for similar cron job configurations
rg "cronJobTime.*\|\|.*\*.*\*.*\*" -A 2

# Search for documentation about sync operations
fd "README" --exec cat {} \;

Length of output: 17576


Script:

#!/bin/bash
# Search for the configuration key
rg "SYNC_ESTIMATED_CLUSTER_MATCHING_CRONJOB_EXPRESSION" -B 2 -A 2

# Search for the implementation file
fd "syncEstimatedClusterMatchingJob" --type f

# Search for related test files
fd "syncEstimatedClusterMatchingJob.*test" --type f

# Look for the service implementation
rg "class SyncEstimatedClusterMatching" -A 10

Length of output: 682


Script:

#!/bin/bash
# View the implementation file
cat src/services/cronJobs/syncEstimatedClusterMatchingJob.ts

# Look for any imports or dependencies
rg -l "syncEstimatedClusterMatching" --type ts

# Search for any service references
rg "SyncEstimatedClusterMatching" -B 2 -A 2

Length of output: 2790

src/entities/entities.ts (1)

55-55: LGTM!

The EstimatedClusterMatching entity is properly imported and registered in the entities list.

Also applies to: 90-90

src/utils/validators/projectValidator.test.ts (1)

60-95: Ensure skipped tests are tracked and resolved.

All smart contract validation tests are currently skipped across multiple networks (mainnet, xdai, polygon, celo, celo alfajores, arbitrum mainnet, and arbitrum sepolia). This significantly reduces test coverage for wallet address validation.

Let's verify if there are any related issues or PRs:

✅ Verification successful

Smart contract validation tests are intentionally skipped pending implementation decision

The tests are correctly skipped as the underlying eth_getCode method is pending replacement. This appears to be a deliberate decision, as smart contract wallets are not currently supported due to multi-chain compatibility risks.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for related issues/PRs about eth_getCode or smart contract validation
gh search issues "eth_getCode in:title,body repo:Giveth/impact-graph"

# Check for any existing test implementations
rg -g '!node_modules' 'eth_getCode' 

Length of output: 3961


Script:

#!/bin/bash
# Find implementation of isWalletAddressSmartContract
ast-grep --pattern 'function isWalletAddressSmartContract'

# Search for any alternative implementations or related code
rg -g '!node_modules' "isWalletAddressSmartContract" -A 5

Length of output: 6786

src/repositories/donationRepository.ts (1)

13-33: Consider query performance and security implications.

The implementation looks good but consider the following suggestions:

  1. Add an index on qfRoundId if not already present to optimize the JOIN performance
  2. Consider paginating results for large datasets to prevent memory issues

Let's verify if the index exists:

✅ Verification successful

Query is already optimized and secure

The implementation is well-designed:

  • An index on qfRoundId exists to optimize the filtering
  • The query uses parameterized values which prevents SQL injection
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for migrations that might contain the index
fd -g '*.ts' -g '*.js' --exec rg -l 'CREATE.*INDEX.*donation.*qfRoundId' {} \;

Length of output: 175


Script:

#!/bin/bash
# First, let's find all migration files
fd -e ts -e js migrations

# Then search for index patterns in these files
rg -i "create.*index|add.*index" -A 3 -B 3

# Also search for table creation to see if index was defined inline
rg -i "create.*table.*donation" -A 10 -B 1

Length of output: 66646

Comment on lines +38 to +40
@Field()
@Column('double precision')
matching: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider Decimal Precision for Monetary Fields

double precision might introduce floating-point inaccuracies when dealing with money-like values. Consider a decimal type or a scaled integer if matching is meant for financial calculations.

Comment on lines +13 to +15

@Field(_type => Float, { nullable: true })
matching?: number;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Evaluate Precision for matching

If this field represents monetary or high-stakes data, consider alternatives to Float to avoid rounding errors and ensure consistent calculations.

Comment on lines +14 to +15
this.ClusterMatchingURL =
process.env.CLUSTER_MATCHING_API_URL || 'localhost';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Security: Improve default configuration for production use

The default URL of 'localhost' could be problematic in production. Consider:

  1. Making the environment variable required
  2. Using a more secure default URL if needed
  3. Adding URL validation
-    this.ClusterMatchingURL =
-      process.env.CLUSTER_MATCHING_API_URL || 'localhost';
+    if (!process.env.CLUSTER_MATCHING_API_URL) {
+      throw new Error('CLUSTER_MATCHING_API_URL environment variable is required');
+    }
+    this.ClusterMatchingURL = process.env.CLUSTER_MATCHING_API_URL;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
this.ClusterMatchingURL =
process.env.CLUSTER_MATCHING_API_URL || 'localhost';
if (!process.env.CLUSTER_MATCHING_API_URL) {
throw new Error('CLUSTER_MATCHING_API_URL environment variable is required');
}
this.ClusterMatchingURL = process.env.CLUSTER_MATCHING_API_URL;

Comment on lines +22 to +31
const result = await axios.post(
this.ClusterMatchingURL,
matchingDataInput,
{
headers: {
Accept: 'application/json',
'Content-Type': 'application/json',
},
},
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add timeout and retry mechanism for API calls

The axios request lacks timeout configuration and retry mechanism, which could lead to hanging requests or unnecessary failures.

+    const axiosConfig = {
+      timeout: 5000, // 5 seconds
+      headers: {
+        Accept: 'application/json',
+        'Content-Type': 'application/json',
+      },
+      retry: 3,
+      retryDelay: (retryCount) => retryCount * 1000, // exponential backoff
+    };
     const result = await axios.post(
       this.ClusterMatchingURL,
       matchingDataInput,
-      {
-        headers: {
-          Accept: 'application/json',
-          'Content-Type': 'application/json',
-        },
-      },
+      axiosConfig,
     );

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +32 to +38
if (result?.data?.error !== null) {
logger.error('clusterMatchingApi error', result.data.error);
throw new Error(
i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
);
}
return result.data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling logic

The error check result?.data?.error !== null is ambiguous and might not catch all error cases.

-    if (result?.data?.error !== null) {
+    if (!result?.data || result.data.error) {
       logger.error('clusterMatchingApi error', result.data.error);
       throw new Error(
         i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
       );
     }
+    // Validate response structure matches ProjectsEstimatedMatchings
+    if (!Array.isArray(result.data?.matching_data)) {
+      logger.error('Invalid response structure', result.data);
+      throw new Error(
+        i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
+      );
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (result?.data?.error !== null) {
logger.error('clusterMatchingApi error', result.data.error);
throw new Error(
i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
);
}
return result.data;
if (!result?.data || result.data.error) {
logger.error('clusterMatchingApi error', result.data.error);
throw new Error(
i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
);
}
// Validate response structure matches ProjectsEstimatedMatchings
if (!Array.isArray(result.data?.matching_data)) {
logger.error('Invalid response structure', result.data);
throw new Error(
i18n.__(translationErrorMessagesKeys.CLUSTER_MATCHING_API_ERROR),
);
}
return result.data;

Comment on lines 27 to 31
data => `(
(SELECT id FROM project WHERE title = '${data.project_name}'),
${qfRoundId},
${data.matching_amount}
)`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: Fix SQL injection vulnerability

Using string interpolation for SQL values is unsafe. Use parameterized queries instead.

-          data => `(
-        (SELECT id FROM project WHERE title = '${data.project_name}'), 
-        ${qfRoundId}, 
-        ${data.matching_amount}
-      )`,
+          (data, index) => `($${index * 3 + 1}, $${index * 3 + 2}, $${index * 3 + 3})`,

Then modify the query execution to use parameters:

const params = matchingData.flatMap(data => [
  data.project_name,
  qfRoundId,
  data.matching_amount
]);

const query = `
  WITH project_ids AS (
    SELECT id FROM project WHERE title = ANY($1::text[])
  )
  INSERT INTO estimated_cluster_matching ("projectId", "qfRoundId", matching)
  SELECT p.id, $2, v.matching_amount
  FROM project_ids p
  JOIN unnest($3::numeric[]) WITH ORDINALITY v(matching_amount, idx) ON true
  ON CONFLICT ("projectId", "qfRoundId")
  DO UPDATE SET matching = EXCLUDED.matching
  RETURNING "projectId", "qfRoundId", matching;
`;

Comment on lines +48 to +51
logger.debug('Matching data processed successfully with raw SQL.');
} catch (error) {
logger.debug('Error processing matching data:', error.message);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling

The error message is too generic and the error is only logged at debug level.

-      logger.debug('Error processing matching data:', error.message);
+      logger.error('Failed to update estimated cluster matching:', {
+        error: error.message,
+        qfRoundId,
+        matchingDataCount: matchingData?.length
+      });
+      throw new Error('Failed to update estimated cluster matching: ' + error.message);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.debug('Matching data processed successfully with raw SQL.');
} catch (error) {
logger.debug('Error processing matching data:', error.message);
}
logger.debug('Matching data processed successfully with raw SQL.');
} catch (error) {
logger.error('Failed to update estimated cluster matching:', {
error: error.message,
qfRoundId,
matchingDataCount: matchingData?.length
});
throw new Error('Failed to update estimated cluster matching: ' + error.message);
}

Comment on lines +16 to +20
async fetchEstimatedClusterMatching(matchingDataInput: any) {
return await getClusterMatchingAdapter().fetchEstimatedClusterMatchings(
matchingDataInput,
);
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add proper type definition for matchingDataInput

Using any type reduces type safety. Based on the usage, create an interface that defines the expected structure:

+interface MatchingDataInput {
+  votes_data: any[];
+  strategy: string;
+  min_donation_threshold_amount: number;
+  matching_cap_amount: number;
+  matching_amount: number;
+  passport_threshold: number;
+}

-async fetchEstimatedClusterMatching(matchingDataInput: any) {
+async fetchEstimatedClusterMatching(matchingDataInput: MatchingDataInput) {

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +39 to +46
const matchingDataInput = {
votes_data: clusterMatchingDonations,
strategy: defaultMatchingStrategy,
min_donation_threshold_amount: activeQfRound.minimumValidUsdValue,
matching_cap_amount: activeQfRound.maximumReward,
matching_amount: activeQfRound.allocatedFundUSD,
passport_threshold: activeQfRound.minimumPassportScore,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for matching data

The code assumes all required fields from activeQfRound are present. Add validation to ensure the data is complete and valid before making the API call.

const validateMatchingInput = (round: QfRound) => {
  const required = [
    'minimumValidUsdValue',
    'maximumReward',
    'allocatedFundUSD',
    'minimumPassportScore'
  ];
  const missing = required.filter(field => round[field] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing required fields for matching: ${missing.join(', ')}`);
  }
};

// Add before constructing matchingDataInput
validateMatchingInput(activeQfRound);

Comment on lines 26 to 28
const matchingWorker = await spawn(
new Worker('../../workers/cocm/estimatedClusterMtchingWorker'),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in worker path and add error handling

There's a typo in the worker path and missing error handling for worker spawn.

-    new Worker('../../workers/cocm/estimatedClusterMtchingWorker'),
+    new Worker('../../workers/cocm/estimatedClusterMatchingWorker'),
+  ).catch(error => {
+    logger.error('Failed to spawn estimated cluster matching worker:', error);
+    throw error;
+  });

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/server/bootstrap.ts (1)

366-369: Consider enhancing error handling for the view refresh.

While the test environment check is good, consider adding error handling for the view refresh operation to ensure system stability.

 if (!isTestEnv) {
   logger.debug(
     'continueDbSetup() before refreshProjectEstimatedMatchingView() ',
     new Date(),
   );
-  await refreshProjectEstimatedMatchingView();
+  try {
+    await refreshProjectEstimatedMatchingView();
+  } catch (error) {
+    logger.error('Failed to refresh project estimated matching view:', error);
+    // Consider if this should be a fatal error
+    // logger.fatal('Critical: Failed to refresh project estimated matching view:', error);
+  }
   logger.debug(
     'continueDbSetup() after refreshProjectEstimatedMatchingView() ',
     new Date(),
   );
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fc69928 and 5496642.

📒 Files selected for processing (1)
  • src/server/bootstrap.ts (2 hunks)
🔇 Additional comments (1)
src/server/bootstrap.ts (1)

71-73: LGTM!

The new imports are well-organized and follow the project's import style.

Comment on lines +366 to +369
if (process.env.ENABLE_CLUSTER_MATCHING === 'true') {
runSyncEstimatedClusterMatchingCronjob();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Add ENABLE_CLUSTER_MATCHING to environment configuration files

The feature flag implementation follows the established pattern, but the environment variable needs to be documented:

  • Add ENABLE_CLUSTER_MATCHING=false to config/example.env
  • Add ENABLE_CLUSTER_MATCHING=false to config/test.env
🔗 Analysis chain

Verify environment variable configuration.

The cron job initialization looks good, but let's verify the environment variable setup.

Run this script to check if the environment variable is properly documented:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if ENABLE_CLUSTER_MATCHING is documented in configuration files

# Look for environment variable documentation
echo "Checking for environment variable documentation..."
rg -i "ENABLE_CLUSTER_MATCHING" --type md

# Check for environment variable usage in configuration files
echo "Checking for environment variable usage in configuration files..."
fd -e env -e yaml -e json -X rg -i "ENABLE_CLUSTER_MATCHING"

Length of output: 354


Script:

#!/bin/bash
# Search for similar ENABLE_ environment variables to understand the pattern
echo "Checking for similar environment variables..."
rg "ENABLE_" -A 1 -B 1

# Look for environment setup files or templates
echo "Checking for environment templates..."
fd -e env.example -e env.template -e env.sample -X cat

# Check TypeScript files for environment variable usage
echo "Checking TypeScript files for environment configuration..."
rg "process\.env\.ENABLE" --type ts -A 2 -B 2

Length of output: 9135

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/workers/cocm/estimatedClusterMatchingWorker.ts (1)

36-42: Consider using a CTE for better query organization

The current SQL query could be restructured using a Common Table Expression (CTE) for better readability and maintainability.

-      const query = `
-        INSERT INTO estimated_cluster_matching ("projectId", "qfRoundId", matching)
-        VALUES ${values}
-        ON CONFLICT ("projectId", "qfRoundId")
-        DO UPDATE SET matching = EXCLUDED.matching
-        RETURNING "projectId", "qfRoundId", matching;
-      `;
+      const query = `
+        WITH new_values (project_id, qf_round_id, matching_amount) AS (
+          VALUES ${values}
+        )
+        INSERT INTO estimated_cluster_matching ("projectId", "qfRoundId", matching)
+        SELECT project_id, qf_round_id, matching_amount
+        FROM new_values
+        ON CONFLICT ("projectId", "qfRoundId")
+        DO UPDATE SET matching = EXCLUDED.matching
+        RETURNING "projectId", "qfRoundId", matching;
+      `;
src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1)

8-11: Add validation for cron expression

The cron expression from config should be validated to ensure it's properly formatted.

import { validate as validateCron } from 'cron-validator';

const cronJobTime = (() => {
  const configValue = config.get('SYNC_ESTIMATED_CLUSTER_MATCHING_CRONJOB_EXPRESSION') as string;
  const expression = configValue || '0 * * * * *';
  if (!validateCron(expression)) {
    logger.error(`Invalid cron expression: ${expression}, falling back to default`);
    return '0 * * * * *';
  }
  return expression;
})();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5496642 and bfa1d12.

📒 Files selected for processing (2)
  • src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1 hunks)
  • src/workers/cocm/estimatedClusterMatchingWorker.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: test
🔇 Additional comments (5)
src/workers/cocm/estimatedClusterMatchingWorker.ts (4)

1-14: LGTM! Well-structured type definitions and imports.

The type definitions are clear and follow TypeScript best practices.


16-20: Add proper type definition for matchingDataInput

Using any type reduces type safety. A proper interface should be defined for the input parameters.


49-52: Improve error handling

The error message is too generic and the error is only logged at debug level.


54-56: LGTM! Proper worker export.

The worker is correctly exposed using the threads/worker library.

src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1)

39-46: Add input validation for matching data

The code assumes all required fields from activeQfRound are present. Validation should be added to ensure the data is complete and valid before making the API call.

Comment on lines +20 to +22
schedule(cronJobTime, async () => {
await fetchAndUpdateClusterEstimatedMatching();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for scheduled task

The scheduled task should have error handling to prevent the cron job from stopping on errors.

-  schedule(cronJobTime, async () => {
-    await fetchAndUpdateClusterEstimatedMatching();
-  });
+  schedule(cronJobTime, async () => {
+    try {
+      await fetchAndUpdateClusterEstimatedMatching();
+    } catch (error) {
+      logger.error('Failed to execute scheduled cluster matching task:', error);
+    }
+  });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
schedule(cronJobTime, async () => {
await fetchAndUpdateClusterEstimatedMatching();
});
schedule(cronJobTime, async () => {
try {
await fetchAndUpdateClusterEstimatedMatching();
} catch (error) {
logger.error('Failed to execute scheduled cluster matching task:', error);
}
});

Comment on lines 48 to 63
try {
// Fetch from python api cluster matching
const matchingData =
await matchingWorker.fetchEstimatedClusterMatching(matchingDataInput);

// Insert the data
await matchingWorker.updateEstimatedClusterMatching(
activeQfRound.id,
matchingData,
);
} catch (e) {
logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
}

await Thread.terminate(matchingWorker);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ensure worker cleanup in finally block

Move the worker termination to a finally block to ensure it's always cleaned up, even if an error occurs.

   try {
     // Fetch from python api cluster matching
     const matchingData =
       await matchingWorker.fetchEstimatedClusterMatching(matchingDataInput);
 
     // Insert the data
     await matchingWorker.updateEstimatedClusterMatching(
       activeQfRound.id,
       matchingData,
     );
   } catch (e) {
     logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
+    throw e;
+  } finally {
+    await Thread.terminate(matchingWorker);
   }
-
-  await Thread.terminate(matchingWorker);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try {
// Fetch from python api cluster matching
const matchingData =
await matchingWorker.fetchEstimatedClusterMatching(matchingDataInput);
// Insert the data
await matchingWorker.updateEstimatedClusterMatching(
activeQfRound.id,
matchingData,
);
} catch (e) {
logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
}
await Thread.terminate(matchingWorker);
};
try {
// Fetch from python api cluster matching
const matchingData =
await matchingWorker.fetchEstimatedClusterMatching(matchingDataInput);
// Insert the data
await matchingWorker.updateEstimatedClusterMatching(
activeQfRound.id,
matchingData,
);
} catch (e) {
logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
throw e;
} finally {
await Thread.terminate(matchingWorker);
}
};

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (2)

20-22: 🛠️ Refactor suggestion

Add error handling to the scheduled task

The scheduled task lacks error handling, which means any unhandled exceptions could stop the cron job or prevent subsequent executions. Wrapping the task in a try-catch block ensures that errors are logged and do not halt the scheduling.

Apply this diff to add error handling:

 schedule(cronJobTime, async () => {
+  try {
     await fetchAndUpdateClusterEstimatedMatching();
+  } catch (error) {
+    logger.error('Scheduled task error:', error);
+  }
 });

39-46: 🛠️ Refactor suggestion

Add input validation for matching data

The code assumes all required fields from activeQfRound are present. Adding validation ensures that missing or undefined fields are caught early, preventing potential errors during the matching process.

Here's an example of how to add input validation:

const validateMatchingInput = (round: QfRound) => {
  const requiredFields = [
    'minimumValidUsdValue',
    'maximumReward',
    'allocatedFundUSD',
    'minimumPassportScore',
  ];
  const missingFields = requiredFields.filter(
    field => round[field] === undefined || round[field] === null,
  );
  if (missingFields.length > 0) {
    throw new Error(
      `Missing required fields for matching: ${missingFields.join(', ')}`,
    );
  }
};

// Add before constructing matchingDataInput
validateMatchingInput(activeQfRound);
🧹 Nitpick comments (3)
src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (3)

58-60: Consider re-throwing errors after logging

Currently, errors caught in the catch block are logged but not re-thrown, which may hide issues from higher-level error handlers or monitoring systems. Re-throwing the error after logging ensures that the calling function is aware of the failure.

Apply this diff to re-throw the error:

 } catch (e) {
   logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
+  throw e;
 } finally {

61-65: Remove redundant worker termination

The worker thread is already terminated in the finally block at lines 61-62. Calling await Thread.terminate(matchingWorker); again at lines 64-65 is redundant and unnecessary.

Apply this diff to remove the redundant termination:

   } finally {
     await Thread.terminate(matchingWorker);
   }

-  await Thread.terminate(matchingWorker);

25-65: Initialize the worker inside the try block

If the worker fails to spawn, it would be beneficial to handle that error within the try block to ensure consistent error handling and resource cleanup.

Apply this diff to adjust the code:

-export const fetchAndUpdateClusterEstimatedMatching = async () => {
+export const fetchAndUpdateClusterEstimatedMatching = async () => {
+  let matchingWorker;
   const activeQfRound = await findActiveQfRound();
   if (!activeQfRound?.id) return;

   const clusterMatchingDonations = await exportClusterMatchingDonationsFormat(
     activeQfRound?.id,
   );
   if (!clusterMatchingDonations || clusterMatchingDonations?.length === 0)
     return;

   const matchingDataInput = {
     votes_data: clusterMatchingDonations,
     strategy: defaultMatchingStrategy,
     min_donation_threshold_amount: activeQfRound.minimumValidUsdValue,
     matching_cap_amount: activeQfRound.maximumReward,
     matching_amount: activeQfRound.allocatedFundUSD,
     passport_threshold: activeQfRound.minimumPassportScore,
   };

   try {
+    matchingWorker = await spawn(
+      new Worker('../../workers/cocm/estimatedClusterMatchingWorker')
+    );
     // Fetch from python API cluster matching
     const matchingData = await matchingWorker.fetchEstimatedClusterMatching(
       matchingDataInput,
     );

     // Insert the data
     await matchingWorker.updateEstimatedClusterMatching(
       activeQfRound.id,
       matchingData,
     );
   } catch (e) {
     logger.error('fetchAndUpdateClusterEstimatedMatching error', e);
     throw e;
   } finally {
     if (matchingWorker) {
       await Thread.terminate(matchingWorker);
     }
   }
 };
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfa1d12 and f0a04db.

📒 Files selected for processing (1)
  • src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (1 hunks)
🔇 Additional comments (2)
src/services/cronJobs/syncEstimatedClusterMatchingJob.ts (2)

27-28: Add error handling for worker spawn

Spawning the worker could fail, and currently, there is no error handling for this. Wrapping the worker spawn in a try-catch block or handling the promise rejection ensures that any issues with spawning the worker are properly managed.

Apply this diff to handle errors during worker spawn:

-  const matchingWorker = await spawn(
-    new Worker('../../workers/cocm/estimatedClusterMatchingWorker'),
-  );
+  const matchingWorker = await spawn(
+    new Worker('../../workers/cocm/estimatedClusterMatchingWorker')
+  ).catch(error => {
+    logger.error('Failed to spawn estimated cluster matching worker:', error);
+    throw error;
+  });

27-27: Verify the worker path is correct

Ensure that the path '../../workers/cocm/estimatedClusterMatchingWorker' correctly points to the worker script. Any typo or incorrect path will cause the worker spawn to fail.

@CarlosQ96 CarlosQ96 merged commit 8ebfa96 into staging Jan 9, 2025
5 checks passed
@CarlosQ96 CarlosQ96 deleted the feature_cluster_matching branch January 9, 2025 10:28
CarlosQ96 added a commit that referenced this pull request Jan 9, 2025
* add cluster matching entity

* add cluster matching adapters

* finish cocm adapter

* improve error handling for cluster matching

* comment broken contract tests by missing eth_getCode Method

* add feedback to handle qf cases

* add cluster matching job to bootstrap file

* fix coderabbit feedback PR

* termine worker if an exception is raised
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants