Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getPatch: avoid failing on more than 300 files #210

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

thypon
Copy link
Member

@thypon thypon commented Nov 21, 2024

github api allows maximum 300 files to be diffed.
In order to fix this issue, when we encounter any error, try to fetch the repo manually, so to have repo content, and diff locally.

@thypon thypon force-pushed the features/utils-fetch-bigger-prs branch from fb41722 to 780f2ec Compare November 21, 2024 04:02
github api allows maximum 300 files to be diffed.
In order to fix this issue, when we encounter any error,
try to fetch the repo manually, so to have repo content, and diff locally.
@thypon thypon force-pushed the features/utils-fetch-bigger-prs branch from 780f2ec to 9f1a06c Compare November 21, 2024 04:03
@brave brave deleted a comment from github-actions bot Nov 21, 2024
@brave brave deleted a comment from github-actions bot Nov 21, 2024
@brave brave deleted a comment from github-actions bot Nov 21, 2024
Copy link

openai debug - [puLL-Merge] - brave/pull-merge@210

Description

This pull request adds a fallback mechanism for the getPatch function to handle cases where fetching the diff from the GitHub API fails. If the attempt to get the diff fails, the new code tries to clone the repository, manually generate the diff, and then retrieve the patch body from the generated diff file. This makes the getPatch function more robust and ensures it can still retrieve patch information in cases where the API might not be fully reliable.

Possible Issues

  1. Performance Overhead: Cloning the entire repository to generate the diff might introduce a significant performance overhead, especially for large repositories or repositories with a long history.
  2. Temporary Storage: The function creates a temporary directory to clone the repository and does not account for maximum storage space or handle cleanup on failure scenarios.
  3. Error Handling: The fallback mechanism uses console.log for error logging which might not be suitable for production-level code.

Security Hotspots

  1. Command Injection: The use of execSync with variables derived from user input (i.e., owner and repo) without proper sanitization could be susceptible to command injection attacks.
  2. Temporary Directory Creation: Using os.tmpdir() to create a temporary directory for cloning the repository could lead to conflicts or predictable paths that might be exploited.
Changes

Changes

  • src/getPatch.js:
    • Added import statements for fs, path, os, and execSync from child_process.
    • Modified getPatch function to include a try-catch block around the GitHub API request for fetching the diff.
    • Added a fallback mechanism to clone the repository, fetch relevant branches, generate a diff file, and read the patch from the diff file when the API request fails.
    • Included error logging and cleanup of the temporary directory after diff generation.
sequenceDiagram
    participant User
    participant API
    participant Local
    User->>API: Request Diff
    API-->>Local: API Failure
    Local->>Local: Clone Repository
    Local->>Local: Fetch Branches
    Local->>Local: Generate Diff
    Local->>Local: Read Diff File
    Local-->>User: Return Patch Body
Loading
graph TD
    A[User] --> B[API Request]
    B -->|API Success| C[Get Patch From API]
    B -->|API Failure| D[Clone Repository]
    D --> E[Fetch Branches]
    E --> F[Generate Diff]
    F --> G[Read Diff File]
    G --> H[Return Patch Body]

    C --> H
    G -->|Cleanup Temporary Directory| I[End]
Loading

Copy link

bedrock debug - [puLL-Merge] - brave/pull-merge@210

Description

This PR enhances the getPatch function to handle scenarios where the initial attempt to fetch a PR diff fails. It introduces a fallback mechanism that clones the repository and generates the diff manually when the GitHub API request fails.

Possible Issues

  1. The fallback mechanism involves cloning the entire repository, which could be time-consuming and resource-intensive for large repositories.
  2. The use of execSync for Git operations might block the event loop, potentially affecting the performance of the application.
  3. There's no error handling for the Git operations in the fallback mechanism, which could lead to unhandled exceptions.

Security Hotspots

  1. The PR introduces the use of child_process.execSync, which can be a security risk if not properly sanitized. Although the inputs seem to be coming from trusted sources (GitHub API), it's important to ensure that all inputs are properly validated and sanitized before being used in shell commands.
Changes

Changes

src/getPatch.js:

  1. Added imports for fs, path, os, and child_process modules.
  2. Implemented a try-catch block around the initial GitHub API request for the PR diff.
  3. Added a fallback mechanism that:
    • Fetches repository and PR information from GitHub API
    • Clones the repository to a temporary directory
    • Fetches the relevant branches
    • Generates the diff manually using Git commands
    • Reads the generated diff file
    • Cleans up the temporary directory
  4. Moved the private repository check after the diff fetching logic.
sequenceDiagram
    participant getPatch
    participant GitHub API
    participant LocalFileSystem
    participant Git

    getPatch->>GitHub API: Request PR Diff
    alt Successful API Request
        GitHub API->>getPatch: Return PR Diff
    else API Request Fails
        GitHub API->>getPatch: Error
        getPatch->>GitHub API: Request Repo Info
        GitHub API->>getPatch: Return Repo Info
        getPatch->>GitHub API: Request PR Info
        GitHub API->>getPatch: Return PR Info
        getPatch->>Git: Clone Repository
        Git->>LocalFileSystem: Create Local Repo
        getPatch->>Git: Fetch Branches
        getPatch->>Git: Generate Diff
        Git->>LocalFileSystem: Save Diff File
        getPatch->>LocalFileSystem: Read Diff File
        getPatch->>LocalFileSystem: Delete Local Repo
    end
    getPatch->>GitHub API: Check if Repo is Private
    GitHub API->>getPatch: Return Privacy Status
Loading
graph TD
    A[getPatch Function] --> B{Use GitHub Token?}
    B -->|Yes| C[Fetch Patch via GitHub API]
    B -->|No| D[Try Fetch Diff via GitHub API]
    D --> E{Fetch Successful?}
    E -->|Yes| F[Use Fetched Diff]
    E -->|No| G[Fallback Mechanism]
    G --> H[Clone Repository]
    H --> I[Fetch Branches]
    I --> J[Generate Diff]
    J --> K[Read Diff File]
    K --> L[Clean Up]
    C --> M[Return Patch]
    F --> M
    L --> M
Loading

Copy link

anthropic debug - [puLL-Merge] - brave/pull-merge@210

Description

This PR modifies the getPatch.js file to add a fallback mechanism for retrieving PR diffs when the initial GitHub API request fails. It introduces a new approach that involves cloning the repository and generating the diff manually using Git commands.

Possible Issues

  1. The PR introduces a significant amount of new code, which increases complexity and may make the function harder to maintain.
  2. Cloning the entire repository for each failed API request could be resource-intensive and slow, especially for large repositories.
  3. The use of execSync for Git operations might block the event loop, potentially affecting the performance of the application.

Security Hotspots

  1. The use of execSync with user-provided input (repo URL, branch names) could potentially lead to command injection vulnerabilities if not properly sanitized.
  2. Cloning repositories to a temporary directory (os.tmpdir()) might pose a risk if the system's temp directory is not properly secured or if the cleanup process fails.
Changes

Changes

src/getPatch.js:

  1. Imports new modules: fs, path, os, and child_process.
  2. Adds a try-catch block around the initial GitHub API request.
  3. Implements a fallback mechanism that:
    • Retrieves repository and PR information using GitHub API.
    • Clones the repository to a temporary directory.
    • Fetches the relevant branches.
    • Generates the PR diff using Git commands.
    • Reads the generated diff file.
    • Cleans up the cloned repository.
  4. Moves the private repository check after the diff retrieval process.
sequenceDiagram
    participant Client
    participant getPatch
    participant GitHubAPI
    participant FileSystem
    participant Git

    Client->>getPatch: Call with PR details
    alt Use GitHub API
        getPatch->>GitHubAPI: Request PR diff
        alt API request successful
            GitHubAPI-->>getPatch: Return PR diff
        else API request fails
            GitHubAPI-->>getPatch: Error
            getPatch->>GitHubAPI: Request repo and PR info
            GitHubAPI-->>getPatch: Return info
            getPatch->>FileSystem: Create temp directory
            getPatch->>Git: Clone repository
            getPatch->>Git: Fetch branches
            getPatch->>Git: Generate diff
            getPatch->>FileSystem: Read diff file
            getPatch->>FileSystem: Clean up temp directory
        end
    else Use custom URL
        getPatch->>GitHubAPI: Fetch patch using custom URL
    end
    getPatch-->>Client: Return patch
Loading
graph TD
    A[getPatch function] --> B{Use GitHub API?}
    B -->|Yes| C[Fetch patch using GitHub API]
    B -->|No| D[Fetch patch using custom URL]
    C --> E{API request successful?}
    E -->|Yes| F[Return patch]
    E -->|No| G[Fallback to manual cloning]
    G --> H[Clone repository]
    H --> I[Fetch branches]
    I --> J[Generate diff]
    J --> K[Read diff file]
    K --> L[Clean up cloned repo]
    L --> F
    D --> F
Loading

src/getPatch.js Show resolved Hide resolved
src/getPatch.js Show resolved Hide resolved
src/getPatch.js Show resolved Hide resolved
src/getPatch.js Show resolved Hide resolved
src/getPatch.js Show resolved Hide resolved
src/getPatch.js Show resolved Hide resolved
@thypon thypon merged commit 87f2cce into main Nov 21, 2024
7 checks passed
@thypon thypon deleted the features/utils-fetch-bigger-prs branch November 21, 2024 04:06
Copy link
Member

@kdenhartog kdenhartog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostMerged reviewed. This looks to address the issue with the PR which is that we're limited to 300 files. What happens when it throws an error from not enough tokens available? Seems like a potentially more robust way to handle this would be to generate intermediate outputs that get passed back in with new context such that there's an intermediate response returned until there's no more context left to pass in the final prompt. I'm not sure if that's a plausible solution in this scenario but it's a pattern I've seen used when dealing with extremely large amounts of context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants