feat: add import data from gql script #247
base: develop
Conversation
This Node.js script fetches ArDrive L1 transactions and bundles from a gateway through the GQL interface. The script accepts the following parameters:

- `--gqlEndpoint`: defaults to Goldsky
- `--minHeight`: defaults to 0
- `--maxHeight`: defaults to the latest Arweave block
- `--blockRangeSize`: defaults to 100

Example:

```
node --import=./register.js scripts/import-data/fetch-data-gql.ts --minHeight 1000000 --maxHeight 1000500
```
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           develop     #247     +/-   ##
==========================================
  Coverage    70.86%   70.87%
==========================================
  Files           35       35
  Lines         8958     8967      +9
  Branches       523      523
==========================================
+ Hits          6348     6355      +7
- Misses        2608     2610      +2
  Partials         2        2
==========================================
```
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (2)

scripts/import-data/fetch-data-gql.ts (2)

Lines 30-46: Consider using a command-line argument parsing library

Utilizing a library such as `commander`, `yargs`, or `minimist` can simplify argument handling, provide built-in validations, and improve code readability. A sketch of one option is shown below.
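For illustration, here is a minimal sketch of the same flags parsed with `yargs` (this assumes yargs v17 with its `hideBin` helper; the option names mirror the script's flags, but this is not code from the PR):

```typescript
import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';

// Parse the script's flags with built-in type coercion and defaults.
const argv = yargs(hideBin(process.argv))
  .option('gqlEndpoint', {
    type: 'string',
    default: 'https://arweave-search.goldsky.com/graphql',
  })
  .option('minHeight', { type: 'number', default: 0 })
  .option('maxHeight', { type: 'number' }) // undefined => fetch latest block
  .option('blockRangeSize', { type: 'number', default: 100 })
  .check((args) => {
    // Reject invalid heights up front instead of deep in the script.
    if (args.minHeight < 0) throw new Error('--minHeight must be >= 0');
    if (args.blockRangeSize <= 0) throw new Error('--blockRangeSize must be > 0');
    return true;
  })
  .parseSync();
```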
Lines 243-270: Improve error handling in `writeTransactionsToFile`

Enhance robustness by handling specific file system errors and providing more informative messages to the user. Consider wrapping file operations in try-catch blocks and distinguishing between different types of errors (e.g., permission issues, disk space), as in the sketch below.
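As a hedged sketch of what distinguishing file system errors could look like (the helper name and messages are illustrative, not from the PR; the `NodeJS.ErrnoException` codes are standard Node.js):

```typescript
import fs from 'node:fs/promises';

async function writeFileWithDiagnostics(filePath: string, content: string) {
  try {
    await fs.writeFile(filePath, content);
  } catch (error) {
    const err = error as NodeJS.ErrnoException;
    // Map common errno codes to actionable messages.
    if (err.code === 'EACCES') {
      console.error(`Permission denied writing ${filePath}`);
    } else if (err.code === 'ENOSPC') {
      console.error(`Disk full while writing ${filePath}`);
    } else if (err.code === 'ENOENT') {
      console.error(`Directory for ${filePath} does not exist`);
    } else {
      console.error(`Failed to write ${filePath}: ${err.message}`);
    }
    throw error;
  }
}
```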
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (3)

- `.gitignore` (1 hunks)
- `scripts/import-data/fetch-data-gql.ts` (1 hunks)
- `tsconfig.json` (1 hunks)

✅ Files skipped from review due to trivial changes (1)

- `.gitignore`
🔇 Additional comments (2)

scripts/import-data/fetch-data-gql.ts (1)

Lines 273-275: Verify that `MAX_BLOCK_HEIGHT` is defined after fetching

Ensure that `MAX_BLOCK_HEIGHT` is correctly assigned after calling `fetchLatestBlockHeight()` to avoid potential issues in subsequent code. Run the following script to confirm `MAX_BLOCK_HEIGHT` is properly assigned.

✅ Verification successful

`MAX_BLOCK_HEIGHT` assignment is properly implemented. The verification shows that:

- `MAX_BLOCK_HEIGHT` is correctly declared as a number
- It's properly assigned either from CLI arguments or by fetching from the Arweave network
- `fetchLatestBlockHeight` correctly retrieves and returns the block height
- The value is used consistently throughout the code for block range calculations and logging
🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Verify that MAX_BLOCK_HEIGHT is assigned after fetching the latest block height.
node -e "
(async () => {
  let MAX_BLOCK_HEIGHT;
  const fetchLatestBlockHeight = async () => {
    const response = await fetch('https://arweave.net/info');
    const data = await response.json();
    return data.height;
  };
  MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  console.log('MAX_BLOCK_HEIGHT:', MAX_BLOCK_HEIGHT);
})();
"
```

Length of output: 381

```bash
#!/bin/bash
# Examine how MAX_BLOCK_HEIGHT is used in the codebase
rg "MAX_BLOCK_HEIGHT" -A 3
# Also check the implementation of fetchLatestBlockHeight
ast-grep --pattern 'async function fetchLatestBlockHeight() { $$$ }'
```

Length of output: 1788

```bash
#!/bin/bash
# Check the implementation of fetchLatestBlockHeight with a different pattern
ast-grep --pattern 'const fetchLatestBlockHeight = async () => { $$$ }'
# Also check how getBlockRanges uses maxBlock parameter
ast-grep --pattern 'function getBlockRanges($$$) { $$$ }'
```

Length of output: 634
tsconfig.json (1)

Line 28: Including the `scripts` directory in TypeScript compilation

Adding `"scripts"` to the `include` array ensures that TypeScript compiles the new script files, which is appropriate.
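For reference, the relevant part of `tsconfig.json` presumably looks like the following after the change (the `src` entry is an assumption; only the addition of `"scripts"` is confirmed by the review):

```jsonc
{
  "include": ["src", "scripts"]
}
```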
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)

scripts/import-data/fetch-data-gql.ts (1)

Lines 323-325: Update log message to reflect the current block range

The log message indicates the full block range from `MIN_BLOCK_HEIGHT` to `MAX_BLOCK_HEIGHT`, which may be misleading during iteration. Modify it to display the current range being processed for clearer progress tracking. Apply this diff to fix the log message:

```diff
  console.log(
-   `Transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT} saved!`,
+   `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
  );
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (1)

- `scripts/import-data/fetch-data-gql.ts` (1 hunks)
🔇 Additional comments (4)

scripts/import-data/fetch-data-gql.ts (4)

Lines 111-112: Correct the property name when fetching the latest block height

The `fetchLatestBlockHeight` function attempts to extract `blocks` from the response, but the correct property for the latest block height is `height`. This change ensures that the latest block height is accurately retrieved. Apply this diff to fix the property name:

```diff
- const { blocks } = await response.json();
- return blocks as number;
+ const { height } = await response.json();
+ return height as number;
```

Or adjust the code accordingly:

```diff
  const data = await response.json();
- const { blocks } = data;
+ const { height } = data;
  return height as number;
```
Lines 125-129: 🛠️ Refactor suggestion: Allow processing when `minBlock` equals `maxBlock`

The current condition prevents processing a single block range where `minBlock` equals `maxBlock`. Adjust the condition to include this scenario, allowing the script to handle ranges where both values are the same. Apply this diff to modify the condition:

```diff
- if (minBlock >= maxBlock || rangeSize <= 0) {
+ if (minBlock > maxBlock || rangeSize <= 0) {
    throw new Error(
      'Invalid input: ensure minBlock < maxBlock and rangeSize > 0',
    );
  }
```
Line 174: Conditionally include the `after` parameter in the GraphQL query

Passing an empty string for the `after` parameter may cause issues with the GraphQL endpoint. Omit the `after` parameter when `cursor` is undefined to prevent potential errors. Apply this diff to adjust the query:

```diff
    sort: HEIGHT_ASC
-   after: "${cursor !== undefined ? cursor : ''}"
+   ${cursor !== undefined ? `after: "${cursor}"` : ''}
  ) {
```
Lines 239-241: Handle empty `edges` array to prevent errors

When processing the `edges` array, ensure it is not empty before accessing its elements to avoid runtime exceptions when the array is empty. Apply this diff to add a condition:

```diff
  hasNextPage = pageInfo.hasNextPage;
- cursor = hasNextPage ? edges[edges.length - 1].cursor : undefined;
+ if (hasNextPage && edges.length > 0) {
+   cursor = edges[edges.length - 1].cursor;
+ } else {
+   cursor = undefined;
+ }
```
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)

scripts/import-data/fetch-data-gql.ts (3)

Lines 158-171: Consider moving App-Name values to configuration

The hardcoded App-Name values should be moved to a configuration file for better maintainability and reusability. Create a config file and import the values:

```typescript
// config.ts
export const ARDRIVE_APP_NAMES = [
  'ArDrive-App',
  'ArDrive-Web',
  'ArDrive-CLI',
  'ArDrive-Desktop',
  'ArDrive-Mobile',
  'ArDrive-Core',
  'ArDrive-Sync',
];
```

Then update the query:

```diff
  tags: [
    {
      name: "App-Name"
-     values: [
-       "ArDrive-App"
-       "ArDrive-Web"
-       "ArDrive-CLI"
-       "ArDrive-Desktop"
-       "ArDrive-Mobile"
-       "ArDrive-Core"
-       "ArDrive-Sync"
-     ]
+     values: ARDRIVE_APP_NAMES
    }
  ]
```
Lines 227-262: Improve pagination handling with rate limiting

The while loop for pagination could benefit from rate limiting to prevent overwhelming the API. Add rate limiting:

```diff
+ const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
+ const RATE_LIMIT_DELAY = 1000; // 1 second delay between requests
  while (hasNextPage) {
    console.log(
      `Fetching transactions and bundles from block ${min} to ${max}. Page ${page}`,
    );
    const {
      transactions: { edges, pageInfo },
    } = await fetchGql({
      minBlock: min,
      maxBlock: max,
      cursor,
    });
+   // Add delay between requests
+   await sleep(RATE_LIMIT_DELAY);
```
Lines 267-294: Consider batching file operations for large datasets

For better performance with large datasets, consider batching file operations and adding progress tracking:

```diff
+ const BATCH_SIZE = 100;
+ let processedCount = 0;
+ const totalCount = transactions.size;
- for (const [height, ids] of transactions.entries()) {
+ const entries = Array.from(transactions.entries());
+ for (let i = 0; i < entries.length; i += BATCH_SIZE) {
+   const batch = entries.slice(i, i + BATCH_SIZE);
+   await Promise.all(
+     batch.map(async ([height, ids]) => {
        if (ids.size === 0) return;
        const content = JSON.stringify([...ids], null, 2);
        const filePath = path.join(outputDir, `${height}.json`);
        try {
          await fs.writeFile(filePath, content);
+         processedCount += 1;
+         if (processedCount % 10 === 0) {
+           console.log(`Progress: ${processedCount}/${totalCount} files written`);
+         }
        } catch (error) {
          console.error(`Failed to write ${filePath}: ${error}`);
          throw error;
        }
+     })
+   );
  }
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (1)

- `scripts/import-data/fetch-data-gql.ts` (1 hunks)
🔇 Additional comments (3)

scripts/import-data/fetch-data-gql.ts (3)

Lines 1-24: LGTM! Well-structured imports and license header

The code follows best practices by:

- Including the AGPL v3 license header
- Using Node.js built-in modules
- Implementing proper ESM path resolution

Lines 111-112: Fix incorrect property access in `fetchLatestBlockHeight`

The response from arweave.net/info contains a `height` property, not `blocks`.

Lines 239-241: Handle empty edges array to prevent errors

When processing the edges array, ensure it's not empty before accessing its elements.
```typescript
let GQL_ENDPOINT = 'https://arweave-search.goldsky.com/graphql';
let MIN_BLOCK_HEIGHT = 0;
let MAX_BLOCK_HEIGHT: number | undefined;
let BLOCK_RANGE_SIZE = 100;

args.forEach((arg, index) => {
  switch (arg) {
    case '--gqlEndpoint':
      if (args[index + 1]) {
        GQL_ENDPOINT = args[index + 1];
      } else {
        console.error('Missing value for --gqlEndpoint');
        process.exit(1);
      }
      break;
    case '--minHeight':
      if (args[index + 1]) {
        MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --minHeight');
        process.exit(1);
      }
      break;
    case '--maxHeight':
      if (args[index + 1]) {
        MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --maxHeight');
        process.exit(1);
      }
      break;
    case '--blockRangeSize':
      if (args[index + 1]) {
        BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
      } else {
        console.error('Missing value for --blockRangeSize');
        process.exit(1);
      }
      break;
    default:
      break;
  }
});
```
🛠️ Refactor suggestion
Add input validation for parsed arguments
While the code handles missing values well, it should validate the parsed values to ensure they are valid numbers and within acceptable ranges.
Apply this diff to add validation:

```diff
  case '--minHeight':
    if (args[index + 1]) {
-     MIN_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+     const value = parseInt(args[index + 1], 10);
+     if (isNaN(value) || value < 0) {
+       console.error('Invalid value for --minHeight. Must be a non-negative integer.');
+       process.exit(1);
+     }
+     MIN_BLOCK_HEIGHT = value;
    } else {
      console.error('Missing value for --minHeight');
      process.exit(1);
    }
    break;
  case '--maxHeight':
    if (args[index + 1]) {
-     MAX_BLOCK_HEIGHT = parseInt(args[index + 1], 10);
+     const value = parseInt(args[index + 1], 10);
+     if (isNaN(value) || value < 0) {
+       console.error('Invalid value for --maxHeight. Must be a non-negative integer.');
+       process.exit(1);
+     }
+     MAX_BLOCK_HEIGHT = value;
    } else {
      console.error('Missing value for --maxHeight');
      process.exit(1);
    }
    break;
  case '--blockRangeSize':
    if (args[index + 1]) {
-     BLOCK_RANGE_SIZE = parseInt(args[index + 1], 10);
+     const value = parseInt(args[index + 1], 10);
+     if (isNaN(value) || value <= 0) {
+       console.error('Invalid value for --blockRangeSize. Must be a positive integer.');
+       process.exit(1);
+     }
+     BLOCK_RANGE_SIZE = value;
    } else {
      console.error('Missing value for --blockRangeSize');
      process.exit(1);
    }
```
```typescript
(async () => {
  if (MAX_BLOCK_HEIGHT === undefined) {
    MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  }

  const blockRanges = getBlockRanges({
    minBlock: MIN_BLOCK_HEIGHT,
    maxBlock: MAX_BLOCK_HEIGHT,
    rangeSize: BLOCK_RANGE_SIZE,
  });

  console.log(
    `Starting to fetch transactions and bundles from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
  );

  for (const range of blockRanges) {
    const { transactions, bundles } = await getTransactionsForRange(range);

    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'transactions'),
      transactions,
    });
    await writeTransactionsToFile({
      outputDir: path.join(__dirname, 'bundles'),
      transactions: bundles,
    });

    if (transactions.size !== 0 || bundles.size !== 0) {
      console.log(
        `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
      );
    }
  }
})();
```
🛠️ Refactor suggestion
Add interrupt handling and detailed progress tracking
The main execution could benefit from proper cleanup on interruption and more detailed progress tracking.
```diff
+  let interrupted = false;
+
+  process.on('SIGINT', () => {
+    console.log('\nGracefully shutting down...');
+    interrupted = true;
+  });
+
+  const totalRanges = blockRanges.length;
+  let processedRanges = 0;
+
   for (const range of blockRanges) {
+    if (interrupted) {
+      console.log('Operation interrupted by user');
+      process.exit(0);
+    }
+
     const { transactions, bundles } = await getTransactionsForRange(range);
     await writeTransactionsToFile({
       outputDir: path.join(__dirname, 'transactions'),
       transactions,
     });
     await writeTransactionsToFile({
       outputDir: path.join(__dirname, 'bundles'),
       transactions: bundles,
     });
+    processedRanges += 1;
     if (transactions.size !== 0 || bundles.size !== 0) {
       console.log(
-        `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
+        `Progress: ${processedRanges}/${totalRanges} ranges processed. ` +
+        `Transactions and bundles from block ${range.min} to ${range.max} saved!`,
       );
     }
   }
```
We need an extra GQL call to get the root parent for BDIs. It also would be worthwhile to make the GQL query we're using to find data items configurable.
I commented on a couple of places where we could pull in libraries too, but perhaps the intent is to make this script as standalone as possible. I think that's worth doing for now, but if it becomes too cumbersome, pulling in some libraries is fine too.
```typescript
        min: ${minBlock}
        max: ${maxBlock}
      }
      tags: [
```
Let's make this configurable (maybe via cli specified file or environment variable) so users can use custom filters. We can leave the ArDrive tags as default right now for our own convenience.
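One possible shape for such configurability (a sketch only; the `GQL_TAGS_FILE` variable name and JSON schema are assumptions, not part of this PR):

```typescript
import fs from 'node:fs/promises';

type TagFilter = { name: string; values: string[] };

// Load custom tag filters from a JSON file pointed to by an env var,
// falling back to the ArDrive App-Name tags this PR uses by default.
async function loadTagFilters(): Promise<TagFilter[]> {
  const filePath = process.env.GQL_TAGS_FILE;
  if (filePath === undefined) {
    return [
      { name: 'App-Name', values: ['ArDrive-App', 'ArDrive-Web', 'ArDrive-CLI'] },
    ];
  }
  const raw = await fs.readFile(filePath, 'utf8');
  return JSON.parse(raw) as TagFilter[];
}
```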
```typescript
      }

      if (bundleId !== undefined) {
        bundles.get(blockHeight)?.add(bundleId);
```
We should do another GQL call here (one for each unique bundle ID) to retrieve the bundle itself and check whether it has a parent (to get the root ID in case of BDIs). Let's make this behavior configurable via CLI arg too, but default it to enabled.
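A minimal sketch of that lookup (assuming the standard Arweave GQL schema, where `transaction(id:)` exposes `bundledIn { id }`; the function name and use of the script's `GQL_ENDPOINT` constant are illustrative):

```typescript
// Follow bundledIn links until reaching a transaction with no parent,
// i.e., the root L1 bundle for a (possibly nested) BDI.
async function getRootTxId(id: string): Promise<string> {
  let currentId = id;
  for (;;) {
    const response = await fetch(GQL_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        query: `query { transaction(id: "${currentId}") { bundledIn { id } } }`,
      }),
    });
    const { data } = await response.json();
    const parentId = data?.transaction?.bundledIn?.id;
    if (parentId === undefined || parentId === null) {
      return currentId;
    }
    currentId = parentId;
  }
}
```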
Do you want to be able to get BDIs and L1 bundles and make it configurable like "get only L1 bundles"/"get L1 bundles and BDIs"?
I was thinking it might be nice to skip the extra lookups and only get the BDIs for cases where someone wants the script to run faster.
So it would be a setting for bundles where you can get the root tx id OR the bdi id (current behavior in this PR), right?
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (7)

scripts/import-data/import-data.ts (2)

Lines 36-103: Refactor command-line argument parsing for scalability and maintainability

The repetitive pattern in the command-line argument parsing can make the code harder to maintain as new options are added. Consider using a dedicated command-line argument parsing library like `commander`, `yargs`, or `minimist` to simplify this section and improve scalability.

Line 31: Verify existence of directories and handle missing paths

The script assumes that `TRANSACTIONS_DIR` and `BUNDLES_DIR` exist. If these directories do not exist, the script may fail. It's advisable to check for the existence of these directories and create them if necessary, or inform the user about the missing directories. Would you like assistance in adding code to verify and create directories if they are missing?

Also applies to: lines 70-85
scripts/import-data/utils.ts (1)
19-62
: Include response details in error messagesWhen a non-OK HTTP response is received, the error thrown only includes the status code. Including the response body or status text can provide more context and aid in debugging.
Apply this diff to enhance error messages:
... if (!response.ok) { - throw new Error(`HTTP error! status: ${response.status}`); + const errorText = await response.text(); + throw new Error(`HTTP error! status: ${response.status}, body: ${errorText}`); } ...src/workers/bundle-data-importer.ts (1)
141-143
: Add documentation for theisQueueFull
methodThe new
isQueueFull
method is a valuable addition. Adding JSDoc comments would improve code readability and maintainability by explaining the purpose and usage of this method.Apply this diff to add documentation:
+ /** + * Checks if the bundle data importer queue has reached its maximum capacity. + * @returns {Promise<boolean>} True if the queue is full, false otherwise. + */ async isQueueFull(): Promise<boolean> { return this.queue.length() >= this.maxQueueSize; }docker-compose.yaml (1)
111-111
: Set a default value forBUNDLE_DATA_IMPORTER_QUEUE_SIZE
The environment variable
BUNDLE_DATA_IMPORTER_QUEUE_SIZE
currently defaults to an empty string if not set, which may lead to unexpected behavior. Consider setting a sensible default value to ensure the application functions correctly even if the environment variable is not provided.Apply this diff to provide a default value (e.g., 1000):
- - BUNDLE_DATA_IMPORTER_QUEUE_SIZE=${BUNDLE_DATA_IMPORTER_QUEUE_SIZE:-} + - BUNDLE_DATA_IMPORTER_QUEUE_SIZE=${BUNDLE_DATA_IMPORTER_QUEUE_SIZE:-1000}scripts/import-data/fetch-data-gql.ts (2)
Lines 241-285: Improve error handling and progress tracking in `getTransactionsForRange`

The function could benefit from:

- Better error handling for GraphQL responses
- Progress tracking for long-running operations
- Graceful interruption handling

Apply this diff to enhance the function:

```diff
  const getTransactionsForRange = async ({ min, max }: BlockRange) => {
    let cursor: string | undefined;
    let hasNextPage = true;
    const transactions: BlockTransactions = new Map();
    const bundles: BlockTransactions = new Map();
+   let totalProcessed = 0;
+   const startTime = Date.now();

    while (hasNextPage) {
+     try {
        const {
          transactions: { edges, pageInfo },
        } = await fetchGql({
          minBlock: min,
          maxBlock: max,
          cursor,
        });

        hasNextPage = pageInfo.hasNextPage;
-       cursor = hasNextPage ? edges[edges.length - 1].cursor : undefined;
+       if (hasNextPage && edges.length > 0) {
+         cursor = edges[edges.length - 1].cursor;
+       } else {
+         cursor = undefined;
+       }

        for (const edge of edges) {
          const blockHeight = edge.node.block.height;
          const bundleId = edge.node.bundledIn?.id;
          const id = edge.node.id;

          if (!transactions.has(blockHeight)) {
            transactions.set(blockHeight, new Set());
          }
          if (!bundles.has(blockHeight)) {
            bundles.set(blockHeight, new Set());
          }

          if (bundleId !== undefined) {
            if (BUNDLES_FETCH_ROOT_TX) {
              const rootTxId = await getRootTxId(bundleId);
              bundles.get(blockHeight)?.add(rootTxId);
            } else {
              bundles.get(blockHeight)?.add(bundleId);
            }
          } else {
            transactions.get(blockHeight)?.add(id);
          }
+         totalProcessed++;
        }
+
+       // Log progress every 1000 items
+       if (totalProcessed % 1000 === 0) {
+         const elapsedSeconds = (Date.now() - startTime) / 1000;
+         console.log(
+           `Progress: Processed ${totalProcessed} items in ${elapsedSeconds.toFixed(2)}s`,
+         );
+       }
+     } catch (error) {
+       console.error(`Error processing range ${min}-${max}: ${error}`);
+       throw error;
+     }
    }

    return { transactions, bundles };
  };
```
Lines 287-314: Enhance error handling in `writeTransactionsToFile`

The function should handle more specific error cases and provide better error messages. Apply this diff to improve error handling:

```diff
  const writeTransactionsToFile = async ({
    outputDir,
    transactions,
  }: {
    outputDir: string;
    transactions: BlockTransactions;
  }) => {
    try {
      await fs.mkdir(outputDir, { recursive: true });
    } catch (error) {
-     console.error(`Failed to create directory: ${error}`);
+     const errorMessage = error instanceof Error ? error.message : String(error);
+     console.error(`Failed to create directory ${outputDir}: ${errorMessage}`);
      throw error;
    }

    for (const [height, ids] of transactions.entries()) {
      if (ids.size === 0) continue;

-     const content = JSON.stringify([...ids]);
      const filePath = path.join(outputDir, `${height}.json`);
      try {
+       const content = JSON.stringify([...ids], null, 2); // Pretty print for better readability
        await fs.writeFile(filePath, content);
      } catch (error) {
-       console.error(`Failed to write ${filePath}: ${error}`);
+       const errorMessage = error instanceof Error ? error.message : String(error);
+       console.error(
+         `Failed to write to ${filePath} for block height ${height}: ${errorMessage}`,
+       );
        throw error;
      }
    }
  };
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (8)

- `docker-compose.yaml` (1 hunks)
- `scripts/import-data/fetch-data-gql.ts` (1 hunks)
- `scripts/import-data/import-data.ts` (1 hunks)
- `scripts/import-data/utils.ts` (1 hunks)
- `src/config.ts` (1 hunks)
- `src/routes/ar-io.ts` (1 hunks)
- `src/system.ts` (1 hunks)
- `src/workers/bundle-data-importer.ts` (3 hunks)
🔇 Additional comments (3)

src/routes/ar-io.ts (1)

Lines 213-216: LGTM! Good addition of queue overflow protection

The check for queue capacity before accepting new bundles is a good practice to prevent system overload. A sketch of the pattern is shown below.
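For illustration, a guard like this might look as follows in an Express handler (the route path, response text, and `system` import are assumptions; `isQueueFull` is the method added in this PR):

```typescript
import { Router } from 'express';
import * as system from '../system.js';

const router = Router();

// Reject new bundle imports while the importer queue is at capacity.
router.post('/ar-io/admin/queue-bundle', async (req, res) => {
  if (await system.bundleDataImporter.isQueueFull()) {
    res.status(429).send('Bundle importer queue is full');
    return;
  }
  // ... validate the request and queue the bundle for import ...
  res.status(200).send('Bundle queued');
});
```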
src/config.ts (1)

Lines 201-205: LGTM! Well-documented configuration constant

The new configuration constant follows the established pattern and provides a reasonable default value.
src/system.ts (1)

Lines 472-477: LGTM! Verify the usage of the exported `bundleDataImporter`.

The changes look good. The export of `bundleDataImporter` and the addition of the `maxQueueSize` parameter align with the PR objectives for importing data from GQL. Let's verify the usage of the exported `bundleDataImporter`.

✅ Verification successful

The previous search might have been too specific, so we broadened it to find any references to bundleDataImporter and system imports, as well as route files where this exported instance might be used. We found references to `bundleDataImporter` in three files, then examined how it's actually used in the routes and test files to verify that the export is properly utilized and necessary.

Export of bundleDataImporter is correctly used and necessary. The export is properly utilized in the routes file (src/routes/ar-io.ts), where it's accessed through the system instance to check queue status and process bundles. The test file confirms the implementation details and usage patterns align with the exported functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Search for imports and usage of bundleDataImporter
rg -A 3 "import.*bundleDataImporter.*from.*system"
```

Length of output: 52

```bash
#!/bin/bash
# Broader search for bundleDataImporter usage
rg -l "bundleDataImporter"
# Also search for any imports from system.ts
rg "from ['\"].*system['\"]"
# Check if there are any route files that might use it
fd "routes?"
```

Length of output: 163

```bash
#!/bin/bash
# Check how bundleDataImporter is used in the routes file
rg -A 5 "bundleDataImporter" src/routes/ar-io.ts
# Also check the test file to understand the usage pattern
rg -A 5 "bundleDataImporter" src/workers/bundle-data-importer.test.ts
```

Length of output: 2442
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (3)

scripts/import-data/utils.ts (1)

Lines 76-100: Optimize file processing and add folder validation

The function could benefit from several improvements:

- Validate folder existence before processing
- Use async iteration for better memory efficiency with large directories
- Add input validation for min/max values

Apply this diff to implement the improvements:

```diff
  export const getFilesInRange = async ({
    folder,
    min,
    max,
  }: {
    folder: string;
    min: number;
    max: number;
  }): Promise<string[]> => {
+   if (min > max) {
+     throw new Error('Min value cannot be greater than max value');
+   }
    try {
+     await fs.access(folder);
      const files = await fs.readdir(folder);
      const filesInRange = files
        .filter((file) => path.extname(file) === '.json')
        .filter((file) => {
          const match = file.match(/^\d+/);
-         const number = match ? parseInt(match[0], 10) : null;
+         if (!match) return false;
+         const number = parseInt(match[0], 10);
+         if (isNaN(number)) return false;
          return number !== null && number >= min && number <= max;
        });
      return filesInRange;
    } catch (error) {
+     if (error instanceof Error && error.message.includes('ENOENT')) {
+       throw new Error(`Folder does not exist: ${folder}`);
+     }
      throw new Error(`Error processing files: ${(error as Error).message}`);
    }
  };
```
scripts/import-data/count-fetched-ids.ts (2)

Lines 26-29: Move configuration to a separate file

Consider moving these constants to a dedicated configuration file for better maintainability and reusability across scripts. Create a new file `config.ts`:

```typescript
import path from 'node:path';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

export const DEFAULT_CONFIG = {
  TRANSACTIONS_DIR: path.join(__dirname, 'transactions'),
  BUNDLES_DIR: path.join(__dirname, 'bundles'),
  MIN_BLOCK_HEIGHT: 0,
  MAX_BLOCK_HEIGHT: Infinity,
} as const;
```

Lines 86-130: Remove commented-out code

The commented-out `importFromFiles` function should be removed if it's no longer needed. If it's for future use, consider moving it to a separate file or tracking it in an issue.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)

- `scripts/import-data/count-fetched-ids.ts` (1 hunks)
- `scripts/import-data/import-data.ts` (1 hunks)
- `scripts/import-data/utils.ts` (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- `scripts/import-data/import-data.ts`
Noticed some commented code that was left in that could be removed, but otherwise looks good.
```typescript
  return counter;
};

// const importFromFiles = async ({
```
Looks like this could be removed.
Yep, my bad
Marked my review as pending again since the Parquet export script will go into this branch too.
a970197 to 035c895
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)

scripts/import-data/export-parquet.ts (1)

Lines 25-31: Use a consistent naming convention for constants

Consider using SCREAMING_SNAKE_CASE for all constant declarations to maintain consistency with JavaScript/TypeScript conventions:

```diff
- let ARIO_ENDPOINT = 'http://localhost:4000';
- let ADMIN_KEY: string | undefined;
- let OUTPUT_DIR = path.join(__dirname, 'parquet');
- let MAX_FILE_ROWS = 1_000_000;
- let MIN_BLOCK_HEIGHT = 0;
- let MAX_BLOCK_HEIGHT: number | undefined;
+ const ARIO_ENDPOINT = 'http://localhost:4000';
+ const ADMIN_KEY: string | undefined;
+ const OUTPUT_DIR = path.join(__dirname, 'parquet');
+ const MAX_FILE_ROWS = 1_000_000;
+ const MIN_BLOCK_HEIGHT = 0;
+ const MAX_BLOCK_HEIGHT: number | undefined;
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (5)

- `.gitignore` (1 hunks)
- `scripts/import-data/count-fetched-ids.ts` (1 hunks)
- `scripts/import-data/export-parquet.ts` (1 hunks)
- `scripts/import-data/fetch-data-gql.ts` (1 hunks)
- `scripts/import-data/utils.ts` (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

- `.gitignore`
- `scripts/import-data/count-fetched-ids.ts`
- `scripts/import-data/utils.ts`
- `scripts/import-data/fetch-data-gql.ts`
🔇 Additional comments (1)

scripts/import-data/export-parquet.ts (1)

Lines 1-24: LGTM! Well-structured imports and proper license header

The code follows best practices by using the node: protocol for built-in modules and includes the necessary .js extension for ESM imports.
```typescript
(async () => {
  if (ADMIN_KEY === undefined) {
    throw new Error('Missing admin key');
  }

  if (MAX_BLOCK_HEIGHT === undefined) {
    MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
  }

  await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${ADMIN_KEY}`,
    },
    body: JSON.stringify({
      outputDir: OUTPUT_DIR,
      startHeight: MIN_BLOCK_HEIGHT,
      endHeight: MAX_BLOCK_HEIGHT,
      maxFileRows: MAX_FILE_ROWS,
    }),
  });

  console.log(
    `Parquet export started from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
  );
})();
```
🛠️ Refactor suggestion
Enhance error handling and user feedback.
The main execution block needs improvements in error handling and user feedback:
```diff
 (async () => {
+  try {
     if (ADMIN_KEY === undefined) {
       throw new Error('Missing admin key');
     }
     if (MAX_BLOCK_HEIGHT === undefined) {
       MAX_BLOCK_HEIGHT = await fetchLatestBlockHeight();
     }
-    await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
+    const response = await fetchWithRetry(`${ARIO_ENDPOINT}/ar-io/admin/export-parquet`, {
       method: 'POST',
       headers: {
         'Content-Type': 'application/json',
         Authorization: `Bearer ${ADMIN_KEY}`,
       },
       body: JSON.stringify({
         outputDir: OUTPUT_DIR,
         startHeight: MIN_BLOCK_HEIGHT,
         endHeight: MAX_BLOCK_HEIGHT,
         maxFileRows: MAX_FILE_ROWS,
       }),
     });
+
+    if (!response.ok) {
+      throw new Error(`Export failed with status: ${response.status}`);
+    }
     console.log(
       `Parquet export started from block ${MIN_BLOCK_HEIGHT} to ${MAX_BLOCK_HEIGHT}`,
     );
+  } catch (error) {
+    console.error('Export failed:', error.message);
+    process.exit(1);
+  }
 })();
```
Also consider adding:

- Progress updates for long-running exports
- A status endpoint to check export progress
- Graceful shutdown handling (see the sketch after this list)
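A minimal sketch of graceful shutdown handling (illustrative only; this script currently issues a single POST, so the pattern matters most if the export grows into a multi-step loop):

```typescript
// Track an interrupt flag so a long-running loop can exit cleanly.
let shuttingDown = false;

process.on('SIGINT', () => {
  console.log('\nReceived SIGINT, finishing the current step before exiting...');
  shuttingDown = true;
});

async function runSteps(steps: Array<() => Promise<void>>): Promise<void> {
  for (const step of steps) {
    if (shuttingDown) {
      console.log('Shutdown requested; stopping early.');
      return;
    }
    await step();
  }
}
```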
It will throw an error if the POST fails.
@karlprieb, understood. Since `fetchWithRetry` throws an error if the POST fails, additional error handling might not be necessary. Thank you for the clarification. 🐰🤔
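For context, a minimal sketch of what a `fetchWithRetry` helper like the one in `scripts/import-data/utils.ts` might look like (the retry count, backoff, and exact signature here are assumptions, not the PR's actual implementation):

```typescript
async function fetchWithRetry(
  url: string,
  init: RequestInit,
  retries = 3,
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const response = await fetch(url, init);
      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
      }
      return response;
    } catch (error) {
      lastError = error;
      // Exponential backoff between attempts.
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
    }
  }
  throw lastError;
}
```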
✏️ Learnings added

Learnt from: karlprieb
PR: ar-io/ar-io-node#247
File: scripts/import-data/export-parquet.ts:0-0
Timestamp: 2024-12-13T17:40:34.289Z
Learning: The `fetchWithRetry` function throws an error if the POST request fails.
cfe7e45 to 3e409f4