Today we generate one large CAR file per epoch, roughly 600GB in size, and then split it using carlet. Carlet splits naively: it takes one block at a time (block in the IPLD sense of the word) until the output reaches the desired size. As a result, blocks that belong to the same DAG and are connected to each other can end up in separate CAR files, and therefore in different Filecoin deals.
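To make the problem concrete, here is a minimal sketch of size-based splitting. It is not carlet's actual code: the `dag_id` label, the byte sizes, and the choice to close a file just before it would exceed the target are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (dag_id, size_in_bytes). dag_id is a hypothetical label marking which DAG
# an IPLD block belongs to; a real CAR stream carries only CIDs and bytes.
IpldBlock = Tuple[str, int]


def naive_split(blocks: Iterable[IpldBlock], target_size: int) -> List[List[IpldBlock]]:
    """Fill each output file one block at a time, ignoring DAG boundaries."""
    files: List[List[IpldBlock]] = []
    current: List[IpldBlock] = []
    current_size = 0
    for block in blocks:
        # Assumption: close the current file before it would exceed the target.
        if current and current_size + block[1] > target_size:
            files.append(current)
            current, current_size = [], 0
        current.append(block)
        current_size += block[1]
    if current:
        files.append(current)
    return files


def dags_spanning_files(files: List[List[IpldBlock]]) -> List[str]:
    """Return the DAGs whose blocks landed in more than one output file."""
    seen: Dict[str, Set[int]] = {}
    for index, file_blocks in enumerate(files):
        for dag_id, _ in file_blocks:
            seen.setdefault(dag_id, set()).add(index)
    return sorted(dag for dag, indexes in seen.items() if len(indexes) > 1)


# Toy stream: DAG "A" has three blocks, "B" and "C" have one each.
stream = [("A", 40), ("A", 40), ("A", 40), ("B", 40), ("C", 40)]
files = naive_split(stream, target_size=100)
split_dags = dags_spanning_files(files)
```

In this toy stream the third block of DAG "A" lands in the second file, so "A" is spread across two files that may go to two different deals.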
This means that only a retrieval protocol that fetches one block at a time, i.e. Bitswap, can retrieve the data. Consequently, every split CAR file must be stored with SPs that serve data over Bitswap.
However, we have already introduced the concept of a subset, which groups a number of Blocks (in the Solana sense of the word) together. Since we control which Blocks go into a subset, we could instead size the subsets so that each one is <32GB and fits in a Filecoin sector. All the data for a subset would then live in a single deal and be retrievable via Graphsync as well as Bitswap.
This would require the following work:
Rewrite the CAR-writing code in faithful to generate subsets of the "correct" size.
Write a standalone tool that takes an existing epoch CAR file and splits it into "correct"-sized smaller CAR files, where each smaller CAR file is a subset.
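Both work items come down to the same packing step: group whole Solana Blocks into subsets without exceeding a size limit. The sketch below shows one possible greedy approach. The `SolanaBlock` and `Subset` types and the `pack_subsets` function are hypothetical names, not part of faithful, and the limit is a plain parameter; a real limit would likely need headroom for CAR framing and any padding applied to the deal.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class SolanaBlock:
    slot: int
    size: int  # assumed: total bytes of all IPLD blocks under this Solana Block


@dataclass
class Subset:
    blocks: List[SolanaBlock] = field(default_factory=list)
    size: int = 0


def pack_subsets(blocks: Iterable[SolanaBlock], max_subset_size: int) -> List[Subset]:
    """Greedily group consecutive Solana Blocks, never splitting one across subsets."""
    subsets: List[Subset] = []
    current = Subset()
    for block in blocks:
        if block.size > max_subset_size:
            # A single Solana Block larger than the limit cannot fit in any subset.
            raise ValueError(f"slot {block.slot} exceeds the subset size limit")
        # Start a new subset when adding this Block would cross the limit.
        if current.blocks and current.size + block.size > max_subset_size:
            subsets.append(current)
            current = Subset()
        current.blocks.append(block)
        current.size += block.size
    if current.blocks:
        subsets.append(current)
    return subsets


# Toy epoch with small sizes; in practice the limit would be close to 32GB.
epoch = [SolanaBlock(slot, size) for slot, size in [(1, 40), (2, 50), (3, 30), (4, 60), (5, 10)]]
subsets = pack_subsets(epoch, max_subset_size=100)
slots_per_subset = [[b.slot for b in s.blocks] for s in subsets]
```

Because the unit of packing is a whole Solana Block rather than a single IPLD block, every DAG rooted at a Block stays inside one subset, and so inside one CAR file and one deal.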