For files over 100MB, they cannot be pushed to GitHub with regular git #20

Open
TimDaub opened this issue Aug 5, 2022 · 4 comments
Labels
bug Something isn't working


TimDaub commented Aug 5, 2022

TimDaub added the bug label Aug 5, 2022

TimDaub commented Aug 8, 2022

The call-block-logs crawl is pretty huge. And potentially, we want to make it available to projects without the necessity to run a node.

Actually, it remains a question whether we truly want that, as the alternative is users running the nodes themselves. But still, just to keep the threshold low: What are the costs of storing roughly 1TB in the cloud or on IPFS? What would the bandwidth costs be? And do these costs motivate us to store that data or not?

Here's the current call-block-logs crawl starting from block: 11565020

-rw-r--r-- 1 root root 73G Aug 7 21:07 11565020-12000000
-rw-r--r-- 1 root root 187G Aug 7 22:40 12000001-13000000
-rw-r--r-- 1 root root 175G Aug 8 00:54 13000001-14000000
-rw-r--r-- 1 root root 187G Aug 8 13:10 14000001-15000000

It's about 621GB for now, with probably another 100GB to be added (721GB in total) until block 15301264 (current tip) is included.

AWS S3

Storage pricing: $0.023 per GB per month
Storage cost (721GB): $16.58 per month

Then for data pricing, we'd need to understand how many users are indeed interested in downloading this data. But the model is 721GB per user, so it's 721GB*x, where x represents the user count.

Data Transfer pricing: $0.09 per GB
Data Transfer cost (721GB * x): $0.09/GB * 721GB * x, so at 10 users, that'd be roughly $649.

All information is based on the official AWS pricing page [1].
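
As a rough sketch of the S3 model above, the following Python snippet recomputes the monthly storage and transfer costs from the quoted prices. It assumes every user downloads the full 721GB exactly once per month; the function and constant names are made up for illustration.

```python
# Prices and dataset size as quoted in this comment.
STORAGE_USD_PER_GB_MONTH = 0.023
TRANSFER_USD_PER_GB = 0.09
DATASET_GB = 721  # estimated crawl size up to the current tip


def s3_monthly_cost(users: int, dataset_gb: float = DATASET_GB) -> dict:
    """Return storage, transfer and total cost in USD for one month.

    Assumption: each user downloads the whole dataset once per month.
    """
    storage = dataset_gb * STORAGE_USD_PER_GB_MONTH
    transfer = dataset_gb * TRANSFER_USD_PER_GB * users
    return {
        "storage": round(storage, 2),
        "transfer": round(transfer, 2),
        "total": round(storage + transfer, 2),
    }


if __name__ == "__main__":
    for users in (1, 10, 100):
        print(users, s3_monthly_cost(users))
```

The transfer term dominates quickly: storage stays flat while transfer grows linearly with the user count.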

Self-hosted Erigon

Erigon suggests running a node with more than 3TB of SSD storage [1]. Hetzner's AX101, which we use to run a node using rugpullindex.com, totals 111.86€ monthly [3]. Their dynamic block storage in the cloud for 2TB is 97.48€ per month.

Running an Erigon node does require a bit of effort operationally. It's an initial effort to monitor the chain syncing, but then it runs mostly smoothly without any necessary user intervention. Using neume, a node's interfaces can be blocked from the internet, with neume handling most of the crawling (and later potentially the API exposure).

Unchained Index With IPFS

Trueblocks features an "unchained index" hosted on IPFS. It allows a user to directly access Ethereum data through carefully specified identifiers. E.g., an account's entire historical balance can be accessed by crafting the identifier and then selectively downloading partial data from IPFS.

This is great as the index's identifiers have a permanence guarantee [1]. It's good because anyone can host those indexes (e.g., by pinning). But it's truly exceptional as it allows a user to traverse and directly access the data through this neat identification scheme.

So assuming we'd create and host such an index on the IPFS network, what would the costs be? There are really two options: using e.g. Hetzner's Block Storage, or using an IPFS cloud provider like Infura.

Infura

Infura doesn't allow storing beyond 200 GB of data. At "Unlimited Storage $0.08/GB", they're more expensive than AWS S3 ($0.023/GB) for storage, making a 721GB unchained index cost a total of $57.68. At $0.12/GB transfer costs, and a model centered around x as the total number of users, $0.12/GB * 721GB * x = $865.20 for x=10.

Pinata (IPFS)

Bandwidth: 7.5 TB per month.
Storage: 2.5TB for $1000 per month [6].

Eternum

Bandwidth: No info.
Storage: $0.14/GB per month [7].

Fleek

Bandwidth: $0.05/GB
Storage: $0.10/GB per month [8].
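
To put the per-GB providers side by side, here is a small Python sketch using only the prices listed in this comment. It assumes all storage prices are per month (Infura's is not stated as such), that every user downloads the full 721GB once per month, and that 1TB = 1000GB for Pinata's quotas. All names in the code are hypothetical.

```python
from typing import Dict, Optional, Tuple

DATASET_GB = 721

# name -> (storage USD/GB/month, bandwidth USD/GB or None if not listed)
PER_GB_PRICES: Dict[str, Tuple[float, Optional[float]]] = {
    "AWS S3": (0.023, 0.09),
    "Infura": (0.08, 0.12),   # assumption: storage price is per month
    "Eternum": (0.14, None),  # no bandwidth info given
    "Fleek": (0.10, 0.05),
}


def compare(users: int, dataset_gb: float = DATASET_GB):
    """Return name -> (storage cost, bandwidth cost or None) in USD per month."""
    result = {}
    for name, (storage_price, bandwidth_price) in PER_GB_PRICES.items():
        storage = round(dataset_gb * storage_price, 2)
        bandwidth = (
            None
            if bandwidth_price is None
            else round(dataset_gb * users * bandwidth_price, 2)
        )
        result[name] = (storage, bandwidth)
    return result


def pinata_monthly_cost(users: int, dataset_gb: float = DATASET_GB) -> Optional[float]:
    """Flat $1000 plan; None if the 2.5TB storage or 7.5TB bandwidth quota is exceeded."""
    fits = dataset_gb <= 2500 and dataset_gb * users <= 7500
    return 1000.0 if fits else None


if __name__ == "__main__":
    for name, costs in compare(users=10).items():
        print(name, costs)
    print("Pinata", pinata_monthly_cost(users=10))
```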

References


TimDaub commented Aug 8, 2022

[Image: Screenshot 2022-08-08 at 15 08 04]

15 data packs per month would allow us to upload the data to GitHub in this repository using https://git-lfs.github.com/ until we find a way to effectively store it on e.g. IPFS.
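
The figure of 15 data packs can be reproduced with a short Python sketch. The pack size (50GB of storage per pack) and price ($5 per pack per month) are assumptions about GitHub's Git LFS billing at the time and are not stated in this thread; adjust them if the screenshot shows different values.

```python
import math

DATASET_GB = 721
DATA_PACK_GB = 50   # assumption: storage quota per data pack
DATA_PACK_USD = 5   # assumption: price per data pack per month


def data_packs_needed(dataset_gb: float, pack_gb: float = DATA_PACK_GB) -> int:
    """Smallest number of whole data packs that covers the dataset."""
    return math.ceil(dataset_gb / pack_gb)


if __name__ == "__main__":
    packs = data_packs_needed(DATASET_GB)
    print(packs, "packs,", packs * DATA_PACK_USD, "USD per month")
```

Under these assumptions, download bandwidth would draw on the same per-pack quota, so the number of packs would likely need to grow with the number of users.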

@shaunchurch

It looks like Filebase.com is approx. $6/TB/month for storage and bandwidth, with the first terabyte of each included in the $5.99/month sub. I think it's IPFS backed by Sia.tech

https://docs.filebase.com/billing-and-pricing/pricing-model

Subscription
Our minimum subscription fee of $5.99 includes up to your first 1 TB of storage and 1 TB of bandwidth. 
Additional storage and outgoing bandwidth transfer is billed at $0.0059 / GB.
This subscription will renew monthly until canceled. 
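
Applied to the 721GB crawl, the quoted Filebase pricing could be modeled roughly like this in Python. The sketch assumes 1 TB = 1000 GB, that storage and bandwidth overages are billed independently, and that each user downloads the full dataset once per month; the function name is hypothetical.

```python
BASE_FEE_USD = 5.99          # minimum monthly subscription
INCLUDED_GB = 1000           # assumption: 1 TB = 1000 GB, for storage and bandwidth each
OVERAGE_USD_PER_GB = 0.0059  # additional storage and outgoing bandwidth


def filebase_monthly_cost(storage_gb: float, bandwidth_gb: float) -> float:
    """Monthly cost in USD: base fee plus overage beyond the included quotas."""
    extra_storage = max(0.0, storage_gb - INCLUDED_GB)
    extra_bandwidth = max(0.0, bandwidth_gb - INCLUDED_GB)
    return BASE_FEE_USD + (extra_storage + extra_bandwidth) * OVERAGE_USD_PER_GB


if __name__ == "__main__":
    users = 10
    cost = filebase_monthly_cost(storage_gb=721, bandwidth_gb=721 * users)
    print(f"{users} users: ${cost:.2f} per month")
```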


shaunchurch commented Aug 8, 2022

Have you seen https://estuary.tech / https://docs.estuary.tech?

Alpha phase
Estuary is currently in its alpha testing phase. Because of this, there are some restrictions to the service:

A maximum of 32 GB per upload. This limit will increase soon.
The service is temporarily limited to users wanting to store meaningful public data. You can apply for an invite →

Uploading data and Filecoin deals are free for now thanks to verified Filecoin deals from the Filecoin Plus program.
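
Since the crawl files listed earlier in this thread are far larger than the 32 GB upload limit, they would need to be split first. The following is a minimal Python sketch of such a splitter, not anything Estuary provides; the part naming scheme and the decimal interpretation of 32 GB are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List

MAX_UPLOAD_BYTES = 32 * 10**9  # assumption: 32 GB interpreted as decimal gigabytes


def split_file(src: Path, out_dir: Path, max_part_bytes: int = MAX_UPLOAD_BYTES,
               buffer_bytes: int = 1024 * 1024) -> List[Path]:
    """Split src into consecutive parts of at most max_part_bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    with src.open("rb") as reader:
        index = 0
        while True:
            chunk = reader.read(min(buffer_bytes, max_part_bytes))
            if not chunk:
                break  # end of the source file
            part = out_dir / f"{src.name}.part{index:04d}"
            with part.open("wb") as writer:
                written = 0
                while chunk:
                    writer.write(chunk)
                    written += len(chunk)
                    remaining = max_part_bytes - written
                    if remaining == 0:
                        break  # this part is full
                    chunk = reader.read(min(buffer_bytes, remaining))
            parts.append(part)
            index += 1
    return parts


if __name__ == "__main__":
    # Tiny demo with a 10-byte file and 4-byte parts.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "demo.bin"
        src.write_bytes(b"0123456789")
        for p in split_file(src, Path(tmp) / "parts", max_part_bytes=4, buffer_bytes=3):
            print(p.name, p.stat().st_size)
```

Concatenating the parts in name order restores the original file.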
