
size and bandwidth limits on GitHub #4

Open
CloudyLex opened this issue Nov 4, 2019 · 17 comments

Comments

@CloudyLex

The good news is that Marios and I got the entire head of the trunk, including "BigBoy", the 0.5 GB Rydberg-state radiative data file, onto GitHub. The repo is
https://github.com/cloudy-astrophysics/cloudy_lfs
Marios has an older Mac that cannot install LFS, but he was still able to download the entire trunk, including BigBoy. So that all works.

I am concerned about the file and bandwidth limits on the academic GitHub license I have. The nublado.org logs show that our monthly downloads vary between 18 and 40 GB. It looks like the GitHub limit is 1 GB?

@ogoann commented Nov 6, 2019

Hi Gary,

The cost of additional bandwidth/space on GitHub LFS is $5 per 50 GB per month. How does that compare to our current storage costs?

However, I am still concerned that GitHub LFS is not the best way to go, because it also requires the user to install the LFS extension (see my other question about how we expect users to install Cloudy in the future). I just downloaded the cloudy_lfs repository, and "BigBoy" is not there: all that is present is the small reference file left behind by the GitHub LFS extension.

    version https://git-lfs.github.com/spec/v1
    oid sha256:7f017707990b80ee7a9bc7f74c3da5bbe0769a0fe693a696e000ea10a30c277e
    size 466744290

Could the solution be to host the database files (these change very little) on some FTP server we have access to (the current one?), and make it part of the makefile/installation process to download these database files directly from that server?
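
For concreteness, a minimal sketch of the download step such a makefile target could run; the host URL, checksum file, and file list below are hypothetical placeholders, not real locations:

    # Sketch: fetch any missing database files at install time.
    # DATA_URL and checksums.sha256 are made-up placeholders; the
    # actual file list would come from the makefile.
    DATA_URL="https://data.nublado.org/cloudy"
    for f in hydro_tpnl.dat; do
        [ -f "data/$f" ] || curl -fL -o "data/$f" "$DATA_URL/$f"
    done
    # Verify the downloads before trusting them.
    (cd data && sha256sum -c checksums.sha256)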

Best,
Anna

@CloudyLex commented Nov 6, 2019 via email

@ogoann commented Nov 7, 2019

Hi Gary,

I am still thinking about this. I cannot find any information on whether a regular GitHub repository has bandwidth limits. If it does not, then as long as we keep all files under 100 MB (and recombine them with a makefile or something), things should be fine?
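
A sketch of that split-and-reassemble idea with GNU coreutils, using the big Rydberg data file as an example; the chunk size is chosen to stay safely under the 100 MB hard limit:

    # One-time: split the ~450 MB data file into chunks git will accept.
    split -b 90m data/hydro_tpnl.dat data/hydro_tpnl.dat.part.
    # At build/install time: reassemble the original from the chunks.
    cat data/hydro_tpnl.dat.part.* > data/hydro_tpnl.dat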

Best,
Anna

@ogoann commented Nov 7, 2019

Gary,

I have no time right now to look into this in detail, but I wonder if https://zenodo.org/ could be the answer, as a place where we could store the versions of Cloudy that users download, and/or just the big files. There seems to be a 50 GB size limit per record (exceptions can be granted, and with Cloudy's citation count that should be no problem, but we don't even need one currently), and there is no download limit for users. One way I see this working: Zenodo stores the databases and the tarred installation files, while GitHub is used for code development, sharing user examples and Cloudy papers, and user questions.

https://about.zenodo.org/policies/
https://about.zenodo.org/terms/
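
If we went that way, the user-facing download could be a single HTTPS fetch; the record number and tarball name below are invented for illustration:

    # Fetch a released tarball from a hypothetical Zenodo record.
    curl -fLO "https://zenodo.org/record/1234567/files/cloudy-c17.01.tar.gz"
    tar xzf cloudy-c17.01.tar.gz
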
-a

@Morisset commented Nov 7, 2019 via email

@CloudyLex commented Nov 7, 2019 via email

@ogoann commented Nov 7, 2019

I think this is supposed to be extremely open, and I am not aware of any geographical restrictions. I also believe they are committed to making sure that it remains a stable place to store scientific data. By the way, they also have a "Communities" feature that could be useful for Cloudy workshops? I haven't really checked it out thoroughly, though.

@CloudyLex commented Nov 7, 2019 via email

@ogoann commented Nov 7, 2019

ADS lists more than 2k citations to Zenodo records this year alone!

@CloudyLex commented Nov 7, 2019 via email

@CloudyLex commented Nov 7, 2019 via email

@ogoann commented Nov 7, 2019

Hi Gary, I am unsure what you mean. The point I was trying to make is that Zenodo seems to be more and more popular in astro (just looking at the statistics). Specific papers either cite other people's codes that they used, or link to their own. I believe that arXiv is down right now, hence the broken links.

-a

@Morisset commented Nov 8, 2019 via email

@will-henney

A couple of quick responses to this. I tend to agree with Anna that LFS is not the way to go. Here is a comment of mine from the private email thread of last month:

Thanks for pointing that out - I had forgotten that conversation from 18 months ago. Are you sure that LFS is needed, though? According to the following page, the hard limits are 100 MB per file and 100 GB (!!!) per repository.

https://help.github.com/en/github/managing-large-files/what-is-my-disk-quota

They do mention a softer limit of 1 GB, at which they send you a polite email, but hopefully that is negotiable. I just looked at my copy: the .git directory is 1.1 GB, and the only seriously large file is data/hydro_tpnl.dat at 448 MB - I can't find anything else bigger than 100 MB.
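
(For anyone repeating that check in their own working copy, something like this lists the offenders:)

    # List every file above GitHub's 100 MB per-file hard limit,
    # skipping git's internal objects.
    find . -path ./.git -prune -o -type f -size +100M -print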

If it were possible to keep under the 100 MB per-file limit, then I think we would not have to pay anything on GitHub, however large the total bandwidth use is.

I would also second the idea of using Zenodo or similar to host the released versions of Cloudy. That would mean that only developers and power users would be downloading directly from GitHub. The Zenodo size limit is 50 GB per record, which gives plenty of room for growth (a record would be, for instance, a single released version).

@CloudyLex commented Nov 15, 2019 via email

@will-henney

Hi Gary,

But the bandwidth limits only apply if using LFS. There are no bandwidth limits on regular repos, are there? Just the restriction that no individual file may be larger than 100 MB.

I will open a separate thread on your Yahoo query.

@ogoann commented Nov 15, 2019

I second Will. If we split all files into chunks under 100 MB and use GitHub only for development, while users download the code from Zenodo, then this is a sustainable long-term solution.

I also want to remind us that GitHub offers free web hosting, meaning that the webpage (which could of course link to groups.io and Zenodo) and the wiki can all be in one place. This makes it much easier to maintain the webpage in the future, since its code would also be on GitHub and people could submit issues/requests/bugs and generally help out with maintaining it.
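
For reference, a minimal sketch of the conventional GitHub Pages setup, using an orphan gh-pages branch; nothing here is Cloudy-specific:

    # Create an orphan branch holding only the website files.
    git checkout --orphan gh-pages
    git rm -rf .
    echo '<h1>Cloudy</h1>' > index.html
    git add index.html
    git commit -m "initial website"
    # Publish: GitHub serves this branch at <user>.github.io/<repo>.
    git push origin gh-pages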
