size and bandwidth limits on GitHub #4
Hi Gary, The cost of additional bandwidth/space on GitHub LFS is $5 per 50 GB per month. How does that compare to the current storage costs? However, I am still concerned that GitHub LFS is not the best way to go, because it requires the user to install the Git LFS extension as well (see my other question about how we expect users to install Cloudy in the future). I just downloaded the cloudy_lfs repository, and the "big boy" is not there - there is only the small reference file of the Git LFS extension.
Could the solution be to host the database files (these change very little) on some FTP server we have access to (the current one?), and have the makefile/installation process download these database files directly from that server? Best, |
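A minimal sketch of what such an install-time download step might look like, assuming the expected size and SHA-256 of each data file are recorded in the repository. The function names and any URL passed in are hypothetical and not part of Cloudy; the standard-library `urllib.request.urlretrieve` accepts both `ftp://` and `https://` URLs.

```python
import hashlib
import os
import urllib.request


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_valid(path, expected_sha256, expected_size):
    """True if the file exists and matches both the size and the checksum."""
    if not os.path.isfile(path):
        return False
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of(path) == expected_sha256


def fetch_data_file(url, dest, expected_sha256, expected_size):
    """Download url to dest unless a valid copy is already present.

    Hypothetical helper: the URL would point at wherever the data
    files end up being hosted.
    """
    if is_valid(dest, expected_sha256, expected_size):
        return dest  # already installed; a re-install costs no bandwidth
    urllib.request.urlretrieve(url, dest)
    if not is_valid(dest, expected_sha256, expected_size):
        os.remove(dest)
        raise ValueError(f"checksum or size mismatch for {dest}")
    return dest
```

Checking the local copy first means repeated builds do not download the large file again, which matters if the host meters bandwidth.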
Hi Anna,
I am even more concerned about the bandwidth. The current trunk is 1.2 GB
with Big Boy and 0.73 GB without. The c17 checkout is 0.5 GB. The
bandwidth stats I quoted were probably mainly c17 exports (although anybody
could get our trunk or the whole repository). The citation rate for
Cloudy's documentation is going up about 20% per year. I have not checked
how the bandwidth is changing, but expect it is rising due to an increasing
number of users and will rise due to the size of the checkout.
Big Boy was a design error. It allows for very high precision radiative
data up in the Rydberg levels. As Robin pointed out, those data are known
from asymptotic limits. We do not need precision up there since
populations of high levels are dominated by collisions even for modest
densities. The collision rates are mainly from Born approximation formulae
(the so-called g-bar approximation) and are highly uncertain. Without Big
Boy, allowing for some growth in the trunk over the next year, and a
reasonable rate of usage increase, we will soon be looking at ~100 GB/month.
The current nublado.org is hosted by Webfaction which was purchased by
GoDaddy within the last year. Their future is unclear due to negative
vibes about GoDaddy, but nothing has changed yet. We now pay Webfaction
$10/month which allows for 1 TB / month. The problems are that we have to
maintain it and my university is uncomfortable using grants to pay for
external web sites (they are terrified of an OMB audit which would affect
the NIH grants over in the medical school and university hospital as
collateral damage). Similarly, the university has its computers behind a
firewall which requires access through a VPN. They have a firm policy of
discouraging computers being openly exposed to the internet. We can't host
anything here - I have checked.
(They did let me keep the Cloudy summer school site on cloud9 but said
that was the limit of what they would allow).
Like most universities, we have a relationship with Google and unlimited
access to a university-related but Google-hosted Google drive. Tarballs
could be placed up there and made public. China could not access that, due
to China not Google, but most Chinese universities have ways to get to the
open web. But if development were out in the open on GitHub then anybody
could clone whatever we have up there so the bandwidth hit on GitHub might
still be over the limit.
It is curious that there is no simple way to host a broadly used community
project like Cloudy. Seems like NASA or NSF would have an interest in
helping out directly. They do help with grants but that brings in our
accountants.
thanks for any further ideas,
Gary
On Wed, Nov 6, 2019 at 1:38 PM Anna Ogorzalek wrote:
Hi Gary,
The cost of additional bandwidth/space on GitHub LFS is $5 per 50 GB per
month. How does that compare to the current storage costs?
However, I am still concerned that GitHub LFS is not the best way to go,
because *it requires the user to install the Git LFS extension as well*
(see my other question about how we expect users to install Cloudy in
the future). I just downloaded the cloudy_lfs repository, and the "big boy" is
not there - there is only the small reference file of the Git LFS extension:
version https://git-lfs.github.com/spec/v1
oid sha256:7f017707990b80ee7a9bc7f74c3da5bbe0769a0fe693a696e000ea10a30c277e
size 466744290
Could the solution be to host the data base files (these change very
little) on some FTP we have access to (the current one?), and have it be a
part of the makefile/installation process to have these data base files
directly downloaded from said FTP?
Best,
Anna
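The three pointer lines quoted above (version, oid, size) are simple space-separated key/value pairs, so they can be read with a few lines of code. The following is only an illustrative sketch; `parse_lfs_pointer` is a hypothetical helper, not part of Git LFS.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7f017707990b80ee7a9bc7f74c3da5bbe0769a0fe693a696e000ea10a30c277e
size 466744290
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size_bytes"])
```

The size field shows that the real file behind this pointer is about 467 MB, which is why a checkout without the LFS extension contains only this small text file.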
--
Gary J. Ferland
Physics, Univ of Kentucky
Lexington KY 40506 USA
Tel: 859 257-8795
https://pa.as.uky.edu/users/gary
|
Hi Gary, I am still thinking about this. I cannot find any information on whether a regular GitHub repository has any bandwidth limits. If it has none, and you keep all files <100 MB (and then combine them with a makefile or something), things should be fine? Best, |
Gary, I have no time right now to look into this in detail, but I wonder if https://zenodo.org/ could be the answer, as it is a place where we could store the versions of Cloudy that users should download, and/or just the big files. There seems to be a 50 GB size limit (exceptions can be granted, and with Cloudy's citation count this should be no problem, but we don't even need one currently). There is no download limit for users. One way I see this working: Zenodo is where the databases are stored, as well as tarred files for installation, and GitHub is used for code development, sharing user examples and Cloudy papers, and user questions. https://about.zenodo.org/policies/ |
It should be quite easy to test any limit in bandwidth. How much time do we
need to download the package to reach it?
The zip file on github is 185M. The git clone leads to a package of 910M.
If I tar.gz this package, I obtain a 364M file (182 M are in the .git
directory...).
Anyway, downloading the package costs close to 200M. The mean download is
20G/month, said Gary, i.e. close to 100 downloads. Could each of us install
Cloudy 20 times during the next 24 h, so we can see whether any limit is
reached?
Christophe
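The arithmetic behind this proposed test can be written out as a short sketch. The 20 GB/month and 200 MB figures are the ones quoted in the message; decimal units are assumed, and the helper name is illustrative.

```python
import math

MB = 10**6
GB = 10**9


def downloads_per_month(monthly_traffic_bytes, bytes_per_download):
    """Number of downloads that add up to the given monthly traffic."""
    return monthly_traffic_bytes / bytes_per_download


n = downloads_per_month(20 * GB, 200 * MB)
# How many people are needed if each installs 20 times?
testers = math.ceil(n / 20)
print(n, testers)  # 100.0 5
```

So five people installing 20 times each would reproduce roughly one month of the quoted traffic within a day.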
(In reply to Anna Ogorzalek, Thu, Nov 7, 2019 at 09:14)
--
Dr. Christophe MORISSET Tel: +52 646 174 45 80 ext 230
Instituto de Astronomia UNAM
Apdo. Postal 106, C.P. 22800
Ensenada Baja California MEXICO
|
CERN funds https://zenodo.org/
In past encounters with them, they have been free and open with Euros but
had different rules for this side of the pond. They might be ideal if they
will let us in.
thanks, Anna!
Gary
(In reply to Anna Ogorzalek, Thu, Nov 7, 2019 at 1:04 PM)
|
I think this is supposed to be extremely open, and I am not aware of any geographical restrictions. I also believe they are committed to making sure that this is a stable place to store scientific data. Btw, they also have a "Communities" feature that could be useful for Cloudy workshops? Haven't really checked it out thoroughly though. |
My interactions with CERN were over software they developed and may have
been well over a decade ago. I agree that the landscape is different
(better) now.
(In reply to Anna Ogorzalek, Thu, Nov 7, 2019 at 1:56 PM)
|
ADS lists more than 2k citations to Zenodo records just this year! |
It would be interesting to hear what Robin thinks about this - he would be
far more tuned into what is going on over there.
On Thu, Nov 7, 2019 at 2:03 PM Anna Ogorzalek wrote:
ADS lists
<https://ui.adsabs.harvard.edu/search/p_=0&q=%20full%3A%22zenodo%22&sort=date%20desc%2C%20bibcode%20desc>
more than 2k citations to zenodo record just this year !
|
Following through on the ADS links - I got mostly broken links to arXiv
and one where they linked to a copy of the paper on Zenodo. People use it
for publication archives?
(In reply to Anna Ogorzalek, Thu, Nov 7, 2019 at 2:03 PM)
|
Hi Gary, unsure what you mean. The point I was trying to make is that zenodo seems to be more and more popular in astro (just looking at the statistics). Specific papers either cite other people's codes that they used, or link to their own. I believe that arxiv is down right now, hence the broken links. -a |
There are ways to connect a GitHub repository with a Zenodo account. I used
this to store the PyNeb and pyCloudy packages. It automatically creates a new
Zenodo version when a new tag is made on GitHub. That gives a DOI
for the code, and for each version.
Example for PyNeb, which I wrongly named Pyneb_devel, while a devel branch
was only necessary for what I wanted to do:
https://doi.org/10.5281/zenodo.1246922
Ch.
(In reply to Anna Ogorzalek, Thu, Nov 7, 2019 at 12:07)
|
A couple of quick responses to this. I tend to agree with Anna that LFS is not the way to go. Here is a comment of mine from the private email thread of last month:
If it were possible to keep under the 100MB-per-file limit, then I think we would not have to pay anything on github, however large the total bandwidth use is. I would also second the idea of using Zenodo or similar to host the released versions of Cloudy. That would mean that it would only be developers and power-users who would be directly downloading from Github. The zenodo size limit is 50 GB per record, which gives plenty of room for growth (a record would be, for instance, a single released version). |
Hi Will,
Thanks for the comments - intense days with setting up the new group and
figuring out how to sunset the old one.
a single checkout of the trunk is over 1 GB - I don't have the numbers
right now. The bandwidth on GitHub is so small that we could not do two
checkouts in a month. I don't see this working - do you? You know far
more than I do.
The "big boy", the huge file that is a good fraction of the checkout, was a
design error and will be removed. That will help.
any suggestions on how to move the user community from yahoo to groups.io?
Everyone must have received the email when the transfer happened. Most
people are busy and will not pay attention.
Gary
On Thu, Nov 14, 2019 at 1:42 PM William Henney wrote:
A couple of quick responses to this. I tend to agree with Anna that LFS is
not the way to go. Here is a comment of mine from the private email thread
of last month:
Thanks for pointing that out - I had forgotten that conversation from 18
months ago. Are you sure that LFS is needed though. According to the
following page, the hard limits are 100MB per file and 100GB (!!!) per
repository.
https://help.github.com/en/github/managing-large-files/what-is-my-disk-quota
They do mention a softer limit of 1GB, where they send you a polite email,
but hopefully that is negotiable. I just looked at my copy and the .git
directory is 1.1G and the only seriously large file is data/hydro_tpnl.dat
at 448M - I can't find anything else bigger than 100MB
If it were possible to keep under the 100MB-per-file limit, then I think
we would not have to pay anything on github, however large the total
bandwidth use is.
I would also second the idea of using Zenodo or similar to host the
*released* versions of Cloudy. That would mean that it would only be
developers and power-users who would be directly downloading from Github.
The zenodo size limit is 50 GB *per record*, which gives plenty of room
for growth (a record would be, for instance, a single released version).
|
Hi Gary, But the bandwidth limits only apply if using LFS. There are no bandwidth limits on regular repos, are there? Just the restriction of no individual files larger than 100MB. I will open a separate thread on your Yahoo query. |
I second Will. If we split all files into chunks under 100 MB and use GitHub only for development, while Zenodo is used by users for downloading the code, then this is a long-term sustainable solution. I also want to remind us that GitHub offers free web hosting, meaning that the webpage (which could of course link to groups.io and Zenodo) and wiki can all be in one place. This makes it much easier to maintain the webpage in the future, since its code would also be on GitHub and people can submit issues/requests/bugs and generally help out with maintaining it. |
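As a rough illustration of the split-and-recombine idea, here is a minimal sketch. The 95 MB chunk size is an assumption chosen to stay safely under the 100 MB per-file limit, and the `.partNNN` naming is hypothetical; a makefile rule could call the join step at install time.

```python
import os

CHUNK_LIMIT = 95 * 1000 * 1000  # assumed margin below the 100 MB per-file limit


def split_file(path, chunk_size=CHUNK_LIMIT):
    """Write path as numbered parts (path.part000, ...) and return their names."""
    parts = []
    with open(path, "rb") as source:
        index = 0
        while True:
            block = source.read(chunk_size)
            if not block:
                break
            part_name = f"{path}.part{index:03d}"
            with open(part_name, "wb") as part:
                part.write(block)
            parts.append(part_name)
            index += 1
    return parts


def join_parts(parts, dest):
    """Concatenate the parts, in the order given, into dest."""
    with open(dest, "wb") as out:
        for part_name in parts:
            with open(part_name, "rb") as part:
                out.write(part.read())
    return dest
```

Only the parts would be committed; the reassembled file would be listed in `.gitignore` so it never enters the repository.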
The good news is that Marios and I got the entire head of the trunk, including "BigBoy", the 0.5 GB Rydberg state radiative data, onto GitHub. That repo is
https://github.com/cloudy-astrophysics/cloudy_lfs
Marios has an older Mac that cannot install lfs but was able to download the entire trunk, including BigBoy. So that all works.
I am concerned about the file and bandwidth limits on the academic GitHub license I have. The nublado.org log shows that our monthly download varies between 18 - 40 GB. It looks like the GitHub limit is 1 GB?
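To put rough numbers on this concern, here is a small sketch of what LFS bandwidth might cost. It assumes the 1 GB free allowance mentioned above (which is itself uncertain) and the $5 per 50 GB per month price quoted elsewhere in this thread; the function is illustrative and not an official pricing calculator.

```python
import math


def lfs_monthly_cost(bandwidth_gb, free_gb=1.0, pack_gb=50.0, pack_price=5.0):
    """Estimated monthly cost in dollars for a given LFS bandwidth in GB.

    Assumes whole data packs must be bought for any usage above the
    free allowance.
    """
    billable = max(0.0, bandwidth_gb - free_gb)
    packs = math.ceil(billable / pack_gb)
    return packs * pack_price


for usage in (18, 40, 100):
    print(usage, lfs_monthly_cost(usage))
```

Under these assumptions the current 18 - 40 GB per month would need one data pack, and traffic near 100 GB per month would need two.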