list of contributions #2
Yeah, that makes sense. Just for my own brain's sake, this is how I imagine this to happen:
Although I would still keep the option around to include local repositories (for testing unpublished datasets in a local instance).
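One way to keep both options open - purely a sketch, with a made-up function name and a DOI check based on Zenodo's 10.5281 prefix, reusing the download_from_doi snippet posted further down in this thread - would be to dispatch on the shape of the contribution spec:

```python
import pathlib


def load_contribution(spec, outdir=pathlib.Path('cache')):
    # Hypothetical dispatcher: specs that look like Zenodo DOIs are
    # fetched from Zenodo, everything else is treated as a path to a
    # local repository (e.g. an unpublished dataset under test).
    if spec.startswith('10.5281/'):
        return download_from_doi(spec, outdir=outdir)
    return pathlib.Path(spec)
```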
What about something like this:
Or maybe we hardwire two
Actually, we might want to push the analogy with dictionaria even further, i.e. have a separate repository for the editorial backend - even though right now this might just look like
I'd guess we might want to include at least a short textual description with each dataset, so this may be additional content here.
Sounds good. We could go even more analogous and split the
True. I'm a bit torn, though, because I wouldn't want CrossGram to add information to datasets that should already be included in the dataset on Zenodo. But then, I can already think of
and possibly
Yeah, it definitely should not contain the wealth of information that Dictionaria's metadata files provide. I wouldn't make the metadata much more elaborate than the entries in the current
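To make that concrete: a lean entry might carry little more than a pinned DOI and the short description mentioned above. The field names below are made up for illustration:

```python
# Hypothetical shape of an entry in the contributions list; the DOI pins
# a particular version, the description is the only added content.
CONTRIBUTIONS = [
    {
        'doi': '10.5281/zenodo.0000000',  # placeholder DOI
        'description': 'A short textual description of the dataset.',
    },
]
```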
Yes. I'm not really happy with the path for
Or actually have
Yup, that was what I was about to suggest, too.
Btw. here's my code to download from zenodo (we might want to include it here, until I get around to finishing the cldf-zenodo package):

```python
import io
import re
import json
import pathlib
import zipfile
import urllib.request

from bs4 import BeautifulSoup as bs
import requests


def download_from_doi(doi, outdir=pathlib.Path('.')):
    # Resolve the DOI and make sure it leads to a Zenodo record.
    res = requests.get('https://doi.org/{0}'.format(doi))
    assert re.search('zenodo.org/record/[0-9]+$', res.url)

    # Read the record metadata from Zenodo's JSON export page.
    res = requests.get(res.url + '/export/json')
    soup = bs(res.text, 'html.parser')
    res = json.loads(soup.find('pre').text)

    # Only accept records tagged as CLDF datasets.
    assert any(kw.startswith('cldf:') for kw in res['metadata']['keywords'])

    for f in res['files']:
        if f['type'] == 'zip':
            # Unpack zip archives in memory into outdir.
            r = requests.get(f['links']['self'], stream=True)
            z = zipfile.ZipFile(io.BytesIO(r.content))
            z.extractall(str(outdir))
        elif f['type'] == 'gz':
            # what about a tar in there?
            raise NotImplementedError()
        else:
            # Download any other file as-is, keeping its name.
            urllib.request.urlretrieve(
                f['links']['self'],
                str(outdir / f['links']['self'].split('/')[-1]),
            )
    return outdir
```

The resulting directory can then be searched for datasets using
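For that searching step, a minimal sketch - assuming pycldf is installed and that the metadata files follow the common *-metadata.json naming convention (both my assumptions, not something specified above):

```python
import pathlib

from pycldf import Dataset


def iter_cldf_datasets(d):
    # Recursively look for files following the usual CLDF metadata
    # naming convention and try to load each one as a CLDF dataset.
    for p in pathlib.Path(d).glob('**/*-metadata.json'):
        try:
            yield Dataset.from_metadata(p)
        except Exception:
            # Not a valid CLDF metadata file after all - skip it.
            continue
```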
Huh…
(<_<)"
I feel stupid for not specifying something like "known locations" for the metadata files in the CLDF standard :)
Well, checking for the
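One check that doesn't depend on file names at all is the dc:conformsTo property that CLDF metadata files use to declare their module; a minimal sketch of that idea (the function name is made up):

```python
import json


def looks_like_cldf_metadata(path):
    # A JSON file is (probably) CLDF metadata if its dc:conformsTo
    # property points at a module in the CLDF terms vocabulary.
    try:
        with open(path, encoding='utf8') as f:
            md = json.load(f)
    except (ValueError, UnicodeDecodeError):
        return False
    if not isinstance(md, dict):
        return False
    return str(md.get('dc:conformsTo', '')).startswith(
        'http://cldf.clld.org/v1.0/terms.rdf#')
```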
My ideal for the list of contributions to be included in CrossGram would be a simple list of Zenodo DOIs for (particular versions of) datasets. (I have some code - to be included in cldf-zenodo - to fetch such datasets from Zenodo.)
The CrossGram app would then function mostly as a (selective) catalogue of CLDF datasets on Zenodo, augmented with as much visualization as is possible generically.
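To sketch how little glue that would need - assuming a plain text file with one DOI per line (the file name and format are my invention) and the download_from_doi function from the comment above:

```python
import pathlib

# Hypothetical input file: one Zenodo DOI (pinning a particular
# version of a dataset) per line.
dois = pathlib.Path('contributions.txt').read_text(encoding='utf8').split()

for doi in dois:
    # Give each dataset its own directory, derived from the DOI.
    outdir = pathlib.Path('datasets') / doi.replace('/', '_')
    outdir.mkdir(parents=True, exist_ok=True)
    download_from_doi(doi, outdir=outdir)
```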