The AwesomeData community [JOIN] exists primarily, though not solely, online: in mailing lists, blog posts and comments, the GitHub repository, and so on. The community's vision is to contribute a pure list of high-quality datasets for open communities such as academia, research, and education.
The following policy is a guideline for proposing new data items and for maintaining existing items whose information is outdated:
- A dataset is considered high quality when one or more of the following criteria are met:
  - It is hard to obtain legally in the open community;
  - It contributes valuable knowledge to a specific domain;
  - It can be downloaded directly from the linked site, i.e., it is not behind a login or purchase;
  - No advertisement! No spam! No reputation promotion!
- A new pull request is merged into the core repository after it passes automatic validation and a maintainer's review.
- An existing dataset item with outdated information (e.g., an unavailable site) will be removed after a while if it is not updated.
It is simple to contribute to APD:
- Fork the apd-core repository into your own namespace, such as yourname/apd-core.
- Clone your project locally:
git clone https://github.com/yourname/apd-core.git
cd apd-core
- Create a new data entry from the template PULL_REQUEST_TEMPLATE.yml. For example, to create NEW_DATASET.yml under the Government category folder:
cp PULL_REQUEST_TEMPLATE.yml ./core/Government/NEW_DATASET.yml
Then edit the data fields as needed:
vim ./core/Government/NEW_DATASET.yml
Validation requires three essential data fields: title, homepage, and category. The category must match the folder name, i.e., "Government" in this example.
In a nutshell, you should end up with a basic entry like:
---
title: New Dataset Name
homepage: https://example.com
category: Government
- Run the local tests to validate your changes:
# With python
sudo pip install -r tests/requirements.txt
./tests/testing.sh
- Commit your local changes and push them to your repository:
git add ./core/Government/NEW_DATASET.yml
git commit -m "Add NEW_DATASET under government" # Use any message you like
git push origin master
- Open a new pull request against the trunk (upstream) repository from your fork's GitHub page, usually
https://github.com/yourname/apd-core/pulls
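
The validation rule described in the steps above (three required fields, with category matching the folder name) can be sketched in Python. This is only an illustration under stated assumptions, not the project's actual validator, which runs through tests/testing.sh and is not shown here. The names parse_flat_entry and validate_entry are hypothetical, and the small parser handles only flat `key: value` lines rather than full YAML.

```python
from pathlib import PurePosixPath

# The three fields the guide names as essential for validation.
REQUIRED_FIELDS = ("title", "homepage", "category")


def parse_flat_entry(text):
    """Parse flat `key: value` lines (hypothetical stand-in for a YAML parser)."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the `---` document marker.
        if not line or line.startswith("#") or line == "---":
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def validate_entry(path, text):
    """Return a list of problems; an empty list means the entry passes."""
    entry = parse_flat_entry(text)
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)
    ]
    # The category must equal the name of the folder that holds the file.
    folder = PurePosixPath(path).parent.name
    category = entry.get("category")
    if category and category != folder:
        problems.append(f"category {category!r} does not match folder {folder!r}")
    return problems


SAMPLE = """---
title: New Dataset Name
homepage: https://example.com
category: Government
"""

print(validate_entry("core/Government/NEW_DATASET.yml", SAMPLE))
```

For the sample entry the printed list is empty, meaning no problems were found. Placing the same file under a different folder, or dropping one of the three fields, adds a message to the list.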