A utility, built on Python pandas and ckanapi, for loading data from remote sources into CKAN.
Ensure the required system libraries (libxml2-dev, libxslt-dev, python-dev) are installed. For example, on Ubuntu/Debian-based systems:
sudo apt-get install libxml2-dev libxslt-dev python-dev
Installation is similar to that of most other Python packages: install it either as a global Python package or within a virtual environment.
NOTE: Using a Python virtual environment is not mandatory but it is highly recommended.
Using pip
pip install git+https://github.com/WorldBank-Transport/ckan-loaddata
Or by downloading or cloning the source code, then using setup.py directly
git clone https://github.com/WorldBank-Transport/ckan-loaddata.git
cd ckan-loaddata
python setup.py install
Or for development installation
git clone https://github.com/WorldBank-Transport/ckan-loaddata.git
cd ckan-loaddata
python setup.py develop
ckan_loaddata <path-to-your-yaml-task-file>
To automate periodic publishing of new dataset resources, run the ckan_loaddata command from a cron job.
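As one possible way to wire this up, the sketch below is a small wrapper script that cron could invoke to run several task files in turn. The script name, file paths, and schedule are hypothetical; only the ckan_loaddata command and its single task-file argument come from this README.

```python
#!/usr/bin/env python
"""Hypothetical cron wrapper: run ckan_loaddata once per task file.

Example crontab entry (assumed paths), 02:00 on the first day of each month:
    0 2 1 * * /usr/bin/python /opt/ckan/run_tasks.py /opt/ckan/tasks/monthly.yml
"""
import subprocess
import sys


def build_command(task_file):
    # The README documents exactly one argument: the path to the YAML task file.
    return ["ckan_loaddata", task_file]


def run_tasks(task_files, runner=subprocess.call):
    """Run every task file; return the files whose command exited non-zero."""
    failed = []
    for task_file in task_files:
        if runner(build_command(task_file)) != 0:
            failed.append(task_file)
    return failed


if __name__ == "__main__":
    failures = run_tasks(sys.argv[1:])
    if failures:
        # A string passed to sys.exit is written to stderr and the exit
        # status becomes 1, which most cron setups report to the job owner.
        sys.exit("ckan_loaddata failed for: " + ", ".join(failures))
```

Keeping the runner injectable makes it possible to exercise the wrapper without a CKAN instance.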
Your yaml task file can be in this format
---
address: <your-ckan-host>
apikey: <your-api-key>
resources:
  - url: '<your-data-source-url>'
    input:
      format: '<input-file-format>'
      # other input parameters
    output:
      format: <output-file-format>
      # other output parameters
      metadata:
        package_id: 'your-ckan-package-id'
        # resource-metadata
For example:
---
address: http://ckan.example.com
apikey: 'your-api-key'
resources:
  - url: 'http://remote.example.com/remote-data-source-file-url.xls'
    input:
      format: excel
    output:
      format: csv
      filename: '%Y-%m-your-target-resource-file-name.csv'
      metadata:
        package_id: 'your-package-id'
        name: '%Y-%m: Your target resource title'
        url: ''
        format: csv
For more information about YAML syntax, consult the YAML documentation online.
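The `%Y-%m` tokens in the example look like strftime date directives. Assuming that is how ckan_loaddata expands them (this README does not state it explicitly), the following sketch shows what the filename and name values would become for a given date. The `expand` helper is hypothetical and not part of the package.

```python
from datetime import date


def expand(template, when):
    # Assumption: templates are expanded with strftime-style directives.
    return when.strftime(template)


run_date = date(2016, 3, 15)  # arbitrary example date
print(expand("%Y-%m-your-target-resource-file-name.csv", run_date))
# 2016-03-your-target-resource-file-name.csv
print(expand("%Y-%m: Your target resource title", run_date))
# 2016-03: Your target resource title
```

Under that assumption, a monthly run would produce a distinctly named resource each month.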
Argument | Description |
---|---|
address: | The root URL of the target CKAN instance. |
apikey: | A CKAN API key. default: |
user_agent: | The User Agent string. default: |
resources: | A collection/list of resources to be loaded. default: |
Each resource item in the resources collection may contain the following arguments:
Argument | Description |
---|---|
url: | The full URL of the resource file to be loaded. |
input: | Arguments to use in processing the input resource file. |
output: | Arguments to use in uploading the resource to CKAN. |
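Before scheduling a task file, it can help to sanity-check it against the arguments listed above. The sketch below is not part of ckan_loaddata; `check_task` is a hypothetical helper that works on an already parsed task file (for instance a dict produced by a YAML parser). It assumes that address and url are required, because no default is listed for them, and that input and output are mappings.

```python
def check_task(task):
    """Return a list of problems found in a parsed task file (a dict)."""
    problems = []
    # Assumption: address is required because no default is listed for it.
    if not task.get("address"):
        problems.append("missing address")
    for index, resource in enumerate(task.get("resources") or []):
        # Assumption: every resource needs a url to load from.
        if not resource.get("url"):
            problems.append("resource %d: missing url" % index)
        # input and output hold nested arguments, so they should be mappings.
        for key in ("input", "output"):
            if key in resource and not isinstance(resource[key], dict):
                problems.append("resource %d: %s must be a mapping" % (index, key))
    return problems


task = {
    "address": "http://ckan.example.com",
    "resources": [
        {
            "url": "http://remote.example.com/remote-data-source-file-url.xls",
            "input": {"format": "excel"},
            "output": {"format": "csv"},
        }
    ],
}
print(check_task(task))
# []
print(check_task({"resources": [{"input": {}}]}))
# ['missing address', 'resource 0: missing url']
```

The helper deliberately ignores arguments it does not know about, since the full set of input and output parameters is not listed here.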