From 085a30076561f0ce4de0b732a9385e4ada5217a3 Mon Sep 17 00:00:00 2001
From: Tilly Woodfield <22456167+tillywoodfield@users.noreply.github.com>
Date: Mon, 5 Aug 2024 09:48:44 +0300
Subject: [PATCH] Reorder and expand README

---
 README.md | 105 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 59 insertions(+), 46 deletions(-)

diff --git a/README.md b/README.md
index 20c09a6..245212c 100644
--- a/README.md
+++ b/README.md
@@ -1,62 +1,81 @@
 # IATI Tables
 
-## Documentation
+IATI Tables transforms IATI data into relational tables.
 
-https://iati-tables.readthedocs.io/en/latest/
+To access the data please go to the [website](https://iati-tables.codeforiati.org/) and for more information on how to use the data please see the [documentation site](https://iati-tables.readthedocs.io/en/latest/).
 
-## Installation
+## How to run the processing job
 
-### Backend Python code (batch job)
+The processing job is a Python application which downloads the data from the [IATI Data Dump](https://iati-data-dump.codeforiati.org/), transforms the data into tables, and outputs the data in various formats such as CSV, PostgreSQL and SQLite. It is a batch job, designed to be run on a schedule.
+
+### Prerequisites
+
+- postgresql
+- sqlite
+- zip
+
+### Install Python requirements
 
 ```
-git clone https://github.com/codeforIATI/iati-tables.git
-cd iati-tables
 python3 -m venv .ve
 source .ve/bin/activate
-pip install -r requirements_dev.txt
+pip install pip-tools
+pip-sync requirements_dev.txt
 ```
 
-Install postgres, sqlite and zip. e.g. on Ubuntu:
+### Set up the PostgreSQL database
+
+Create user `iatitables`:
 
 ```
-sudo apt install postgresql sqlite3 zip
+sudo -u postgres psql -c "create user iatitables with password 'PASSWORD_CHANGEME'"
 ```
 
-Create a iatitables user and database:
+Create database `iatitables`
 
 ```
-sudo -u postgres psql -c "create user iatitables with password 'PASSWORD_CHANGEME'"
 sudo -u postgres psql -c "create database iatitables encoding utf8 owner iatitables"
 ```
 
-Run the code:
+Set `DATABASE_URL` environment variable
 
 ```
 export DATABASE_URL="postgresql://iatitables:PASSWORD_CHANGEME@localhost/iatitables"
-export IATI_TABLES_S3_DESTINATION=-
-export IATI_TABLES_SCHEMA=iati
-python -c 'import iatidata; iatidata.run_all(processes=6, sample=50)'
 ```
 
-Run with refresh=False to avoid fetching all the data every time it's run. This
-is very useful for quicker debugging.
+### Configure the processing job
 
-```
-python -c 'import iatidata; iatidata.run_all(processes=6, sample=50, refresh=False)'
-```
+The processing job can be configured using the following environment variables:
+
+`DATABASE_URL` (Required)
+
+- The postgres database to use for the processing job.
+
+`IATI_TABLES_OUTPUT` (Optional)
+
+- The path to output data to. The default is the directory that IATI Tables is run from.
+
+`IATI_TABLES_SCHEMA` (Optional)
+
+- The schema to use in the postgres database.
+
+`IATI_TABLES_S3_DESTINATION` (Optional)
 
-`processes` is the number of processes spawned, and `sample` is the number of
-publishers data processed. A sample size of 50 is pretty quick and generally
-works. Smaller sample sizes, e.g. 1 fail because not all tables get created,
-see https://github.com/codeforIATI/iati-tables/issues/10
+- By default, IATI Tables will output local files in various formats, e.g. pg_dump, sqlite, and CSV. To additionally upload files to S3, set the environment variable `IATI_TABLES_S3_DESTINATION` with the path to your S3 bucket, e.g. `s3://my_bucket`.
 
-Running the tests:
+### Run the processing job
 
 ```
-python -m pytest iatidata/
+python -c 'import iatidata; iatidata.run_all(processes=6, sample=50, refresh=False)'
 ```
 
-Linting:
+Parameters:
+
+- `processes` (`int`, default=`5`): The number of workers to use for parts of the process which are able to run in parallel.
+- `sample` (`int`, default=`None`): The number of datasets to process. This is useful for local development because processing the entire data dump can take several hours to run. A minimum sample size of 50 is recommended due to needing enough data to dynamically create all required tables (see https://github.com/codeforIATI/iati-tables/issues/10).
+- `refresh` (`bool`, default=`True`): Whether to download the latest data at the start of the processing job. It is useful to set this to `False` when running locally to avoid re-downloading the data every time the process is run.
+
+## How to run linting and formatting
 
 ```
 isort iatidata/
@@ -65,25 +84,27 @@ flake8 iatidata/
 mypy iatidata/
 ```
 
-### Web front-end
-
-Install Node JS 20. e.g. on Ubuntu:
+## How to run unit tests
 
 ```
-curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
-sudo apt install nodejs
+python -m pytest iatidata/
 ```
 
-Install yarn:
+## How to run the web front-end
+
+### Prerequisites:
+
+- Node.js v20
+
+Change the working directory:
 
 ```
-sudo npm install -g yarn
+cd site
 ```
 
 Install dependencies:
 
 ```
-cd site
 yarn install
 ```
 
@@ -93,7 +114,7 @@ Start the development server:
 yarn serve
 ```
 
-Build and view the site:
+Or, build and view the site in production mode:
 
 ```
 yarn build
@@ -101,18 +122,10 @@ cd site/dist
 python3 -m http.server --bind 127.0.0.1 8000
 ```
 
-### Docs
+## How to run the documentation
 
-For live preview while writing docs, run the following command and go to http://127.0.0.1:8000
+The documentation site is built with Sphinx. To view the live preview locally, run the following command:
 
 ```
 sphinx-autobuild docs docs/_build/html
 ```
-
-## Update requirements
-
-```
-pip install pip-tools
-pip-compile --upgrade
-pip-sync requirements.txt
-```
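
Note for anyone trying the documented configuration locally: the environment variables and the `run_all` call described in the new README can be combined in a small wrapper script. The following is a minimal sketch only. `read_job_config` and `run_sample_job` are hypothetical helper names that are not part of iati-tables; the only project API used is `iatidata.run_all(processes=..., sample=..., refresh=...)`, exactly as the README shows it. The sketch assumes the job reads its settings from the process environment, as the README states.

```python
import os

# Optional settings named in the README; DATABASE_URL is handled separately
# because the README marks it as required.
OPTIONAL_VARS = (
    "IATI_TABLES_OUTPUT",
    "IATI_TABLES_SCHEMA",
    "IATI_TABLES_S3_DESTINATION",
)


def read_job_config(environ=None):
    """Hypothetical helper: collect the README's settings from a mapping.

    Raises RuntimeError if DATABASE_URL is missing or empty.
    """
    env = os.environ if environ is None else environ
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")
    config = {"DATABASE_URL": database_url}
    for name in OPTIONAL_VARS:
        # Only record optional settings that are actually set.
        if env.get(name):
            config[name] = env[name]
    return config


def run_sample_job():
    """Hypothetical wrapper: check the environment, then run a small job."""
    read_job_config()  # fail early if DATABASE_URL is not set
    # Imported here so the check above works without the package installed.
    import iatidata

    # Same arguments as the README example: 6 workers, 50 datasets,
    # and no re-download of the data dump.
    iatidata.run_all(processes=6, sample=50, refresh=False)
```

Calling `run_sample_job()` from the activated virtual environment is then equivalent to the `python -c` one-liner in the README, with a clearer error when `DATABASE_URL` has not been exported.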