Commit 085a300: Reorder and expand README
tillywoodfield committed Aug 5, 2024 (1 parent: 1c67178)
Showing 1 changed file (README.md) with 59 additions and 46 deletions.

# IATI Tables

IATI Tables transforms IATI data into relational tables.

To access the data, go to the [website](https://iati-tables.codeforiati.org/); for more information on how to use the data, see the [documentation site](https://iati-tables.readthedocs.io/en/latest/).

## How to run the processing job

The processing job is a Python application which downloads the data from the [IATI Data Dump](https://iati-data-dump.codeforiati.org/), transforms the data into tables, and outputs the data in various formats such as CSV, PostgreSQL and SQLite. It is a batch job, designed to be run on a schedule.

### Prerequisites

- PostgreSQL
- SQLite
- zip

These can be installed on Ubuntu with `sudo apt install postgresql sqlite3 zip`.

### Install Python requirements

```
git clone https://github.com/codeforIATI/iati-tables.git
cd iati-tables
python3 -m venv .ve
source .ve/bin/activate
pip install pip-tools
pip-sync requirements_dev.txt
```

### Set up the PostgreSQL database

Create user `iatitables`:

```
sudo -u postgres psql -c "create user iatitables with password 'PASSWORD_CHANGEME'"
```

Create database `iatitables`:

```
sudo -u postgres psql -c "create database iatitables encoding utf8 owner iatitables"
```

Set the `DATABASE_URL` environment variable:

```
export DATABASE_URL="postgresql://iatitables:PASSWORD_CHANGEME@localhost/iatitables"
```

### Configure the processing job

The processing job can be configured using the following environment variables:

`DATABASE_URL` (Required)

- The postgres database to use for the processing job.

`IATI_TABLES_OUTPUT` (Optional)

- The path to output data to. The default is the directory that IATI Tables is run from.

`IATI_TABLES_SCHEMA` (Optional)

- The schema to use in the postgres database.

`IATI_TABLES_S3_DESTINATION` (Optional)

- By default, IATI Tables will output local files in various formats, e.g. pg_dump, sqlite, and CSV. To additionally upload files to S3, set the environment variable `IATI_TABLES_S3_DESTINATION` with the path to your S3 bucket, e.g. `s3://my_bucket`.
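
Taken together, a local configuration might look like this (all values here are illustrative, not project defaults):

```shell
# Illustrative local configuration; adjust each value for your environment
export DATABASE_URL="postgresql://iatitables:PASSWORD_CHANGEME@localhost/iatitables"
export IATI_TABLES_OUTPUT="$HOME/iati-tables-output"  # optional
export IATI_TABLES_SCHEMA="iati"                      # optional
export IATI_TABLES_S3_DESTINATION="s3://my_bucket"    # optional
```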

### Run the processing job

```
python -c 'import iatidata; iatidata.run_all(processes=6, sample=50, refresh=False)'
```

Parameters:

- `processes` (`int`, default=`5`): The number of workers to use for parts of the process which are able to run in parallel.
- `sample` (`int`, default=`None`): The number of datasets to process. This is useful for local development, because processing the entire data dump can take several hours. A minimum sample size of 50 is recommended, since smaller samples may not contain enough data to dynamically create all required tables (see https://github.com/codeforIATI/iati-tables/issues/10).
- `refresh` (`bool`, default=`True`): Whether to download the latest data at the start of the processing job. It is useful to set this to `False` when running locally to avoid re-downloading the data every time the process is run.
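
After a run completes, one quick sanity check is to list the tables in the SQLite output. The snippet below is a sketch: the filename `iati.sqlite` is an assumption, and the actual name and location depend on the output settings above.

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        con.close()


# e.g. list_tables("iati.sqlite") after a processing run
```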

## How to run linting and formatting

```
isort iatidata/
flake8 iatidata/
mypy iatidata/
```

## How to run unit tests

```
python -m pytest iatidata/
```

## How to run the web front-end

### Prerequisites

- Node.js v20
- Yarn (e.g. `sudo npm install -g yarn`)

Change the working directory:

```
cd site
```

Install dependencies:

```
yarn install
```

Start the development server:
yarn serve
```

Or, build and view the site in production mode:

```
yarn build
cd dist
python3 -m http.server --bind 127.0.0.1 8000
```

## How to run the documentation

The documentation site is built with Sphinx. To view a live preview locally, run the following command and go to http://127.0.0.1:8000:

```
sphinx-autobuild docs docs/_build/html
```

## How to update requirements

```
pip install pip-tools
pip-compile --upgrade
pip-sync requirements.txt
```
