Commit 085a300: Reorder and expand README
tillywoodfield committed Aug 5, 2024 (1 parent: 1c67178)
Showing 1 changed file (README.md) with 59 additions and 46 deletions.

# IATI Tables

IATI Tables transforms IATI data into relational tables.

To access the data, go to the [website](https://iati-tables.codeforiati.org/); for more information on how to use the data, see the [documentation site](https://iati-tables.readthedocs.io/en/latest/).

## How to run the processing job

The processing job is a Python application which downloads the data from the [IATI Data Dump](https://iati-data-dump.codeforiati.org/), transforms the data into tables, and outputs the data in various formats such as CSV, PostgreSQL and SQLite. It is a batch job, designed to be run on a schedule.

### Prerequisites

- PostgreSQL
- SQLite
- zip

These can be installed on Ubuntu with `sudo apt install postgresql sqlite3 zip`.

### Install Python requirements

```
git clone https://github.com/codeforIATI/iati-tables.git
cd iati-tables
python3 -m venv .ve
source .ve/bin/activate
pip install pip-tools
pip-sync requirements_dev.txt
```

### Set up the PostgreSQL database

Create user `iatitables`:

```
sudo -u postgres psql -c "create user iatitables with password 'PASSWORD_CHANGEME'"
```

Create database `iatitables`:

```
sudo -u postgres psql -c "create database iatitables encoding utf8 owner iatitables"
```

Set the `DATABASE_URL` environment variable:

```
export DATABASE_URL="postgresql://iatitables:PASSWORD_CHANGEME@localhost/iatitables"
```

### Configure the processing job

The processing job can be configured using the following environment variables:

`DATABASE_URL` (Required)

- The postgres database to use for the processing job.

`IATI_TABLES_OUTPUT` (Optional)

- The path to output data to. The default is the directory that IATI Tables is run from.

`IATI_TABLES_SCHEMA` (Optional)

- The schema to use in the postgres database.

`IATI_TABLES_S3_DESTINATION` (Optional)

- By default, IATI Tables will output local files in various formats, e.g. pg_dump, sqlite, and CSV. To additionally upload files to S3, set the environment variable `IATI_TABLES_S3_DESTINATION` with the path to your S3 bucket, e.g. `s3://my_bucket`.
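
Taken together, a local configuration might look like this (all values here are illustrative, not project defaults):

```shell
# Illustrative local configuration; adjust each value for your environment
export DATABASE_URL="postgresql://iatitables:PASSWORD_CHANGEME@localhost/iatitables"
export IATI_TABLES_OUTPUT="$HOME/iati-tables-output"  # optional
export IATI_TABLES_SCHEMA="iati"                      # optional
export IATI_TABLES_S3_DESTINATION="s3://my_bucket"    # optional
```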

### Run the processing job

```
python -c 'import iatidata; iatidata.run_all(processes=6, sample=50, refresh=False)'
```

Parameters:

- `processes` (`int`, default=`5`): The number of workers to use for parts of the process which are able to run in parallel.
- `sample` (`int`, default=`None`): The number of datasets to process. This is useful for local development, because processing the entire data dump can take several hours. A minimum sample size of 50 is recommended, since smaller samples may not contain enough data to dynamically create all required tables (see https://github.com/codeforIATI/iati-tables/issues/10).
- `refresh` (`bool`, default=`True`): Whether to download the latest data at the start of the processing job. It is useful to set this to `False` when running locally to avoid re-downloading the data every time the process is run.
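
After a run completes, one quick sanity check is to list the tables in the SQLite output. The snippet below is a sketch: the filename `iati.sqlite` is an assumption, and the actual name and location depend on the output settings above.

```python
import sqlite3


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [name for (name,) in rows]
    finally:
        con.close()


# e.g. list_tables("iati.sqlite") after a processing run
```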

## How to run linting and formatting

```
isort iatidata/
flake8 iatidata/
mypy iatidata/
```

## How to run unit tests

```
python -m pytest iatidata/
```

## How to run the web front-end

### Prerequisites

- Node.js v20
- Yarn (e.g. `sudo npm install -g yarn`)

Change the working directory:

```
cd site
```

Install dependencies:

```
yarn install
```

Start the development server:
yarn serve
```

Or, build and view the site in production mode:

```
yarn build
cd dist
python3 -m http.server --bind 127.0.0.1 8000
```

## How to run the documentation

The documentation site is built with Sphinx. To view a live preview locally, run the following command and go to http://127.0.0.1:8000:

```
sphinx-autobuild docs docs/_build/html
```

## How to update requirements

```
pip install pip-tools
pip-compile --upgrade
pip-sync requirements.txt
```
