- Linux operating system (e.g., Ubuntu)
- PostgreSQL and PostGIS
- Python 3.10 or above
- psycopg2 and alive_progress Python packages
pip install psycopg2-binary alive_progress
- Google Maps API key
- Data
-
Create a database in PostgreSQL to store the air pollution data.
CREATE DATABASE soramame;
-
Connect to the database
\c soramame
-
Enable the PostGIS extension for the database (admin privileges are needed)
CREATE EXTENSION postgis;
-
Create a table to store the location information of the sensors.
CREATE TABLE station_info(
    stationid varchar NOT NULL PRIMARY KEY,
    name varchar,
    address varchar,
    location geography(POINT, 4326)
);
-
Create a table to store the hourly observations of the sensors on a daily basis.
CREATE TABLE hourly_observations(
    stationid varchar NOT NULL,
    obsdate timestamp,
    so2 double precision,
    no double precision,
    no2 double precision,
    nox double precision,
    co double precision,
    ox double precision,
    nmhc double precision,
    ch4 double precision,
    thc double precision,
    spm double precision,
    pm25 double precision,
    sp double precision,
    wd varchar,
    ws double precision,
    temp double precision,
    hum double precision,
    CONSTRAINT soh UNIQUE (stationid, obsdate)
);
Note:
- Do not establish a primary key-foreign key relationship on the stationid attribute between the station_info and hourly_observations tables.
- The constraint SOH UNIQUE (stationid, obsdate) prevents data repetition: a sensor cannot have multiple records with the same timestamp.
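Because of this unique constraint, a loader can be made idempotent: re-running it on the same files simply skips rows that are already present. A minimal sketch (the helper function and column subset are illustrative, not part of the provided scripts):

```python
# Sketch: building an idempotent INSERT for hourly_observations.
# ON CONFLICT ... DO NOTHING relies on the SOH unique constraint above,
# so re-inserting the same (stationid, obsdate) pair is a no-op.

def build_upsert_sql(table, columns, conflict_columns):
    """Return an INSERT ... ON CONFLICT DO NOTHING statement with %s placeholders."""
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(conflict_columns)}) DO NOTHING"
    )

sql = build_upsert_sql(
    "hourly_observations",
    ["stationid", "obsdate", "so2", "no2"],
    ["stationid", "obsdate"],
)
# With psycopg2, the statement would be executed as:
# cursor.execute(sql, ("01101010", "2024-01-01 00:00", 0.001, 0.010))
```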
-
[Optional steps:] We recommend these for users who are not familiar with the concepts of PostgreSQL.
- Create a user to work with this data
CREATE USER aeros WITH PASSWORD 'aeros123';
- Grant access for the new user to work on this data
GRANT CONNECT ON DATABASE soramame TO aeros;
- Grant permission to read and write data in the tables of the database
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA public TO aeros;
-
Exit from the PostgreSQL session.
Press Ctrl + D (or type \q)
- Visit the sensor locations page on the AEROS website
- Copy the first three columns (stationID, stationName, stationAddress) of all the stations and store them in an Excel file.
- Carefully convert (or export) the Excel file into a CSV file named 'rawStationsInfo.csv'.
- Execute the Python program below, which reads the address of each sensor and derives its spatial location.
python3 address2LatLong.py rawStationsInfo.csv finalStationsInfo.csv <googleMaps API key>
- Open the Python file 'insertStationInfoInDatabase.py'
vi insertStationInfoInDatabase.py
- Go to the following line:
conn = psycopg2.connect(database="soramame", user="", password="", host="", port=5432)
- Specify the appropriate database, user, password, host IP address of the PostgreSQL server, and port number.
conn = psycopg2.connect(database="soramame", user="aeros", password="aeros123", host="163.143.165.136", port=5432)
- Save the file and exit.
:wq
- Execute the following command to store the station information into the database.
python3 insertStationInfoInDatabase.py finalStationsInfo.csv
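For reference, inserting a station into station_info amounts to turning one CSV row into the parameters of an INSERT, with the coordinates encoded as a WKT point for the geography column. The column order assumed below (stationID, name, address, latitude, longitude) is an assumption about finalStationsInfo.csv, not a guarantee:

```python
import csv
import io

# Sketch: turning one row of finalStationsInfo.csv into INSERT parameters.
# Column order (stationID, name, address, latitude, longitude) is assumed.

def station_row_to_params(row):
    station_id, name, address, lat, lon = row
    # geography(POINT, 4326) accepts WKT; note WKT uses longitude-latitude order.
    wkt = f"POINT({float(lon)} {float(lat)})"
    return (station_id, name, address, wkt)

sample = io.StringIO('01101010,Sapporo,"Chuo-ku, Sapporo",43.055,141.340\n')
params = next(station_row_to_params(r) for r in csv.reader(sample))
# cursor.execute("INSERT INTO station_info VALUES (%s, %s, %s, %s)", params)
```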
- Visit the download page of AEROS website
- In the first dropdown menu choose any month other than the current month. For example, if the current month is November, choose a previous month, such as August or September.
- In the second dropdown menu choose the default option, which is 'nationwide'.
- Click the download button.
- A zip file named 'data.zip' will be downloaded onto your local computer.
- Create a folder 'hourlyData'
mkdir hourlyData
- Unzip the data.zip file by saving its contents in hourlyData folder.
unzip -d hourlyData ~/Downloads/data.zip
-
[Optional] If you have multiple zip files representing various months, perform the following steps.
- Enter into the hourlyData directory
cd hourlyData
- Create a shell script file using the following command
vi uncompressNewZipFiles.sh
- Copy the shell script below and paste it into the file
zipFiles=`ls *.zip`
for eachZipFile in $zipFiles
do
    unzip $eachZipFile
    rm $eachZipFile
done
- Execute the shell script
sh uncompressNewZipFiles.sh
- Remove the shell script file
rm uncompressNewZipFiles.sh
-
Delete the zip file, or rename it to yyyymmdd_00.zip for backup.
# format: mv data.zip yyyymmdd_00.zip
mv data.zip 20240101_00.zip
-
Move back to the parent directory
cd ..
-
Open the Python file 'insertNewHourlyObservationsData.py'
vi insertNewHourlyObservationsData.py
-
Go to the following line:
conn = psycopg2.connect(database="", user="", password="", host="", port=5432)
-
Specify the appropriate database, user, password, host IP address of the PostgreSQL server, and port number.
conn = psycopg2.connect(database="soramame", user="aeros", password="aeros123", host="163.143.165.136", port=5432)
-
Save the file and exit.
-
Run the Python program 'insertNewHourlyObservationsData.py' by specifying the folder.
python3 insertNewHourlyObservationsData.py ./hourlyData
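One detail worth knowing before loading: stations do not measure every pollutant, so some fields in the hourly CSV files may be empty. An empty field must become SQL NULL (None in Python), not 0, or it would corrupt any later aggregation. A minimal sketch of that conversion (the helper name is illustrative):

```python
# Sketch: converting raw hourly CSV fields into values psycopg2 can insert.
# An empty field means "not measured" and must map to None (SQL NULL), not 0.

def to_measurement(field):
    """Return a float, or None for an empty/blank field."""
    field = field.strip()
    return float(field) if field else None

row = ["0.001", "", "0.023"]          # e.g. SO2 measured, NO missing, NO2 measured
values = [to_measurement(f) for f in row]
```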
Duration of the data: 2018-01-01 to 2021-03-31
- Visit the download page of AEROS website
- In the first dropdown menu choose any month other than the current month. For example, if the current month is November, choose a previous month, such as August or September.
- In the second dropdown menu choose the default option, which is 'nationwide'.
- Click the download button.
- A zip file named 'data.zip' will be downloaded onto your local computer.
-
Create a temporary directory, say temp.
mkdir temp
-
Move or upload the zip files into the temp directory.
-
Enter into the temp directory
cd temp
-
Create a shell script file to read every zip file and uncompress it.
vi uncompressOldZipFiles.sh
-
Copy and paste the following shell script
zipFiles=`ls *.zip`
for eachZipFile in $zipFiles
do
    unzip $eachZipFile
    rm $eachZipFile
done
subZipFiles=`ls *.zip`
for eachSubZipFile in $subZipFiles
do
    echo 'unzipping ' $eachSubZipFile
    unzip $eachSubZipFile
    rm $eachSubZipFile
done
-
Execute the shell script.
sh uncompressOldZipFiles.sh
The above script will create the folders '01' to '47'. Each folder represents a prefecture in Japan.
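The same two-pass extraction (outer archives first, then the per-prefecture archives they contain) can be sketched in Python, if you prefer to avoid the shell script:

```python
import zipfile
from pathlib import Path

# Sketch in Python of what uncompressOldZipFiles.sh does: unzip every
# top-level archive, then unzip the per-prefecture archives it contained.

def extract_nested_zips(directory):
    directory = Path(directory)
    for _ in range(2):  # two passes: outer zips, then the zips they contained
        for zip_path in list(directory.glob("*.zip")):
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(directory)
            zip_path.unlink()  # mirror the script's `rm`
```

Usage: `extract_nested_zips("./temp")` from the parent directory.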
-
Delete the shell script file
rm uncompressOldZipFiles.sh
-
Move back to the parent directory
cd ..
-
Open the Python file 'insertOldHourlyObservationsData.py'
vi insertOldHourlyObservationsData.py
-
Go to the following line:
conn = psycopg2.connect(database="", user="", password="", host="", port=5432)
-
Specify the appropriate database, user, password, host IP address of the PostgreSQL server, and port number.
conn = psycopg2.connect(database="soramame", user="aeros", password="aeros123", host="163.143.165.136", port=5432)
-
Save the file and exit.
-
Run the Python program 'insertOldHourlyObservationsData.py' by specifying the folder.
python3 insertOldHourlyObservationsData.py ./temp