wtoldt/es-load

Elasticsearch Load

A Spring Boot application that parses Global Historical Climatology Network Daily (GHCND) data into JSON and indexes it into Elasticsearch.

What is GHCND?

GHCND is the most popular data set offered by the National Centers for Environmental Information (NCEI). I chose it because it has useful elements (weather observations such as temperature and rain) and a simple frequency (one observation per day) to work with. Refer to the GHCN readme for more details on exactly how the GHCN data is laid out. TL;DR: ghcnd-stations.txt lists the stations in the GHCN; each station has an id (ex: USC00045352), a name, a location, an elevation, and other metadata. Each station also has its own dly file containing all the data for that station (ex: USC00045352.dly).
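To make the station list concrete, here is a minimal sketch of parsing one fixed-width line of ghcnd-stations.txt. The column ranges follow the GHCN readme (ID in columns 1-11, latitude 13-20, longitude 22-30, elevation 32-37, name 42-71). The class StationLineParser, the Station type, and the sample coordinates and name are assumptions made up for illustration; they are not taken from this project's code.

```java
import java.util.Locale;

/** Sketch only: parses one line of ghcnd-stations.txt. Not this project's actual parser. */
final class StationLineParser {

    /** Hypothetical value type; the project's real station object may look different. */
    static final class Station {
        final String id;
        final double latitude;
        final double longitude;
        final double elevation; // metres, per the GHCN readme
        final String name;

        Station(String id, double latitude, double longitude, double elevation, String name) {
            this.id = id;
            this.latitude = latitude;
            this.longitude = longitude;
            this.elevation = elevation;
            this.name = name;
        }
    }

    /** Assumes a well-formed line; readme columns are 1-based, substring indexes are 0-based. */
    static Station parse(String line) {
        String id = line.substring(0, 11).trim();
        double latitude = Double.parseDouble(line.substring(12, 20).trim());
        double longitude = Double.parseDouble(line.substring(21, 30).trim());
        double elevation = Double.parseDouble(line.substring(31, 37).trim());
        // NAME occupies columns 42-71; guard against lines that end early.
        String name = line.substring(41, Math.min(line.length(), 71)).trim();
        return new Station(id, latitude, longitude, elevation, name);
    }

    /** Builds a sample line. Only the id comes from the README; every other value is invented. */
    static String sampleLine() {
        return String.format(Locale.ROOT, "%-11s %8.4f %9.4f %6.1f %-2s %-30s",
                "USC00045352", 38.5, -121.5, 10.0, "CA", "EXAMPLE STATION");
    }

    public static void main(String[] args) {
        Station s = parse(sampleLine());
        System.out.println(s.id + " | " + s.name + " | " + s.latitude + ", " + s.longitude
                + " | " + s.elevation + " m");
    }
}
```

An object like this maps naturally onto a JSON document keyed by the station id, which is what the stations loader needs before indexing.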

Usage

Included with the project is a small subset (15 stations) of the GHCND data set. The default configuration runs the GHCND stations loader first and then the GHCND data loader. The order is important because the data loader reads its station list from Elasticsearch: the stations loader creates and indexes a station object for each station in test-data/stations.txt, and the data loader then gets all stations from Elasticsearch and parses each station's file.
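As an illustration of what parsing a station file involves, the sketch below reads one line of a dly file. Per the GHCN readme, each line holds one element for one month: station id (columns 1-11), year (12-15), month (16-17), element (18-21), then 31 day slots of 8 characters each (a 5-character value followed by three 1-character flags), with -9999 marking a missing value. The names DlyLineParser and Observation and the sample values are hypothetical; the project's own data loader may be structured differently.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: parses one line of a .dly file into per-day observations. */
final class DlyLineParser {

    static final int MISSING = -9999; // missing-value marker from the GHCN readme

    /** Hypothetical type: one observation of one element on one day. */
    static final class Observation {
        final String stationId;
        final LocalDate date;
        final String element;
        final int value; // raw integer as stored in the file (e.g. tenths of degrees C for TMAX)

        Observation(String stationId, LocalDate date, String element, int value) {
            this.stationId = stationId;
            this.date = date;
            this.element = element;
            this.value = value;
        }
    }

    /** Assumes a well-formed, full-width line; the three flag characters are ignored here. */
    static List<Observation> parse(String line) {
        String stationId = line.substring(0, 11);
        int year = Integer.parseInt(line.substring(11, 15));
        int month = Integer.parseInt(line.substring(15, 17));
        String element = line.substring(17, 21);
        // Only visit days that exist in this month, so LocalDate.of never sees an invalid date.
        int daysInMonth = YearMonth.of(year, month).lengthOfMonth();
        List<Observation> observations = new ArrayList<>();
        for (int day = 1; day <= daysInMonth; day++) {
            int start = 21 + (day - 1) * 8;
            int value = Integer.parseInt(line.substring(start, start + 5).trim());
            if (value != MISSING) {
                observations.add(new Observation(stationId, LocalDate.of(year, month, day), element, value));
            }
        }
        return observations;
    }

    /** Builds a sample line with invented values: two TMAX readings, the rest missing. */
    static String sampleLine() {
        StringBuilder sb = new StringBuilder("USC00045352" + "2020" + "01" + "TMAX");
        for (int day = 1; day <= 31; day++) {
            int value = (day <= 2) ? 100 + day : MISSING;
            sb.append(String.format(Locale.ROOT, "%5d", value)).append("   "); // value + 3 blank flags
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Observation o : parse(sampleLine())) {
            System.out.println(o.stationId + " " + o.date + " " + o.element + " " + o.value);
        }
    }
}
```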

  • Download Elasticsearch and start it.
  • Clone this repository.
  • Run Maven package (mvn package).
  • Execute the jar file (java -jar target\es-load-0.0.1-SNAPSHOT.jar). On Linux or macOS, use a forward slash in the path: target/es-load-0.0.1-SNAPSHOT.jar.

Alternatively, you can import the project into Eclipse or Spring Tool Suite.

Planned features

  • Make the GHCND data load multi-threaded.
  • Currently a data day is one observation of one element for one day. I want to refactor it so that a data day holds the observations of all the elements recorded by the station on that day.
  • Potentially support more data sets.
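
The planned data day refactor could be sketched roughly as follows: group the per-element observations of one station by date, so that each date maps to all the elements observed on that day. The DataDayGrouping class, its Observation type, and the sample values are hypothetical and only illustrate the idea; TMAX, TMIN and PRCP are element codes from the GHCN readme.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: merges per-element observations into one entry per day. */
final class DataDayGrouping {

    /** Hypothetical type: one observation of one element on one day (the current model). */
    static final class Observation {
        final LocalDate date;
        final String element;
        final int value;

        Observation(LocalDate date, String element, int value) {
            this.date = date;
            this.element = element;
            this.value = value;
        }
    }

    /** Returns date -> (element -> value); TreeMaps keep dates and element names sorted. */
    static Map<LocalDate, Map<String, Integer>> groupByDay(List<Observation> observations) {
        Map<LocalDate, Map<String, Integer>> byDay = new TreeMap<>();
        for (Observation o : observations) {
            // Create the day's map on first use, then record this element's value in it.
            byDay.computeIfAbsent(o.date, d -> new TreeMap<>()).put(o.element, o.value);
        }
        return byDay;
    }

    /** Invented sample values for a single station. */
    static List<Observation> sample() {
        return Arrays.asList(
                new Observation(LocalDate.of(2020, 1, 1), "TMAX", 101),
                new Observation(LocalDate.of(2020, 1, 1), "TMIN", 12),
                new Observation(LocalDate.of(2020, 1, 1), "PRCP", 0),
                new Observation(LocalDate.of(2020, 1, 2), "TMAX", 102));
    }

    public static void main(String[] args) {
        // Prints {2020-01-01={PRCP=0, TMAX=101, TMIN=12}, 2020-01-02={TMAX=102}}
        System.out.println(groupByDay(sample()));
    }
}
```

Each entry of the resulting map would then correspond to one JSON document per station per day, instead of one document per element per day.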
