
Work done at the H2O Open Tour NYC 2016 Hackathon, and later refinements


jiutinghaole/WindTurbineOutputPrediction


WindTurbineOutputPrediction

This repository contains the Python and R Jupyter notebooks I used to work on H2O's Open Tour NYC Hackathon on July 19 and 20, 2016, and afterwards. See blog post at http://lucdemortier.github.io/articles/17/WindPower for a description of the results.

Contents

  • 1_data_preparation.ipynb: Reads the hackathon's input CSV files (training and testing), creates data frames, and saves them as pickle files for the Python notebooks and as Feather files for the R notebook.
  • 2_exploratory_visuals.ipynb: Generates various plots to explore the data prior to modeling.
  • 3_random_forest_regressor.ipynb: A random forest regression model which models all ten turbines as a single turbine with a "zone id" setting.
  • 4_random_forest_regressor.ipynb: A random forest regression model which separately models each of the ten turbines, using wind velocity measurements from all zones.
  • 5_xgboost_regressor.ipynb: An XGBoost regression model.
  • 6_xgboost_classifier_plus_regressor.ipynb: A combination of an XGBoost classifier and regressor. The classifier predicts which turbine outputs are zero, the regressor predicts the values of the non-zero outputs.
  • 7_gamlss_R.ipynb: A generalized linear model. This notebook runs an R kernel and uses the R package GAMLSS.
  • 8_check_solution.ipynb: Reads the CSV prediction files created by the other notebooks and computes the RMSE for the hackathon's public and private leaderboards.
  • summarynoprint.R and wp_withdata.R are routines from the GAMLSS package that I had to modify slightly for the R notebook.
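
The classifier-plus-regressor combination described for 6_xgboost_classifier_plus_regressor.ipynb can be illustrated with a minimal sketch of the final merging step. This is not the notebook's code: the function name `combine_predictions`, the 0.5 threshold, and the clipping to [0, 1] are assumptions made for illustration, and plain Python lists stand in for the model outputs.

```python
def combine_predictions(p_zero, reg_pred, threshold=0.5):
    """Merge classifier and regressor outputs into one prediction per record.

    p_zero    : predicted probability that the turbine output is exactly zero
    reg_pred  : regressor prediction of the output for the same record
    threshold : hypothetical cut-off above which a record is declared zero
    """
    if len(p_zero) != len(reg_pred):
        raise ValueError("p_zero and reg_pred must have the same length")

    combined = []
    for p, r in zip(p_zero, reg_pred):
        if p > threshold:
            # The classifier says this record has zero output.
            combined.append(0.0)
        else:
            # TARGETVAR is a fraction of maximum capacity, so this sketch
            # clips the regressor output to [0, 1] (an assumption).
            combined.append(min(max(r, 0.0), 1.0))
    return combined


if __name__ == "__main__":
    print(combine_predictions([0.9, 0.1, 0.2], [0.3, 0.45, 1.2]))
```

In practice the threshold would be tuned on held-out data, since it trades false zeros against small spurious outputs.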

Problem Statement

Given daily wind forecasts that each cover 24 hours, predict the nominal wind turbine output for 10 turbines. The provided data are the turbine number, the timestamp of the forecast, and the forecasted zonal and meridional wind velocity components at 10 meters and 100 meters above ground. The wind data were taken in 2012 and 2013. The training data consist of the first 19 months, and the test set consists of the following five months (the last month has only ten records). The public leaderboard is based on the first two months of the test dataset (Aug-2013 and Sep-2013), while the rest of the test dataset is used for the private leaderboard.
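
As a rough illustration of the leaderboard split described above, the sketch below computes the RMSE separately for the public part (Aug-2013 and Sep-2013) and the private part (the rest of the test set). The cut-off of 1 October 2013 at 00:00, the record layout, and the names `rmse` and `leaderboard_rmse` are assumptions for this example, not taken from 8_check_solution.ipynb.

```python
import math
from datetime import datetime

# Assumed boundary: records before this timestamp count as "public".
PRIVATE_START = datetime(2013, 10, 1)


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    if not pairs:
        raise ValueError("cannot compute RMSE of an empty set")
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def leaderboard_rmse(records):
    """Split (timestamp, actual, predicted) records and score each part."""
    records = list(records)
    public = [(a, p) for t, a, p in records if t < PRIVATE_START]
    private = [(a, p) for t, a, p in records if t >= PRIVATE_START]
    return rmse(public), rmse(private)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        (datetime(2013, 8, 1, 1), 0.5, 0.4),
        (datetime(2013, 9, 30, 23), 0.2, 0.4),
        (datetime(2013, 10, 5, 0), 0.0, 0.3),
    ]
    print(leaderboard_rmse(demo))
```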

Data:

| Variable | Definition |
|---|---|
| ID | Unique ID of observation |
| ZONEID | Zone (turbine) ID |
| TIMESTAMP | Date and time of observation |
| U10 | Zonal wind velocity at 10 m above ground |
| V10 | Meridional wind velocity at 10 m above ground |
| U100 | Zonal wind velocity at 100 m above ground |
| V100 | Meridional wind velocity at 100 m above ground |
| TARGETVAR | Output of wind turbine, as a fraction of maximum capacity |

To learn more about the U and V wind velocity components, click here.
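
For readers unfamiliar with U and V components, here is a small sketch of how a zonal (U) and meridional (V) velocity pair converts to a wind speed and a direction. It assumes the usual convention that U is positive toward the east and V is positive toward the north, and it reports direction in the meteorological sense (degrees clockwise from north, indicating where the wind blows from). The function names are illustrative and do not come from the notebooks.

```python
import math


def wind_speed(u, v):
    """Magnitude of the horizontal wind vector."""
    return math.hypot(u, v)


def wind_direction_from(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north.

    Assumes u > 0 means eastward flow and v > 0 means northward flow.
    """
    return math.degrees(math.atan2(-u, -v)) % 360.0


if __name__ == "__main__":
    # A purely eastward flow (u > 0, v = 0) is a westerly wind.
    print(wind_speed(3.0, 4.0), wind_direction_from(1.0, 0.0))
```

The same two functions apply to either height, for example to the (U10, V10) pair or the (U100, V100) pair of a record.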

The full data set (including the target variable values for the test subset used for the public and private leaderboards) is available from Dr. Tao Hong's Energy Forecasting website, under "GEFCom2014".

