AWS lambda function for preparing data for FPL data analysers, such as the FPL Advisor (177arc/fpl-advisor)

177arc/fpl-data

Python 3.8

AWS lambda function for calculating FPL data statistics

The purpose of this project is to provide an AWS lambda function that:

  1. retrieves data from the FPL API
  2. calculates various statistics, including expected points for each game week, using the prep-data.ipynb Jupyter notebook
  3. makes the prepared data sets available for data analysers such as the FPL Advisor. The data sets are published in the public fpl.177arc.net S3 bucket

The lambda function runs in AWS on an hourly schedule during the day and continuously updates the data.
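
The three steps above can be sketched as a Lambda-style handler. This is only an illustrative outline, not the project's actual code: `make_handler` and the `fetch`, `prepare` and `publish` callables are hypothetical names, and only the `(event, context)` handler signature is the standard AWS Lambda convention for Python.

```python
from typing import Any, Callable, Dict


def make_handler(
    fetch: Callable[[], Dict[str, Any]],
    prepare: Callable[[Dict[str, Any]], Dict[str, str]],
    publish: Callable[[str, str], None],
) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda-style handler from three hypothetical step functions."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        raw = fetch()                    # step 1: retrieve data from the FPL API
        data_sets = prepare(raw)         # step 2: calculate the statistics
        for name, csv_text in data_sets.items():
            publish(name, csv_text)      # step 3: make each data set available
        return {"published": sorted(data_sets)}

    return handler
```

Passing the steps in as callables keeps the sketch runnable without network or AWS access; in a real deployment they would wrap the FPL API calls, the notebook logic and the S3 upload.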

Important data points

The following data points are worth highlighting:

Expected points calculation methodology

The fundamental idea is that the best evidence for a player's ability to generate points is their record over a sliding window of past fixtures, taking into account the difficulty of the opposing teams.

The expected points for each player and game week are calculated by averaging the points the player earned for each event type (e.g. goals scored, goals conceded, clean sheets, yellow cards) separately over a sliding window of past fixtures (currently 12). These averages are then adjusted for the strength of the upcoming opposing team relative to the strength of the opposing teams that the player has played so far.
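
A minimal sketch of this idea follows. It is a simplification under stated assumptions, not the notebook's implementation: the record layout (`points`, `opponent_strength`), the function name and the adjustment rule are all hypothetical. In particular, the adjustment here is a single ratio (average strength of past opponents divided by the strength of the upcoming opponent) applied uniformly to every event type, which the real calculation may well not do.

```python
from statistics import mean
from typing import Dict, List

WINDOW = 12  # sliding window size stated in the text


def expected_points(
    history: List[Dict], next_opponent_strength: float, window: int = WINDOW
) -> float:
    """Estimate points for the next fixture from the most recent fixtures.

    Each history item is assumed to look like:
    {"points": {"goals_scored": 4.0, ...}, "opponent_strength": 1.2}
    """
    recent = history[-window:]
    if not recent:
        return 0.0

    # Assumed adjustment: stronger upcoming opponent -> factor below 1.
    avg_faced = mean(f["opponent_strength"] for f in recent)
    factor = avg_faced / next_opponent_strength

    # Average each event type separately, then adjust and sum.
    event_types = {event for f in recent for event in f["points"]}
    total = 0.0
    for event in event_types:
        avg = mean(f["points"].get(event, 0.0) for f in recent)
        total += avg * factor
    return total
```

For example, a player who averaged 2 points from goals and 2 points from clean sheets against opponents of average strength 2.0 would be estimated at 2 points against an opponent of strength 4.0 under this rule.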

Features

  • Team estimated strength and expected goals based on rolling window of fixtures of this and the last season
  • Player estimated points for each game week based on opposing team strength and points earned over short rolling window of past fixtures
  • Where past data is not available, estimates for past data are patched either using manually curated data or, if not available, using machine learning
  • Data completeness indicator for each player and game week combination
  • Estimated points for each past player and game week combination to support back testing
  • Correctly handles game weeks in which a team does not play as well as double game weeks

For more details, see prep-data.ipynb Jupyter notebook.
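
One plausible way to handle blank and double game weeks is to estimate points per fixture and then sum them per game week, so that a week without a fixture yields zero and a week with two fixtures yields the sum of both. The sketch below illustrates that idea only; the function name and input layout are hypothetical, and the notebook may handle these cases differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def points_per_game_week(
    fixture_eps: List[Tuple[int, float]], game_weeks: Iterable[int]
) -> Dict[int, float]:
    """Sum per-fixture expected points into per-game-week totals.

    fixture_eps holds (game_week, expected_points) pairs, one per fixture.
    """
    totals: Dict[int, float] = defaultdict(float)
    for game_week, eps in fixture_eps:
        totals[game_week] += eps  # a double game week accumulates two fixtures
    # Game weeks without any fixture (blank weeks) get 0.0.
    return {gw: totals.get(gw, 0.0) for gw in game_weeks}
```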

Limitations

  • Although player availability data is used, the textual news information is not interpreted to project a return date as part of the longer-term expected points calculation. In short, player availability is reliable for the upcoming game week but not thereafter.
  • Only data points from the FPL API are used; no alternative data is incorporated.

List of data sets and data dictionaries

  • player_gw_next_eps_ext.csv (~120,000 data points, data dictionary): Contains a row for each player in the current season with expected points for the next game week up to the last one. The data is indexed by the player code, which is unique across seasons.
  • players_gw_team_eps_ext.csv (~7,000,000 data points, data dictionary): Contains a row for each player and game week combination for the current and last season with the expected points for past and upcoming game weeks. The data is indexed by the player code, the season and the game week number.
  • team_fixture_stats_ext.csv (~100,000 data points, data dictionary): Contains a row for each fixture with the corresponding team info. It has stats for each fixture that are possible indicators of the outcome. These stats are eventually used in the calculation of the expected points. The data is indexed by the fixture code, which is unique across different seasons.
  • players_history_ext.csv (~70,000 data points, data dictionary): Contains a row for each player fixture combination for the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the player code and the fixture code, both of which are unique across seasons.
  • fixtures_ext.csv (~12,000 data points, data dictionary): Contains a row for each fixture in the current and the last season with most attributes published by this FPL API endpoint: . The data is indexed by the fixture code that is unique across different seasons.
  • player_teams.csv (~36,000 data points, data dictionary): Contains a row for each player in the current season with the corresponding team info. The data is indexed by the player code, which is unique across seasons.
  • teams.csv (120 data points, data dictionary): Contains a row for each team playing in the current season with most attributes published by this FPL API endpoint: . The data is indexed by the team code that is unique across different seasons.
  • players_ext.csv (~42,000 data points, data dictionary): Contains a row for each player in the current and last season with most of the attributes published by this FPL API endpoint: . The data is indexed by the player code that is unique across seasons.
  • gws.csv (646 data points, data dictionary): Contains a row for each game week of the current season with most of the game week attributes published by this FPL API endpoint: . The data is indexed by the game week ID.
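
As a rough illustration of consuming one of these data sets, the sketch below loads CSV text into a dictionary keyed by the index columns. The column names in the sample (`player_code`, `season`, `gw`, `expected_points`) are made up for the example; the real names are given in each data dictionary.

```python
import csv
import io
from typing import Dict, Sequence, Tuple


def load_indexed(
    csv_text: str, key_columns: Sequence[str]
) -> Dict[Tuple[str, ...], Dict[str, str]]:
    """Parse CSV text and key each row by the given index columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[col] for col in key_columns): row for row in reader}


# Hypothetical sample in the shape of a player/season/game-week data set.
sample = (
    "player_code,season,gw,expected_points\n"
    "1001,2020,1,4.2\n"
    "1001,2020,2,3.1\n"
)
rows = load_indexed(sample, ["player_code", "season", "gw"])
print(rows[("1001", "2020", "2")]["expected_points"])
```

Values are kept as strings, as `csv.DictReader` returns them; numeric columns would need explicit conversion.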
