[StatsPerform] Bugfixes for tracking data (MA25) + Support for event data (MA3) #310

Merged (8 commits) on Jun 7, 2024

probberechts
Contributor

Bugfixes for deserializing Stats Perform tracking data (MA1 + MA25 feed)

First, this PR fixes a few bugs in the tracking deserializer:

  • Use the correct coordinate system and pitch dimensions (fixes #299, "Incorrect Stats Perform coordinate system").
  • Remove the "BALL_OWNING_TEAM" flag. The Stats Perform tracking data does not define which team has possession.
  • The periods field of the metadata should be a list, not a dict.
  • Skip frames in which the game is paused when only_alive = True.
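
As a rough illustration of the last bullet, the sketch below drops frames in which the ball is dead when only_alive is set. The `Frame` and `BallState` types and the `filter_frames` helper are hypothetical stand-ins for this example, not kloppy's actual classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator


class BallState(Enum):
    ALIVE = "alive"
    DEAD = "dead"


@dataclass
class Frame:
    # Hypothetical minimal frame: real tracking frames carry far more data.
    frame_id: int
    ball_state: BallState


def filter_frames(frames: Iterable[Frame], only_alive: bool) -> Iterator[Frame]:
    """Yield frames, dropping dead-ball frames when only_alive is True."""
    for frame in frames:
        if only_alive and frame.ball_state is BallState.DEAD:
            continue  # the game is paused, so skip this frame
        yield frame


frames = [
    Frame(1, BallState.ALIVE),
    Frame(2, BallState.DEAD),
    Frame(3, BallState.ALIVE),
]
alive_ids = [f.frame_id for f in filter_frames(frames, only_alive=True)]
all_ids = [f.frame_id for f in filter_frames(frames, only_alive=False)]
```

With only_alive=False every frame is kept, so existing callers that rely on dead-ball frames are unaffected.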

Support for deserializing Stats Perform event data (MA1 + MA3 feed)

Second, it adds support for deserializing event stream data. The content of the Stats Perform MA3 feed is identical to that of the Opta F24 feed; only the format differs. To avoid duplicate code, I refactored the OptaDeserializer to first parse the data into OptaEvents, which are subsequently deserialized to kloppy objects. I also renamed the OptaDeserializer to StatsPerformDeserializer to convey that it now deserializes feeds distributed by Stats Perform (and not only the legacy Opta feeds).

from kloppy import statsperform

dataset = statsperform.load_event(
    ma1_data="kloppy/tests/files/statsperform_event_ma1.json",
    ma3_data="kloppy/tests/files/statsperform_event_ma3.json",
    coordinates="opta"
)

Both the XML and JSON versions of the feed are supported.
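
The two-stage design described above (format-specific parsing into an intermediate event, then one shared deserialization step) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `RawEvent` record, the `parse_json`, `parse_xml` and `deserialize` helpers, and the payload layouts are hypothetical and do not reflect the real feed schemas or kloppy's internals.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class RawEvent:
    """Format-independent intermediate record (hypothetical stand-in for OptaEvent)."""
    event_id: str
    type_id: int
    x: float
    y: float


def parse_json(payload: str) -> List[RawEvent]:
    # Assumed layout: {"events": [{"id": ..., "typeId": ..., "x": ..., "y": ...}]}
    return [
        RawEvent(str(e["id"]), int(e["typeId"]), float(e["x"]), float(e["y"]))
        for e in json.loads(payload)["events"]
    ]


def parse_xml(payload: str) -> List[RawEvent]:
    # Assumed layout: <events><event id=".." typeId=".." x=".." y=".."/></events>
    root = ET.fromstring(payload)
    return [
        RawEvent(el.get("id"), int(el.get("typeId")), float(el.get("x")), float(el.get("y")))
        for el in root.iter("event")
    ]


def deserialize(events: List[RawEvent]) -> List[dict]:
    # Shared second stage: the same mapping is used whatever the input format was.
    # Treating type ID 1 as a pass is an assumption made for this example.
    names = {1: "PASS"}
    return [
        {"event_id": e.event_id, "type": names.get(e.type_id, "GENERIC"), "coordinates": (e.x, e.y)}
        for e in events
    ]


json_payload = '{"events": [{"id": "a1", "typeId": 1, "x": 50.0, "y": 50.0}]}'
xml_payload = '<events><event id="a1" typeId="1" x="50.0" y="50.0"/></events>'
from_json = deserialize(parse_json(json_payload))
from_xml = deserialize(parse_xml(xml_payload))
```

Because only the parsers know about the input format, supporting another serialization of the same content means adding one parser rather than a second deserializer.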

Other changes

  • All coordinate systems can now be created without pitch dimensions.
  • Raise an informative warning when the default pitch dimensions are used to perform a coordinate system transformation.
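
A minimal sketch of the behaviour in these two bullets, assuming a fallback pitch of 105 x 68 m. The `CoordinateSystem` class, the default values and the `normalized_to_meters` helper are hypothetical and are not taken from kloppy's code.

```python
import warnings
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed fallback pitch size in meters; the real defaults may differ.
DEFAULT_PITCH_LENGTH = 105.0
DEFAULT_PITCH_WIDTH = 68.0


@dataclass
class CoordinateSystem:
    """Hypothetical coordinate system whose pitch dimensions are optional."""
    pitch_length: Optional[float] = None
    pitch_width: Optional[float] = None

    def resolved_dimensions(self) -> Tuple[float, float]:
        if self.pitch_length is None or self.pitch_width is None:
            # Tell the user that results depend on assumed dimensions.
            warnings.warn(
                "Pitch dimensions are unknown; falling back to "
                f"{DEFAULT_PITCH_LENGTH} x {DEFAULT_PITCH_WIDTH} m for the transformation.",
                UserWarning,
                stacklevel=2,
            )
        length = self.pitch_length if self.pitch_length is not None else DEFAULT_PITCH_LENGTH
        width = self.pitch_width if self.pitch_width is not None else DEFAULT_PITCH_WIDTH
        return length, width


def normalized_to_meters(x: float, y: float, system: CoordinateSystem) -> Tuple[float, float]:
    """Convert normalized [0, 1] coordinates to meters."""
    length, width = system.resolved_dimensions()
    return x * length, y * width


# Record warnings so the fallback can be observed programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    point = normalized_to_meters(0.5, 0.5, CoordinateSystem())
    exact = normalized_to_meters(0.5, 0.5, CoordinateSystem(100.0, 60.0))
```

Only the first call triggers the warning, since the second one supplies explicit dimensions.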

- Use the correct coordinate system and pitch dimensions
- Remove "BALL_OWNING_TEAM" flag
- Metadata.periods should be a list, not a dict
- Skip frames for which ball_state = DEAD when only_alive = True

Fixes PySport#299.
@probberechts probberechts force-pushed the feat/statsperform-ma23 branch from 49ac769 to 31db470 Compare April 13, 2024 21:14
@JanVanHaaren JanVanHaaren self-requested a review April 22, 2024 19:31
@koenvo koenvo added this to the 3.15 milestone Apr 23, 2024
@koenvo
Contributor

koenvo commented May 22, 2024

@JanVanHaaren were you able to give this a review?

@JanVanHaaren JanVanHaaren self-assigned this May 27, 2024
Collaborator

@JanVanHaaren JanVanHaaren left a comment

Although I haven't been able to test each potential scenario, this pull request generally looks good to me. Thank you for this nice contribution!

I suggest replacing the mentions of ma25_data with opt_data. Stats Perform typically refers to the raw tracking data as the OPT data. While the MA25 feed provides a download link to the OPT data, the OPT data can also be obtained directly through other Stats Perform platforms.

@probberechts
Contributor Author

I was also unsure how to name that input parameter, since it is indeed not the MA25 feed itself but the txt file referenced in the feed. However, I had never heard of OPT data before: the Stats Perform API docs do not seem to mention it, and I cannot find anything via Google either. Hence, from personal experience, I assume that people can figure out what is meant by ma25_data, whereas I would not know what opt_data is (moreover, people might confuse it with "opta"). In which context is it referred to as OPT data?

@JanVanHaaren
Collaborator

I agree that naming that input parameter is not trivial. I am fine with keeping ma25_data but I was leaning towards opt_data because the Stats Perform Data Delivery Team always uses the term OPT files in their communication with clubs. Those files can be obtained through the Stats Perform API or the Stats Perform Download Portal.

Dear all,
 
Please find enclosed the Fitness Report, players analysis & fitness 15min of : Club Brugge v Cercle Brugge
 
Following files are also available for download from https://pro-download-portal.statsperform.com/ : 
 
    Fitness Report.
    OPT Files.
    Advanced XML.
 
Best Regards,
 
Data Delivery Team – STATS PERFORM

@probberechts
Contributor Author

How about improving the docstring? Something like:

ma25_data: txt file linked in the MA25 Match Tracking Feed; also known as an OPT file

@JanVanHaaren
Collaborator

How about improving the docstring? Something like:

ma25_data: txt file linked in the MA25 Match Tracking Feed; also known as an OPT file

This solution looks good to me!

@probberechts probberechts removed their assignment May 28, 2024
@probberechts probberechts requested a review from JanVanHaaren May 28, 2024 20:07
Collaborator

@JanVanHaaren JanVanHaaren left a comment

Thank you!

@koenvo koenvo merged commit 8025427 into PySport:master Jun 7, 2024
19 checks passed
@probberechts probberechts deleted the feat/statsperform-ma23 branch June 20, 2024 11:01
Successfully merging this pull request may close these issues.

Incorrect Stats Perform coordinate system
3 participants