Python script to generate miscellaneous stats that were of interest to the author.
- Have Python 3.6 or later installed.
- Download the latest WCA database export. Note, you must get the .tsv version of the database, not the .sql version
- Unpack the database export into some convenient directory.
- Clone this repo to your computer, or just download the individual wca_db_stats.py file from here on GitHub.
- From your command line, invoke this script with the location of your database export, the name of one of the supported stats, and any other parameters required by that stat.
For example, presuming that this script is in your current directory, and you have unpacked the database export to ~/WCA_db:
python wca_db_stats.py --dump ~/WCA_db/ --stat cahby 2023
will generate for you a Competitions Attended Histogram by Year report for 2023.
--stat ftcbc <comp ID>
You provide the WCA ID of any comp, and it will generate for you a list of all the people who were first-timers at that comp. It also tells you how many comps those people have gone on to participate in.
--stat epby <year>
You provide a year, and it shows you a table of how many unique people competed in each event during that year, sorted by popularity. No surprise, 3x3 always comes out on top.
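As a rough illustration of the idea (not the script's actual code), the count can be sketched as grouping person IDs into a set per event and sorting by set size. The function name, the tuple layout, and the sample IDs below are all made up for the example; it assumes the result rows and the competition-to-year mapping have already been read from the export.

```python
from collections import defaultdict


def event_popularity(results, comp_years, year):
    """Count unique competitors per event for one year.

    results: iterable of (competition_id, event_id, person_id) tuples
             (hypothetical layout, already parsed from the export).
    comp_years: dict mapping competition ID to the year it was held.
    """
    people_by_event = defaultdict(set)
    for comp_id, event_id, person_id in results:
        if comp_years.get(comp_id) == year:
            # A set ensures each person is counted once per event.
            people_by_event[event_id].add(person_id)
    counts = {event: len(people) for event, people in people_by_event.items()}
    # Most popular first; ties broken by event ID for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data, for illustration only.
sample_results = [
    ("CompA2023", "333", "2020AAAA01"),
    ("CompA2023", "333", "2021BBBB01"),
    ("CompA2023", "222", "2020AAAA01"),
    ("CompB2023", "333", "2020AAAA01"),  # same person again, counted once
    ("CompC2022", "444", "2021BBBB01"),  # different year, ignored
]
sample_years = {"CompA2023": 2023, "CompB2023": 2023, "CompC2022": 2022}

print(event_popularity(sample_results, sample_years, 2023))
```

Dropping the year check would give the all-years variant of the same count.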
--stat epat
Same as Event Popularity by Year, but for all years together.
--stat cahby <year>
You provide a year, and it generates for you a histogram of how many competitors attended each number of comps during the year. That is, how many people only went to 1 comp, 2 comps, ... up through whatever the largest number is for the year. Fun fact: in 2023, some wildly dedicated cuber went to 85 comps!
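One way to picture this computation, as a minimal sketch rather than the script's real implementation: collect the set of comps each person attended in the year, then count how many people share each set size. The function name, input layout, and sample IDs are assumptions made for the example.

```python
from collections import Counter


def attendance_histogram(attendance, comp_years, year):
    """Return sorted (number_of_comps, number_of_people) pairs for one year.

    attendance: iterable of (person_id, competition_id) pairs; duplicates
                are fine (hypothetical layout, one pair per result row).
    comp_years: dict mapping competition ID to the year it was held.
    """
    comps_by_person = {}
    for person_id, comp_id in attendance:
        if comp_years.get(comp_id) == year:
            comps_by_person.setdefault(person_id, set()).add(comp_id)
    # Count how many people attended exactly 1 comp, 2 comps, and so on.
    histogram = Counter(len(comps) for comps in comps_by_person.values())
    return sorted(histogram.items())


# Made-up sample data, for illustration only.
sample_attendance = [
    ("P1", "CompA"), ("P1", "CompA"),  # duplicate row, still one comp
    ("P1", "CompB"),
    ("P2", "CompA"),
    ("P3", "CompB"),
    ("P3", "CompOld"),  # held in another year, ignored
]
sample_comp_years = {"CompA": 2023, "CompB": 2023, "CompOld": 2019}

print(attendance_histogram(sample_attendance, sample_comp_years, 2023))
```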
--stat yape
Shows a simple table of what year each event was first held at a competition recognized by the WCA.
--stat ppc
Shows a simple table of how many people are registered to represent each country, both by the raw number and by percentage.
Fun fact: as of this writing, Monaco, Haiti, Togo, Guyana, Democratic Republic of the Congo, Grenada, Somalia, Maldives, Cameroon, Saint Lucia, San Marino, Antigua and Barbuda, and Mauritania are all represented by just one lonely cuber each. Time to hold some competitions in the Caribbean, I think!
--stat srbe <event>
Shows a listing of the 10 slowest individual solves in the database for a given event. The event must be specified as the WCA database identifier for the event, e.g. "333" for 3x3, "444bf" for 4x4 blindfolded, etc. Note that this program does not implement a 10-fastest solves per event, because the WCA website's results page already does that.
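For the curious, a "10 slowest" listing can be sketched with heapq.nlargest from the standard library. This is only an illustration under stated assumptions: the row layout and names are hypothetical, and it assumes solve times are stored as positive integers while non-positive values mark attempts with no time (DNF, DNS, or not attempted). Whether the script itself filters attempts this way is not stated here, and events with specially encoded results would need extra handling.

```python
import heapq


def slowest_solves(rows, event_id, n=10):
    """Return the n largest solve values for one event, slowest first.

    rows: iterable of (competition_id, event_id, person_id, values), where
          values is a list of integer attempt results (hypothetical layout).
    """
    solves = (
        (value, person_id, comp_id)
        for comp_id, row_event, person_id, values in rows
        if row_event == event_id
        for value in values
        if value > 0  # assumption: non-positive values are not real times
    )
    # nlargest keeps only n items in memory while scanning all solves.
    return heapq.nlargest(n, solves)


# Made-up sample data, for illustration only.
sample_rows = [
    ("CompA", "333", "P1", [1200, 1500, -1, 0, 990]),
    ("CompB", "333", "P2", [4500, -2, 800, 700, 650]),
    ("CompB", "222", "P2", [9999, 300, 310, 320, 330]),  # other event
]

print(slowest_solves(sample_rows, "333", n=2))
```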
If you somehow think this hacky little tool is worth your time to add some additional stats, I've tried to make that easy:
- Add a new stat-generating function. Use the existing ones as a template. You'll see they all have the same call signature and very similar structures. Please follow the same naming convention I've used, which is that the stat names have the general form:
thing it computes + "by" or "per" + what the thing is relative to
If a stat is "by" something, then users should be expected to provide some sort of parameter(s) to the computation, such as a year or a competition name or a WCA ID or whatever you want, really. Those parameters will be available to your function as elements of the options list passed to your function.
- Register a new short-name for your function in the callTable dictionary near the bottom of the script.
- Add documentation for your new function to this file and to the usage() function.
- Submit a pull request.
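To make the steps above concrete, here is a minimal sketch of the dispatch-table pattern they describe. Everything in it is a stand-in: the real call signature is whatever the existing functions in wca_db_stats.py use, so check those before copying. The (dump_dir, options) signature, the "cby" short-name, the comps_by_year function, and the run_stat helper are all hypothetical.

```python
def comps_by_year(dump_dir, options):
    """Hypothetical "Competitions by Year" stat taking a year parameter."""
    # Parameters given after the stat name arrive as strings in options.
    year = int(options[0])
    return f"would count competitions held in {year} using {dump_dir}"


# Stand-in for the callTable dictionary near the bottom of the script:
# it maps each short-name to the function that generates that stat.
callTable = {
    "cby": comps_by_year,
}


def run_stat(stat_name, dump_dir, options):
    """Look up a stat by its short-name and call it (hypothetical helper)."""
    if stat_name not in callTable:
        raise ValueError(f"unknown stat: {stat_name}")
    return callTable[stat_name](dump_dir, options)


print(run_stat("cby", "~/WCA_db/", ["2023"]))
```

Because every stat shares one signature, registering a new one is a single dictionary entry, with no changes needed to the argument handling.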