Maybe create public statistics of datasets? #2

JackKelly · 2014-12-13T09:56:39Z

One thing that people have asked for in the past is a detailed statistical analysis of multiple datasets. NILMTK can compute lots of stats (and the list keeps growing).

Also, we need to regularly test all our stats functions on as many datasets as possible.

It feels like both of these aims could be met by the "real data test" project. Perhaps we'd have a script which loads a dataset, runs it through every stats function (and preprocessing and disaggregation?) and produces visualisations (perhaps using an IPython notebook into which we pass each dataset).

This would give the functions a good test, produce a really useful description of each dataset, and provide a reference for how to use each stats function in NILMTK and what those functions produce.

What do you think?

oliparson · 2014-12-13T10:32:37Z

This is an awesome idea, Jack. I'm sure it would improve the robustness of the dataset importers, as well as producing genuinely useful output. A live deployment of this might even be worthy of a paper, or at least a presentation of some kind at the NILM workshop.

JackKelly · 2014-12-13T10:36:03Z

Glad you like it! I should have said that this idea is based on the suggestions you made on this topic at the London NILM meetup ;)

nipunbatra · 2014-12-13T21:26:44Z

Hmm. Really useful. The "real data test" repo should serve this purpose. I'd planned a dashboard-style web app on top of the results generated from each building in each of the datasets. Will keep you updated on progress on that front.

JackKelly · 2014-12-13T21:33:19Z

Sounds awesome ;)

Don't worry too much about building an all-singing-all-dancing web interface if time is short. Just static pages (like a set of IPython notebooks) giving lots of stats would be great (and would make them easy to link to and for Google to index). I'm basically just thinking of a simple script which runs all of NILMTK's stats functions on all datasets and spits out some visualisations, hence providing both a rigorous test of NILMTK and some useful stats.
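A minimal sketch of the kind of script described above, in plain Python. It does not use NILMTK: the `STATS_FUNCTIONS` registry and the `mean_power`, `peak_power`, `run_all_stats` and `failures` names are hypothetical, and each "dataset" is assumed to be a plain list of power readings rather than a real NILMTK dataset object. Every stats function is run on every dataset, and any exception is recorded instead of aborting the run, so a single pass yields both the per-dataset stats and a list of failures that can be treated as test results. Visualisation is left out.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for real stats functions: each takes a list of
# power readings (watts) and returns one summary value.
def mean_power(readings: List[float]) -> float:
    return mean(readings)  # raises StatisticsError on an empty list

def peak_power(readings: List[float]) -> float:
    return max(readings)  # raises ValueError on an empty list

# Hypothetical registry: adding a function here means it gets run on every dataset.
STATS_FUNCTIONS: Dict[str, Callable[[List[float]], Any]] = {
    "mean_power": mean_power,
    "peak_power": peak_power,
}

def run_all_stats(
    datasets: Dict[str, List[float]],
    stats_functions: Dict[str, Callable[[List[float]], Any]],
) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Run every stats function on every dataset, recording results or errors."""
    report: Dict[str, Dict[str, Dict[str, Any]]] = {}
    for ds_name, readings in datasets.items():
        results: Dict[str, Dict[str, Any]] = {}
        for fn_name, fn in stats_functions.items():
            try:
                results[fn_name] = {"ok": True, "value": fn(readings)}
            except Exception as exc:
                # Keep going: one failing function should not hide the others.
                results[fn_name] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        report[ds_name] = results
    return report

def failures(report: Dict[str, Dict[str, Dict[str, Any]]]) -> List[Tuple[str, str]]:
    """List (dataset, function) pairs that raised, i.e. the 'test failures'."""
    return [(ds, fn) for ds, res in report.items() for fn, r in res.items() if not r["ok"]]

if __name__ == "__main__":
    toy = {"toy_a": [100.0, 200.0, 300.0], "empty": []}
    report = run_all_stats(toy, STATS_FUNCTIONS)
    for ds_name, results in report.items():
        print(ds_name, results)
    print("failures:", failures(report))
```

Catching broad exceptions is a deliberate choice here: for a "real data test" the goal is to see every function/dataset combination that breaks in one run, rather than stopping at the first error.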
