Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rolling window result centering #889

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

XePeleato
Copy link

@XePeleato XePeleato commented Mar 17, 2021

Pandas-like center parameter; allows centering the result values instead of having a bunch of NaNs at the beginning of the column.

Thanks for contributing.

Description

It adds the option to center the result of a RollingColumn instead of having all the missing values at the beginning, like pandas does.

Testing

Did you add a unit test?
Yes.

Pandas-like center parameter; allows centering the result values instead of having a bunch of NaNs at the beginning of the column.
@@ -164,6 +164,10 @@ default RollingColumn rolling(final int windowSize) {
return new RollingColumn(this, windowSize);
}

default RollingColumn rolling(final int windowSize, final boolean center) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could use a javadoc for people not familiar with pandas's center

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies if this is duplicating something i wrote elsewhere. I made a comment but don't see it now, so perhaps i didn't save it.

I think the javadoc for this method needs to incorporate a slightly shorter version of the explanation you wrote for the PR. The comment here isn't enough for anyone to really get what it does, but the explanation you wrote is very clear. If you can add that, I think we can resolve this and approve and merge the PR

@benmccann
Copy link
Collaborator

so this puts n/a at front and back? I'm curious when you might use this

@XePeleato
Copy link
Author

XePeleato commented Mar 18, 2021

(I deleted my old comment in order to write a more detailed one)

I use this centering when working with date(time)s indices so that the label is set to the midpoint of the rolling window, here's an example:

Let's say we have this Table

LocalDate base = LocalDate.of(2020, 01, 01);
DateColumn index = DateColumn.create("index", IntStream.range(0, 92).boxed().map(base::plusDays));
DoubleColumn data = DoubleColumn.create("data", IntStream.range(0, 92).asDoubleStream());

Table table = Table.create(index, data);
   index     |  data  |
-----------------------
 2020-01-01  |     0  |
 2020-01-02  |     1  |
 2020-01-03  |     2  |
 2020-01-04  |     3  |
 2020-01-05  |     4  |
        ...  |   ...  |
 2020-03-28  |    87  |
 2020-03-29  |    88  |
 2020-03-30  |    89  |
 2020-03-31  |    90  |
 2020-04-01  |    91  |

Now let's calculate a 7-Day rolling mean for this table as usual

table.addColumns(data.rolling(7).calc(AggregateFunctions.mean));

   index     |  data  |  data 7-period Mean  |
----------------------------------------------
 2020-01-01  |     0  |                      |
 2020-01-02  |     1  |                      |
 2020-01-03  |     2  |                      |
 2020-01-04  |     3  |                      |
 2020-01-05  |     4  |                      |
 2020-01-06  |     5  |                      |
 2020-01-07  |     6  |                   3  |
 2020-01-08  |     7  |                   4  |
 2020-01-09  |     8  |                   5  |
 2020-01-10  |     9  |                   6  |
        ...  |   ...  |                 ...  |
 2020-03-29  |    88  |                  85  |
 2020-03-30  |    89  |                  86  |
 2020-03-31  |    90  |                  87  |
 2020-04-01  |    91  |                  88  |

As you can see, the mean corresponding to the period 2020-01-01 to 2020-01-07 is set to the last value of the interval (2020-01-07). Now let's do the same centering the value:

   index     |  data  |  data 7-period Mean  |
----------------------------------------------
 2020-01-01  |     0  |                      |
 2020-01-02  |     1  |                      |
 2020-01-03  |     2  |                      |
 2020-01-04  |     3  |                   3  |
 2020-01-05  |     4  |                   4  |
 2020-01-06  |     5  |                   5  |
 2020-01-07  |     6  |                   6  |
        ...  |   ...  |                 ...  |
 2020-03-26  |    85  |                  85  |
 2020-03-27  |    86  |                  86  |
 2020-03-28  |    87  |                  87  |
 2020-03-29  |    88  |                  88  |
 2020-03-30  |    89  |                      |
 2020-03-31  |    90  |                      |
 2020-04-01  |    91  |                      |

Now the same value is set to the central point of the interval, this can be useful when working with dates, specially if this data is going to be subject to interpolation, where the results would be way more accurate this way.

I hope I made it clear enough, I'll be happy to answer any other question,
Thank you for your time!

@benmccann
Copy link
Collaborator

Uhh. This PR looks messed up now. It says 79 files changed

@XePeleato
Copy link
Author

You're right, I mistakenly pushed other changes to master, it should be fine now.

@benmccann
Copy link
Collaborator

Can you give a more concrete use case? It's not clear to me why you might want to use this. In the example dataset you shared, you're utilizing data from the future, which makes it useless for prediction, so I'm not sure what you'd use it for. Perhaps there are use cases I'm not aware of though

@benmccann
Copy link
Collaborator

I'm personally not a big fan of adding more code complication to handle specific use cases that can already be handled rather simply. In this case you could use lead or lag with windowSize / 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants