Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harmonize ECDF between "h" and "v", removed "reversed" mode #3407

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

eisenlohr
Copy link

@eisenlohr eisenlohr commented Oct 1, 2021

Changes

px.ecdf functionality

I changed three aspects of the ECDF functionality:

  • The plotted line always spans from 0 to 1 instead of omitting the first step (or last step in case of 'complementary').
  • Harmonized the plot output such that x=values,y=weights,orientation='v' is the EXACT transpose of y=values,x=weights,orientation='h'.
    This symmetry was lacking in the former implementation as can be seen in the examples mentioned below.
  • The redundant 'reversed' option has been removed as it was essentially a copy of 'complementary' with the opposite end omitted.

Example plots of original functionality:

original output

Example plots of updated functionality now exhibiting proper symmetry:

updated output

Above images were produced with

import numpy as np
import pandas as pd
import plotly.express as px
import panel as pn

data = pd.DataFrame({'first':np.arange(1,7),
                     'second':[20.56,22.17,23.66,25.3,25.9,27.7],
                     'weights':1+np.arange(6),
                     'reversed':6-np.arange(6),
                    })

grid = pn.GridSpec(ncols=4,nrows=2,height=800,width=1600)

for i,mode in enumerate(['standard','complementary']):
    for k,orientation in enumerate(['v','h']):
        for j,x in enumerate([True,False]):
            grid[i,2*j+k] = px.ecdf(data,
                                x=['first','second'] if x else 'weights',
                                y='weights' if x else ['first','second'],
                                orientation=orientation,
                                range_x={'v':[0,30] if x else [0,10],
                                         'h':[0,1]}[orientation],
                                range_y={'v':[0,1],
                                         'h':[0,10] if x else [0,30]}[orientation],
                                ecdfmode=mode,
                                width=400,
                                title=f'{mode} {orientation} {"x" if x else "y"}',
                               )

grid

px doc-strings

  • Corrected some spelling mistakes.
  • Unified format of first and second lines across parameters.

Code PR

  • I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
  • I have added tests (if submitting a new feature or correcting a bug) or
    modified existing tests.
  • For a new feature, I have added documentation examples in an existing or
    new tutorial notebook (please see the doc checklist as well).
  • I have added a CHANGELOG entry if fixing/changing/adding anything substantial.
  • For a new feature or a change in behaviour, I have updated the relevant docstrings in the code to describe the feature or behaviour (please see the doc checklist as well).

@@ -426,20 +426,21 @@
"If provided, and if `trendline` is set, all trendlines will be drawn in this color rather than in the same color as the traces from which they draw their inputs.",
],
trendline_scope=[
"str (one of `'trace'` or `'overall'`, default `'trace'`)",
"str (default `'trace'`)",
"One of `'trace'` or `'overall'`",
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot one "." here...

@nicolaskruchten
Copy link
Contributor

Thanks for this PR! I'll be able to take a look at it next week, before the next release :)

Have a nice weekend!

@nicolaskruchten
Copy link
Contributor

One quick note is that I will not be able to merge this as-is given that it removes functionality. We still need to accept reversed as an argument for backwards-compatibility. If you want to propose to make this equivalent to complementary we'd need to implement and document that in the docstring :)

@eisenlohr
Copy link
Author

eisenlohr commented Oct 1, 2021 via email

@eisenlohr
Copy link
Author

@nicolaskruchten, who would be able to perform the code review?

@nicolaskruchten
Copy link
Contributor

Hello @eisenlohr! I'm sorry for the long delays in responding here, I've gotten very far behind in my work! I will try to review this before the end of the year.

I do have some hesitations about the idea of merging reverse and complementary, as they don't really mean the same thing... from the docstring:

If 'standard', the ECDF is plotted such that values represent data at or below the point.
If 'complementary', the CCDF is plotted such that values represent data above the point.
If 'reversed', a variant of the CCDF is plotted such that values represent data at or above the point.

The complementary ECDF is 1-ECDF, so the definitions of standard and complementary are correct here. I added reverse mode because, indeed, there was a missing point.

As to the idea of adding an extra point to the series, I am also ambivalent, as this point is not in the underlying data, and will cause duplicated hoverlabels: the 'extra' point will have the same hover_data as the point it was copied from, even though it's not a "real" data point.

@eisenlohr
Copy link
Author

eisenlohr commented Nov 28, 2021

Hi @nicolaskruchten,

Thanks for giving this some thought.
My MR addresses a couple of (somewhat independent) aspects.

  1. I would expect the output of h x (or v x) to be a transpose of v y (or h y) when using the same input data. This was not the case in the current implementation (see diagrams above).

  2. The standard definition of cumulative probability is based on 'up to and including'. I think we are in violent agreement that what is called 'standard' and 'complementary' should be (and is) a mirror image. Where I deviate is in seeing a meaningful reason for modifying the classical definition and using 'up to but excluding', i.e. what was called 'reversed'. In my opinion, there is only one way to use and interpret the cumulative probability: if it is P for up to and including x, then it is 1-P for down to and excluding x. Therefore, no need for 'reversed'. Moreover, for the current (your) implementation, there is essentially no difference (except for the missing "closure" at either 1 or 0) between the choice of 'complementary' and 'reversed' in case of the (in my opinion principally correct) v x plot:
    standard_vx
    complementary_vx
    reversed_vx

  3. Spanning the full range from zero to one appeared to be the usual way for ECDF plots. If one would be really pedantic, the plots should actually extend from -infty at zero cum probabilty to +infty at 1 cum probabilty, i.e. have horizontal (in the standard v x plot) "legs" at the bottom left and top right. Ignoring this detail, the only option that I was able to come up with was to add a copy of the lowest value at a cum prob of zero. In light of what I wrote above, the more appropriate choice might actually be -infty. However, then the plot would look lopsided, since only one of the mentioned "legs" is present... The cleanest look resulted from the choice I proposed in the MR. I agree that the hover information would be a copy, but am not certain whether this would actually be a serious concern?

In summary, I believe that most of the confusion arose due to "1." not being correct, i.e. the current implementation does not yield properly transposed images. Once this is fixed, then it becomes apparent that there is no difference anymore between plots that would result from 'complementary' compared to 'reversed'.

@eisenlohr
Copy link
Author

Hi @nicolaskruchten ,

Did you get a chance to contemplate my above remarks?

Many greetings,
Philip

@nicolaskruchten
Copy link
Contributor

Hi Philip,

I sincerely apologize for letting this languish for so long... I've been a bit distracted by various life events recently (I've just started graduate school!) so I'm behind in Plotly.py work like reviewing PRs. I will definitely try to merge some or all of this PR in an upcoming release :)

Cheers,
Nicolas

@eisenlohr
Copy link
Author

Hi @nicolaskruchten, to lend some further support to the argument I made regarding item 3 above, the ECDF is plotted in the same way, i.e. spanning fully between 0 and 1, in Seaborn (see second-to-last example at https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot).

@eisenlohr
Copy link
Author

eisenlohr commented Sep 30, 2022

Hi @nicolaskruchten, I am trying another ping at this.

The Seaborn image I was referring to in my previous post is this:

Seaborn ECDF example

@eisenlohr
Copy link
Author

@nicolaskruchten, there is another argument that I would like to stress: the cumulative probability for a single-valued population should be a step function at that value.

This is indeed reflected in the output of Seaborn, but not (yet) in plotly.express:

import pandas as pd
import seaborn as sns
import plotly.express as px

df = pd.DataFrame(dict(x=[1]))
sns.displot(data=df,
            x='x',
            kind='ecdf',
           ).savefig('ECDF_seaborn.png')
px.ecdf(df,
        x='x',
       ).write_image('ECDF_plotlyexpress.png')

Seaborn output

ECDF_seaborn

Ploty.express output

ECDF_plotlyexpress

@nicolaskruchten
Copy link
Contributor

Thank you for your persistence with this issue! I'm still quite embarassed at how long it has been open and how hard it has been for me to find time to properly consider it and I continue to apologize. Your arguments make sense, so my silence is not intended as passive disagreement!

@eisenlohr
Copy link
Author

@nicolaskruchten, no offense taken. I am aware that there is always more to do than hours in a day!

@gvwilson gvwilson self-assigned this Jul 5, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson changed the title harmonized ECDF between "h" and "v", removed "reversed" mode; polishe… harmonize ECDF between "h" and "v", removed "reversed" mode Aug 12, 2024
@gvwilson gvwilson added feature something new P2 considered for next cycle community community contribution labels Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community community contribution feature something new P2 considered for next cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants