Add online JupyterChart widget based on AnyWidget #3119

Merged
merged 24 commits into from
Aug 2, 2023

Conversation

jonmmease
Contributor

@jonmmease jonmmease commented Jul 22, 2023

This overview has been updated after the rename to JupyterChart and the change to using traitlets for params and selections.

Background

This PR is a follow-on to #3108. After a lot of good discussion (thanks @manzt, @binste, @mattijn, @domoritz, and @joelostblom), we decided that the best first step is for Altair to bundle an AnyWidget-based Jupyter Widget that loads its JavaScript dependencies from a CDN. This approach requires an internet connection (like the default html renderer), but it doesn't increase Altair's wheel size and it doesn't introduce a development dependency on npm. Furthermore, AnyWidget's architecture leaves open the future possibility of providing an optional "altair-offline" Python package that would include an offline bundle of the same JavaScript logic included in this PR.

Now that we have the distribution decision out of the way, this PR focuses on the widget's design and logic.

Basic Usage

To use the widget, wrap an Altair chart in alt.JupyterChart.
Here's the simplest example:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

chart = alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)

jchart = alt.JupyterChart(chart)
jchart

visualization (3)

Updating charts in-place

The JupyterChart's chart property can be assigned to a new Altair chart, and the new chart will immediately be displayed in place of the old one. The update looks really smooth: parts of the chart that don't change don't seem to flash at all.

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})

chart = alt.Chart(source).mark_bar().encode(
    x='a',
    y='b'
)

jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.39.35.AM.mov

Params and Selections

The JupyterChart includes special handling for Altair params (both regular and selection params). The current state of regular non-selection params is stored in the .params property, and the current state of selection params is stored in the .selections property.

Observe changes to a regular param in Python

Here's what this looks like for the Slider Cutoff gallery example:

import altair as alt
import pandas as pd
import numpy as np

rand = np.random.RandomState(42)

df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

slider = alt.binding_range(min=0, max=100, step=1)
cutoff = alt.param(name="cutoff", bind=slider, value=50)

chart = alt.Chart(df).mark_point().encode(
    x='xval',
    y='yval',
    color=alt.condition(
        alt.datum.xval < cutoff,
        alt.value('red'), alt.value('blue')
    )
).add_params(
    cutoff
)
jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.26.04.AM.mov

The .params property is a traitlet object with one dynamically created attribute per regular param (only cutoff in this case). Each attribute is updated whenever the corresponding param's value changes. Because .params is a traitlet object, it's possible to set up callbacks that listen for changes to individual param values. See https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html#registering-callbacks-to-trait-changes-in-the-kernel

Screen.Recording.2023-07-28.at.10.31.00.AM.mov

Set the value of a regular param from Python

It's also possible to set the value of a regular param from Python by assigning to the corresponding attribute of .params. For the example above:

Screen.Recording.2023-07-28.at.10.33.43.AM.mov

Linking params with ipywidgets

Because params is a traitlet object, it's possible to use the ipywidgets link and dlink functions to bind params to other ipywidgets.

Screen.Recording.2023-07-28.at.10.35.58.AM.mov

In fact, we can update the Altair example to remove the slider binding, and just drive the interaction with ipywidgets:

import altair as alt
import pandas as pd
import numpy as np

rand = np.random.RandomState(42)

df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

cutoff = alt.param(name="cutoff", value=50)

chart = alt.Chart(df).mark_point().encode(
    x='xval',
    y='yval',
    color=alt.condition(
        alt.datum.xval < cutoff,
        alt.value('red'), alt.value('blue')
    )
).add_params(
    cutoff
)
jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.37.55.AM.mov

Observing selection in Python: Selection types

After a bit of contemplation, I settled on distinguishing three kinds of selections for the purpose of presenting them to the user, each with a dedicated dataclass: PointSelection, IndexSelection, and IntervalSelection.

PointSelection

The PointSelection dataclass is used to store the current state of an Altair point selection (as created by alt.selection_point()) when either a fields or encodings specification is provided. One common example is a point selection with encodings=["color"] that is bound to the legend.

Here's an example:

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_point(name="point", encodings=["color"], bind="legend")

chart = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('grey')),
).add_params(brush)

jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.42.03.AM.mov

The jchart.selections property is a traitlet class with one property per selection ("point" in this case), each holding an instance of one of the three selection dataclasses. Each of these selection dataclasses has value and store properties. The value property is designed to be the easiest to use. The store is Vega-Lite's internal representation of the selection that is used to apply filtering. I wanted to include it here because I'm working towards using it to automatically filter the input dataset based on the selection, and this store value has the info needed to do that.
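To illustrate how a point selection's value might be consumed, here is a pure-Python sketch. It assumes that value is a list of dicts, one per selected point, mapping field names to the selected values; that shape is not spelled out in this PR description, and the helper name and sample rows are hypothetical.

```python
def matches_point_selection(row, value):
    """Return True if `row` matches any of the selected points.

    `value` is ASSUMED to be a list of dicts (field name -> selected value).
    An empty list is treated as "nothing selected", which matches every row.
    """
    if not value:
        return True
    return any(
        all(row.get(field) == selected for field, selected in point.items())
        for point in value
    )

rows = [
    {"Name": "a", "Origin": "USA"},
    {"Name": "b", "Origin": "Japan"},
    {"Name": "c", "Origin": "Europe"},
]

# Hypothetical selection state after clicking two legend entries.
value = [{"Origin": "Japan"}, {"Origin": "Europe"}]
selected = [row for row in rows if matches_point_selection(row, value)]
```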

IndexSelection

What I'm calling an "Index Selection" is an Altair point selection (as created by alt.selection_point()) when neither the encodings nor fields properties are specified. In this situation, Vega-Lite generates a special 1-indexed column (using the identifier transform) named _vgsid_, and builds a point selection referencing this column.

We could use PointSelection above to represent these selections, but the selection specification would contain references to this internal _vgsid_ column. In this case it's a lot more useful to convert the selection state into a list of zero-based indices that are compatible with pandas iloc indexing.

Here's an example:

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_point(name="point")

chart = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('grey')),
).add_params(brush)

jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.45.29.AM.mov

Looking at the store we can see what the actual selection is referencing:

jchart.selections.point.store
[{'unit': '', '_vgsid_': 163},
 {'unit': '', '_vgsid_': 341}]
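The conversion from this store to iloc-compatible indices amounts to subtracting one from each _vgsid_. A small sketch using the store shown above (the helper name is illustrative, not part of the PR):

```python
def vgsid_to_iloc(store):
    """Convert 1-indexed _vgsid_ entries to zero-based row positions."""
    return [entry["_vgsid_"] - 1 for entry in store]

store = [
    {"unit": "", "_vgsid_": 163},
    {"unit": "", "_vgsid_": 341},
]
indices = vgsid_to_iloc(store)
```

The resulting list can be passed to pandas positional indexing, for example source.iloc[indices].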

IntervalSelection

The IntervalSelection dataclass is used to store the current state of an Altair interval selection (as created by alt.selection_interval()). One common example is a box selection. For example:

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_interval(name="interval")

chart = alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Cylinders:O', alt.value('grey')),
).add_params(brush)

jchart = alt.JupyterChart(chart)
jchart
Screen.Recording.2023-07-28.at.10.47.53.AM.mov
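An interval selection's value can be used to filter rows. The sketch below assumes value maps each selected field to a [low, high] pair, which is how the dashboard example in this description reads it; it only covers quantitative fields, and the helper name and sample rows are hypothetical.

```python
def rows_in_interval(rows, value):
    """Return the rows that fall inside every [low, high] range in `value`.

    `value` is assumed to map field names to [low, high] pairs. An empty
    value is treated as "no selection" and yields no rows.
    """
    if not value:
        return []
    return [
        row for row in rows
        if all(low <= row[field] <= high for field, (low, high) in value.items())
    ]

cars = [
    {"Name": "a", "Horsepower": 90, "Miles_per_Gallon": 30.0},
    {"Name": "b", "Horsepower": 150, "Miles_per_Gallon": 18.0},
    {"Name": "c", "Horsepower": 110, "Miles_per_Gallon": 24.0},
]

# Hypothetical state of a box selection over both axes.
value = {"Horsepower": [80, 120], "Miles_per_Gallon": [20, 35]}
selected = rows_in_interval(cars, value)
```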

Coverage

I tried all of the Interactive Charts gallery examples, and they all fall nicely into these four concepts (regular params, point selections, index selections, and interval selections).

A dashboard

Just for fun, here's a mini Jupyter Widgets dashboard that displays the selected rows in a pandas HTML table:

import ipywidgets
from IPython.display import display
from ipywidgets import HTML, VBox

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_interval(name="brush")

chart_widget = alt.JupyterChart(alt.Chart(source).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(brush, 'Cylinders:O', alt.value('grey')),
).add_params(brush))

table_widget = HTML(value=source.iloc[:0].to_html())

def on_select(change):
    sel = change.new.value
    if sel is None or 'Horsepower' not in sel:
        filtered = source.iloc[:0]
    else:
        filtered = source.query(
            f"{sel['Horsepower'][0]} <= `Horsepower` <= {sel['Horsepower'][1]} & "
            f"{sel['Miles_per_Gallon'][0]} <= `Miles_per_Gallon` <= {sel['Miles_per_Gallon'][1]}"
        )
    table_widget.value = filtered.to_html()
    
chart_widget.selections.observe(on_select, ["brush"])

VBox([chart_widget, table_widget])
Screen.Recording.2023-07-22.at.7.36.29.PM.mov

Follow-on work

Binary serialization

This PR doesn't tackle binary serialization, but the foundation is here since the Jupyter Widget protocol supports raw binary buffers without base64 encoding.

Listen to and update arbitrary Vega signals and datasets

In order to migrate the VegaFusion widget to use JupyterChart, I'll need to add support for updating arbitrary Vega signals and datasets (not only the explicit params declared in the Vega-Lite spec). I'll also need to be able to register callbacks to run in response to changes to arbitrary signals and datasets.

Since I'm not ready to use this yet, and it would add a bit more complexity, I decided not to include this functionality in the initial version.

Setting selections

In the Vega that Vega-Lite produces, it's not always straightforward to set selections from the outside. There may be more we can do here, but for now you get an error if you try to set a selection property to a new value in Python.

@jonmmease
Contributor Author

cc @philippjfr. I know you've put a lot of thought into how Panel's Vega Pane handles Altair chart selections, so if you happen to have a chance to take a look at this design I'd definitely appreciate your input. It might also be nice to see if we can work toward a future where this Jupyter widget and Panel's Vega Pane share a similar API for dealing with selections/params.

@@ -0,0 +1,70 @@
import embed from "https://cdn.jsdelivr.net/npm/vega-embed@6/+esm";
import { debounce } from "https://cdn.jsdelivr.net/npm/[email protected]/lodash.js"
Contributor

@manzt manzt Jul 23, 2023

Know it’s not bundled, but lodash-es is a pretty large dep (98kb minified, https://bundlephobia.com/package/[email protected] ) for just one import.

The modern alternative I’ve been using is just-debounce-it from https://github.com/angus-c/just

Contributor Author

Awesome, thanks for the recommendation! Switched in ff29ea1

@joelostblom
Contributor

I haven't reviewed the code, but all the high level design decisions here make sense to me! I really like the examples and think they will be a great addition to the docs. I also really like the future vision of sharing a similar API as Panel, so that it is easy to transition between the two.

One comment about the name. To me, a widget traditionally is a component like a dropdown or slider. I know that this is not technically true and a JupyterWidget can be almost anything, but I wonder if some users might find the naming confusing or if it is mostly me. I don't have a better suggestion for a name, but we might want to do some rewording in the docs if we call this ChartWidget, because currently a widget specifically refers to the dropdowns etc. already in Altair/Vega-Lite. If we are also planning to add a section on how to use regular Jupyter widgets together with the ChartWidget, we just need to be careful with the wording and make sure we clearly distinguish which tools to use for what purpose.

@jonmmease
Contributor Author

Thanks for that feedback @joelostblom. You make a good point about the potential confusion that could result from overloading the term "Widget" like this.

Another possibility that just came to mind would be to use the term "Jupyter" in the name instead of "Widget". e.g. JupyterChart. This isn't ideal because the "chart widget" works in some non-jupyter environments like Colab and VSCode. But then again, you could argue that it works in these environments because they provide support for Jupyter widgets, and so they are Jupyter environments as well.

An argument against calling it JupyterChart would be that this isn't the only way to display an Altair chart in Jupyter (e.g. the mimetype and html renderers do this as well). But in the future, this "JupyterChart" might be the primary recommendation for using Altair in Jupyter: we could later connect it with the rendering system so that alt.renderers.enable("jupyter") uses the "JupyterChart" behind the scenes. What do you think?

@joelostblom
Contributor

joelostblom commented Jul 25, 2023

I do quite like the explicitness of using the word "Jupyter" in the name, but I agree with you that it might not be obvious that it then also works in e.g. VSCode... although maybe it would be clear enough? We can be explicit in the docs that the JupyterChart works in "any Jupyter-compatible environment such as JupyterLab, VSCode, etc". This name would make it very explicit that it does not work in non-Jupyter environments, which I think is not clear if the name is ChartWidget. I would like to hear what others think here as well.

I do like the idea of aiming for alt.renderers.enable("jupyter") in the future and wonder if it would even make sense down the road for Altair to try to detect the environment and use the most appropriate renderer automatically, but maybe that is too magical / confusing / hard to implement.

@mattijn
Contributor

mattijn commented Jul 28, 2023

I agree with @joelostblom on the potential confusion on the name.
Two more things.

Given this spec:

import altair as alt

bind = alt.binding_range(min=0, max=100, step=1, name="my label😀")
param = alt.param(name="my_param", bind=bind, value=50)

chart = alt.Chart().mark_point().add_params(param)
widget = alt.ChartWidget(chart)
widget.params

It returns a dict:

{'my_param': 50}

Dare to dream: could this also be returned as an attribute so we can set it pythonic using a property setter?

As such:

widget.params.my_param = 50

Another thing: if I introduce a JavaScript error, for example by adding a space in the param name (name="my param"), the current ChartWidget only shows:

Click to show JavaScript error.

Whereas the Chart returns a JavaScript traceback:

Javascript Error: Unrecognized signal name: "my_param". This usually means there's a typo in your chart specification. See the javascript console for the full traceback.

With this spec:

import altair as alt

bind = alt.binding_range(min=0, max=100, step=1, name="my label😀")
param = alt.param(name="my param", bind=bind, value=50)

chart = alt.Chart().mark_point().encode(
    color=alt.condition(param,alt.value('red'), alt.value('blue'))
).add_params(param)
widget = alt.ChartWidget(chart)
widget

Would this type of trace back also be possible on the AnyWidget model?

@jonmmease
Contributor Author

Would this type of trace back also be possible on the AnyWidget model?

I'll look into implementing a similar traceback

Dare to dream: could this also be returned as an attribute so we can set it pythonic using a property setter?

@manzt, do you have any ideas on this? Right now widget.params is a Dict traitlet with one key per param, and I have a separate set_param method to set individual parameters by name. Would there be a way to have widget.params be a widget itself with dynamic traitlet properties?

One option I can think of is that I could replace widget.params with a private widget._params traitlet. Then make the widget.set_params method private. Then make widget.params a custom non-widget class that dispatches __getattr__ to look up values in widget._params and dispatches __setattr__ to call widget._set_params. I think this would work fine, but I'm wondering if there's anything more widget-native that's possible.

@jonmmease
Contributor Author

Also @mattijn, how do you feel about JupyterChart as an alternative name? Do you have any other ideas?

@mattijn
Contributor

mattijn commented Jul 28, 2023

JupyterChart seems fine or AnyChart ..

@jonmmease jonmmease changed the title Add online ChartWidget based on AnyWidget Add online JupyterChart widget based on AnyWidget Jul 28, 2023
@jonmmease
Contributor Author

jonmmease commented Jul 28, 2023

Dare to dream: could this also be returned as an attribute so we can set it pythonic using a property setter?

@mattijn, thanks to your dream I worked on it some more, and I figured out how to do it! I updated the PR description above with the new syntax. And it's now possible to use the Jupyter Widget's link function to connect Altair params directly to Jupyter widgets!

Screen.Recording.2023-07-28.at.10.35.58.AM.mov

I also renamed it to JupyterChart

@mattijn
Contributor

mattijn commented Jul 28, 2023

Very great (and inspiring!). I find the interplay very intuitive, also the natural linkage with other components of ipywidgets is 👍👍

@jonmmease
Contributor Author

@mattijn I got the errors matching in 037b84f

@philippjfr

This looks amazing @jonmmease, really great work! On aligning the selection APIs, I would indeed love to work on that, although I do envision some difficulty. Panel leans very heavily on Param, so it dynamically creates parameters corresponding to Vega selection params. This makes it work nicely with Panel's APIs because you can now do things like binding the selection (and even use it for interactive selections using the hvplot.interactive API). I can imagine that your JupyterChart also publishes a .selections accessor which makes the current named parameters available (maybe as dynamically created traitlets?), but obviously that won't have the same affordances as a parameter. Happy to brainstorm further but those are my initial thoughts.

On the naming I'll throw out one more suggestion: IPyChart, that doesn't have the same problem with tying naming to the Jupyter ecosystem but perhaps is confusing in other ways.

jonmmease and others added 2 commits July 29, 2023 10:02
ChartWidget -> JupyterChart

Co-authored-by: Mattijn van Hoek <[email protected]>
ChartWidget -> JupyterChart

Co-authored-by: Mattijn van Hoek <[email protected]>
@jonmmease
Contributor Author

Thanks for the review and approval @mattijn! I see @binste self-assigned a review so will leave this open until he has a chance to take a look.

Contributor

@binste binste left a comment

This is such an awesome PR!! 🔥 Thank you @jonmmease for this great contribution and
@manzt for the anywidget project! I'm very excited to see what people will come up with. Personally, I'll definitely use this to create interactive reports using widgets linked to charts with jslink and then convert with Quarto to standalone html 😍

I added a few smaller comments but afterwards this looks good to me.

@jonmmease
Contributor Author

Thanks for the helpful review @binste, I'll address your comments shortly.

Personally, I'll definitely use this to create interactive reports using widgets linked to charts with jslink and then convert with Quarto to standalone html 😍

I didn't test this explicitly, but I'm afraid this might not work with jslink the way it's currently implemented. I'll take a closer look and report back either way.

@binste
Contributor

binste commented Jul 31, 2023

No worries at all! I can also use the normal Vega input widgets for this. The use cases with a running Python kernel are more exciting anyway :)

@jonmmease
Contributor Author

Ok, I think I addressed your comments @binste, thanks again for taking a look!

Contributor

@binste binste left a comment

:) Thanks!

@jonmmease
Contributor Author

I did a little refactoring to move the widget-independent dataclasses and construction logic to altair.utils.selection.

My hope is that other dashboard toolkits will be able to present selections to end users using these same dataclasses. That way users will have a partially consistent experience when moving between JupyterChart and other dashboard toolkits. And if I'm able to write general data filtering logic that uses these selections as input, users could take advantage of it even if they aren't using JupyterChart.

@jonmmease
Contributor Author

I'm going to revert back to using lodash's debounce (even though it pulls in a larger dependency), because the maxWait functionality is pretty important for the selection case.

Here is how it works with just-debounce-it, which implements true debouncing (where you wait until no updates have been made for a particular length of time before evaluating the debounced function). I've set the debounce wait time to 500ms to make the behavior obvious.

Screen.Recording.2023-08-01.at.6.26.40.PM.mov

You can see that while continually dragging the selection region around, no updates are made to the Python side. Updates aren't sent until 500ms after dragging has completed.

lodash's debounce wrapper has an optional maxWait parameter that allows you to specify that, even if the parameter is changing continually, an update should still be sent at intervals of maxWait. Here's what this looks like with maxWait set to the same value as the debounce wait time (500ms in this case).

Screen.Recording.2023-08-01.at.6.36.17.PM.mov

You can see that updates are sent to Python every 500ms while the selection region is continually dragged around. This is the more desirable behavior for our use case.
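The difference between the two policies can be simulated in a few lines of Python. This is a simplified model of trailing-edge debouncing written for illustration, not lodash's implementation; the fire_times function and the event schedule (one drag event every 100ms for two seconds) are assumptions.

```python
def fire_times(event_times, wait, max_wait=None):
    """Return the times at which a trailing-edge debounced call would fire.

    A pending call fires `wait` ms after the most recent event or, when
    `max_wait` is given, at most `max_wait` ms after the first event that
    has not been flushed yet, whichever comes first.
    """
    fired = []
    first_pending = None  # time of the oldest event not yet flushed
    last_event = None     # time of the most recent event

    def deadline():
        due = last_event + wait
        if max_wait is not None:
            due = min(due, first_pending + max_wait)
        return due

    for t in sorted(event_times):
        # Flush a pending call whose deadline passed before this event.
        if first_pending is not None and deadline() <= t:
            fired.append(deadline())
            first_pending = None
        if first_pending is None:
            first_pending = t
        last_event = t

    if first_pending is not None:
        fired.append(deadline())
    return fired


drag_events = range(0, 2001, 100)  # continuous drag, one event per 100ms

pure_debounce = fire_times(drag_events, wait=500)
with_max_wait = fire_times(drag_events, wait=500, max_wait=500)
```

In this model, pure debouncing produces a single update after the drag ends, while adding max_wait produces an update every 500ms during the drag, which matches the behavior described for the two recordings.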

In the future we can look for a smaller implementation of this functionality.

@jonmmease
Contributor Author

jonmmease commented Aug 2, 2023

Ok, going to merge this and start working on documentation. Thanks again everyone!

Let me know if anyone has thoughts on where this should be documented as I haven't started thinking about that yet.

@jonmmease jonmmease merged commit bfd68e4 into main Aug 2, 2023
@@ -0,0 +1,80 @@
import embed from "https://cdn.jsdelivr.net/npm/vega-embed@6/+esm";
import { debounce } from "https://cdn.jsdelivr.net/npm/[email protected]/lodash.js"
Contributor

You can import various lodash functions independently from lodash-es:

Suggested change
- import { debounce } from "https://cdn.jsdelivr.net/npm/[email protected]/lodash.js"
+ import debounce from "https://cdn.jsdelivr.net/npm/[email protected]/debounce/+esm";

Contributor Author

Oh cool. Does this have an impact on bundle size?

Contributor

@manzt manzt Aug 2, 2023

Yes (and eliminates unnecessary imports). Should have suggested this originally (sorry!), see #3135

Contributor

@manzt manzt left a comment

This looks great to me – excellent work @jonmmease! Really excited to see this come together. Just a minor comment about lodash.

@manzt
Contributor

manzt commented Aug 2, 2023

Oops, missed submitting – no worries!
