Merge pull request #13 from kthyng/widget
Selector: widget for human text-selection
kthyng authored Nov 7, 2022
2 parents 1481fbf + b1e388c commit e32c1b5
Showing 20 changed files with 412 additions and 15 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -50,7 +50,7 @@ repos:
     rev: v0.982
     hooks:
       - id: mypy
-        additional_dependencies: [types-setuptools]
+        additional_dependencies: [types-requests, types-setuptools]
         exclude: docs/source/conf.py
         args: [--ignore-missing-imports]

18 changes: 18 additions & 0 deletions binder/environment.yml
@@ -0,0 +1,18 @@
channels:
  - conda-forge
dependencies:
  # Required for full project functionality (don't remove)
  - pytest
  # Examples (remove and add as needed)
  - bs4
  - ipywidgets
  - jupyterlab
  - jupyterlab_widgets
  - jupytext
  - lxml
  - pandas
  - pip
  - regex
  - requests
  - pip:
    - git+https://github.com/axiom-data-science/cf-pandas
28 changes: 28 additions & 0 deletions binder/postBuild
@@ -0,0 +1,28 @@
#!/bin/bash

set -ex

FILES="docs/demo_*.md"
for f in $FILES
do
    jupytext "$f" --to ipynb
    # echo "Processing $f file..."
    # take action on each file. $f stores the current file name
    # cat "$f"
done

# jupytext demo_reg.md --to ipynb
# jupytext demo_vocab.md --to ipynb
#
# invoke build --env-name=root --no-kernel
# invoke demofiles
# invoke talk -t demo
# rm -rf demofiles
# rm -rf notebooks
# rm -rf narrative
# rm -rf slides
# rm demo/notebooks/Julia.ipynb
# jupyter lab clean
#
# # Setup a workspace
# jupyter lab workspaces import binder/workspace.json
3 changes: 2 additions & 1 deletion cf_pandas/__init__.py
@@ -7,8 +7,9 @@
 from .accessor import CFAccessor # noqa
 from .options import set_options # noqa
 from .reg import Reg
-from .utils import always_iterable, astype, match_criteria_key
+from .utils import always_iterable, astype, match_criteria_key, standard_names
 from .vocab import Vocab
+from .widget import Selector, dropdown


 try:
21 changes: 21 additions & 0 deletions cf_pandas/utils.py
@@ -110,3 +110,24 @@ def match_criteria_key(
if key in available_values:
results.append(key)
return list(set(results))


def standard_names():
    """Returns list of CF standard_names.

    Returns
    -------
    list
        All CF standard_names
    """

    import requests
    from bs4 import BeautifulSoup

    url = "https://cfconventions.org/Data/cf-standard-names/79/src/cf-standard-name-table.xml"
    req = requests.get(url)
    soup = BeautifulSoup(req.content, features="xml")

    standard_names = [entry.get("id") for entry in soup.find_all("entry")]

    return standard_names
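`standard_names` downloads version 79 of the CF standard name table and collects the `id` attribute of every `entry` element. The sketch below shows the same extraction offline. It assumes a tiny hand-written sample of the table's layout (the sample entries are illustrative only), and it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so it runs without network access, `bs4`, or `lxml`; `parse_standard_names` is a hypothetical helper, not part of `cf-pandas`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated sample of the standard name table layout:
# each standard name is an <entry> element whose "id" attribute holds the name.
SAMPLE_XML = """<standard_name_table>
  <version_number>79</version_number>
  <entry id="sea_water_temperature">
    <canonical_units>K</canonical_units>
  </entry>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
  </entry>
  <entry id="sea_water_practical_salinity">
    <canonical_units>1</canonical_units>
  </entry>
</standard_name_table>
"""


def parse_standard_names(xml_text: str) -> list:
    """Return the "id" attribute of every <entry> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree recursively, like BeautifulSoup's find_all("entry")
    return [entry.get("id") for entry in root.iter("entry")]


names = parse_standard_names(SAMPLE_XML)
print(names)
```

Swapping the parser keeps the idea (one `id` per `entry`) while making the logic testable without a live request.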
4 changes: 3 additions & 1 deletion cf_pandas/vocab.py
@@ -105,4 +105,6 @@ def open_file(self, openname: Union[str, pathlib.PurePath]):
         openname: str
             Where to find vocab to open.
         """
-        return json.loads(open(openname, "r").read())
+        return json.loads(
+            open(pathlib.PurePath(openname).with_suffix(".json"), "r").read()
+        )
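With this change, `open_file` always reads a path ending in `.json`. Note that `pathlib.PurePath.with_suffix` replaces an existing suffix rather than appending one, which matters for names that contain a dot. The sketch below shows that behavior plus a save/open round trip. `save_vocab` and `open_vocab` are hypothetical stand-ins rather than the `Vocab` API, and the nested-dict vocabulary is an assumed layout for illustration.

```python
import json
import pathlib
import tempfile


def open_vocab(openname):
    """Hypothetical mirror of the new open_file: force a .json suffix, then load."""
    path = pathlib.Path(openname).with_suffix(".json")
    with open(path, "r") as f:  # the context manager closes the file handle
        return json.load(f)


def save_vocab(vocab, savename):
    """Hypothetical writer that applies the same suffix rule."""
    path = pathlib.Path(savename).with_suffix(".json")
    with open(path, "w") as f:
        json.dump(vocab, f)
    return path


# with_suffix adds ".json" when there is no suffix and replaces one that exists
print(pathlib.PurePath("std_name_demo").with_suffix(".json"))  # std_name_demo.json
print(pathlib.PurePath("vocab.v2").with_suffix(".json"))  # vocab.json

with tempfile.TemporaryDirectory() as tmp:
    # assumed vocabulary layout: nickname -> attribute -> regular expression
    vocab = {"temp": {"standard_name": "^sea_water_temperature$"}}
    saved = save_vocab(vocab, pathlib.Path(tmp) / "std_name_demo")
    loaded = open_vocab(pathlib.Path(tmp) / "std_name_demo")
    print(saved.name, loaded == vocab)
```

Using `with open(...)` in a sketch like this also closes the file, which the one-line `open(...).read()` form leaves to the garbage collector.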
145 changes: 145 additions & 0 deletions cf_pandas/widget.py
@@ -0,0 +1,145 @@
"""Widget"""

from typing import DefaultDict, Dict, Optional, Sequence, Union

import pandas as pd

from .reg import Reg
from .utils import astype
from .vocab import Vocab


def dropdown(
    nickname: str,
    options: Union[Sequence, pd.Series],
    include: str = "",
    exclude: str = "",
):
    """Make the dropdown widget used by the Selector class.

    Options are filtered by a regular expression built from the include and exclude inputs; the filtered options update whenever those inputs change and are shown in the dropdown. Hold `command` or `control` while clicking to select multiple options. Then press the save button once the nickname and the selected options are the variables you want to match exactly in a future regular expression search.

    Parameters
    ----------
    nickname: str
        Nickname for the Vocab entry created from this selection, e.g., "temp". When this function is wrapped with `ipywidgets.interact`, this string argument becomes a text box for the user to type into.
    options: Sequence
        Strings to select from in the dropdown widget. Filtered by the include and exclude inputs.
    include: str
        Only options containing this string are shown in the dropdown; the list updates as the input changes. To include more than one string, join them with "|". For example, "temperature|sea_water" shows options containing either "temperature" or "sea_water".
    exclude: str
        Options containing this string are hidden from the dropdown; the list updates as the input changes. To exclude more than one string, join them with "|". For example, "temperature|sea_water" hides options containing either "temperature" or "sea_water".
    """
    import ipywidgets as widgets

    reg = Reg(include=include, exclude=exclude)
    print("Regular expression: ", reg.pattern())
    options = astype(options, pd.Series)
    options2 = options[options.str.match(reg.pattern())]

    widg = widgets.SelectMultiple(
        options=options2,
        value=[] if len(options2) == 0 else [options2.iloc[0]],
        rows=10,
        description="Options",
        disabled=False,
    )
    return widg


class Selector(object):
    """Coordinate interaction with the dropdown widget to make simple vocabularies.

    Options are filtered by a regular expression built from the include and exclude inputs; the filtered options update whenever those inputs change and are shown in the dropdown. Hold `command` or `control` while clicking to select multiple options. Then press the save button once the nickname and the selected options are the variables you want to match exactly in a future regular expression search.

    Examples
    --------
    Show the widget with a short list of options. Input a nickname and press the button to save an entry to the object's running vocabulary:

    >>> import cf_pandas as cfp
    >>> sel = cfp.Selector(options=["var1", "var2", "var3"])
    >>> sel

    See the resulting vocabulary with:

    >>> sel.vocab
    """

    def __init__(
        self,
        options: Sequence,
        vocab: Optional[Vocab] = None,
        nickname_in: str = "",
        include_in: str = "",
        exclude_in: str = "",
    ):
        """Initialize Selector object.

        Parameters
        ----------
        options: Sequence
            Strings to select from in the dropdown widget. Filtered by the include and exclude inputs.
        vocab: Vocab object
            Defaults to None, in which case a new vocabulary is created. If a vocabulary is passed in instead, the entries made with the widget are added to it.
        nickname_in: str
            Initial nickname value; useful for testing.
        include_in: str
            Initial include value; useful for testing.
        exclude_in: str
            Initial exclude value; useful for testing.
        """
        import ipywidgets as widgets

        # create an output widget so that output is shown in the notebook instead of going to the log
        self.output = widgets.Output()

        if vocab is None:
            self.vocab = Vocab()
        else:
            self.vocab = vocab

        self.dropdown_values: Sequence = []
        self.include = include_in
        self.exclude = exclude_in

        self.button_save = widgets.Button(description="Press to save")

        self.nickname_text = widgets.Text(value=nickname_in)
        self.nickname = self.nickname_text.value

        self.dropdown = widgets.interact(
            dropdown,
            options=widgets.fixed(options),
            nickname=self.nickname,
            include=self.include,
            exclude=self.exclude,
        )
        from IPython.display import display

        display(self.button_save)

        self.button_save.on_click(self.button_pressed)
        display(self.output)

    def button_pressed(self, *args):
        """Save a new entry in the vocabulary when the button is pressed."""

        # clear the output from the previous click
        self.output.clear_output()

        # run inside the output widget so that printed results are displayed there
        with self.output:
            if self.dropdown.widget.kwargs["nickname"] == "":
                raise KeyError("Must input nickname to make entry.")

            # regular expressions to put into entries: exact matching
            res = [
                Reg(include_exact=exp).pattern()
                for exp in self.dropdown.widget.result.value
            ]
            self.vocab.make_entry(
                self.dropdown.widget.kwargs["nickname"], res, attr="standard_name"
            )
            print("Vocabulary: ", self.vocab)
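`button_pressed` turns each selected option into an exact-match regular expression before calling `make_entry` with `attr="standard_name"`. The exact pattern that `Reg(include_exact=...)` produces is not shown in this diff, so the sketch below assumes an escaped, anchored pattern (`^name$`) to illustrate why exact matching matters: without the anchors, a selection such as `sea_water_temperature` would also match any longer name that contains it. `exact_pattern`, `make_entry_sketch`, and the nested-dict layout are hypothetical, not the `Reg` or `Vocab` API.

```python
import re


def exact_pattern(name: str) -> str:
    """Assumed exact-match regex: escape metacharacters and anchor both ends."""
    return "^" + re.escape(name) + "$"


def make_entry_sketch(vocab: dict, nickname: str, names: list, attr: str = "standard_name") -> dict:
    """Hypothetical stand-in for Vocab.make_entry: OR together exact patterns."""
    if nickname == "":
        # mirrors the nickname check in button_pressed
        raise KeyError("Must input nickname to make entry.")
    vocab.setdefault(nickname, {})[attr] = "|".join(exact_pattern(n) for n in names)
    return vocab


selected = ["sea_water_temperature", "sea_surface_temperature"]
vocab = make_entry_sketch({}, "temp", selected)
pattern = vocab["temp"]["standard_name"]

candidates = [
    "sea_water_temperature",
    "sea_water_temperature_anomaly",  # example longer name sharing the prefix
    "sea_surface_temperature",
    "air_temperature",
]
matches = [c for c in candidates if re.match(pattern, c)]
print(matches)  # ['sea_water_temperature', 'sea_surface_temperature']
```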
4 changes: 4 additions & 0 deletions ci/environment-py3.10.yml
@@ -4,8 +4,12 @@ channels:
 dependencies:
   - python=3.10
   ############## These will have to be adjusted to your specific project
+  - bs4
+  - ipywidgets
+  - lxml
   - pandas
   - regex
+  - requests
   ##############
   - pytest
   - pip:
4 changes: 4 additions & 0 deletions ci/environment-py3.8.yml
@@ -4,8 +4,12 @@ channels:
 dependencies:
   - python=3.8
   ############## These will have to be adjusted to your specific project
+  - bs4
+  - ipywidgets
+  - lxml
   - pandas
   - regex
+  - requests
   ##############
   - pytest
   - pip:
4 changes: 4 additions & 0 deletions ci/environment-py3.9.yml
@@ -4,8 +4,12 @@ channels:
 dependencies:
   - python=3.9
   ############## These will have to be adjusted to your specific project
+  - bs4
+  - ipywidgets
+  - lxml
   - pandas
   - regex
+  - requests
   ##############
   - pytest
   - pip:
9 changes: 9 additions & 0 deletions docs/api.rst
@@ -42,3 +42,12 @@ Vocab class for handling custom variable-selection vocabularies
   :inherited-members:
   :undoc-members:
   :show-inheritance:

Widget class for easy human selection of variables to exactly match
*******************************************************************

.. automodule:: cf_pandas.widget
   :members:
   :inherited-members:
   :undoc-members:
   :show-inheritance:
71 changes: 71 additions & 0 deletions docs/demo_widget.md
@@ -0,0 +1,71 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.0
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Widget to help humans select strings to match

This demo is best explored in a Binder notebook, since it includes an interactive widget. Click the badge to launch it.

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/axiom-data-science/cf-pandas/HEAD?labpath=demo_widget.ipynb)

---

One way to deal with vocabularies (see [vocab demo](https://cf-pandas.readthedocs.io/en/latest/demo_vocab.html)) is to create a vocabulary that specifies exactly which variables to match for a given server. Then, when you work with catalogs and data from that server, you can be sure that your vocabulary will pull out the correct variables. In this use case the vocabulary is essentially a variable mapping rather than a variable matching.

Sometimes the list of variables to search through is very long. This widget helps you narrow the list quickly by including and excluding strings, and then lets you multi-select options (holding command/control) to export to a vocabulary.

```{code-cell} ipython3
import cf_pandas as cfp
```

## Select from list of CF standard_names

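You can read in all available standard_names with the `cf-pandas` utility `cfp.standard_names()`, which downloads version 79 of the CF standard name table from cfconventions.org (so it needs an internet connection):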

```{code-cell} ipython3
names = cfp.standard_names()
```

The basic idea is to type a nickname for the variable you are representing into the top text box, and then select the standard_names that "count" as that variable. Without filtering by include and exclude strings, the list of standard_names is too long to browse when selecting what you want for a given variable nickname.

The example below initializes a few inputs for demonstration. To exclude more strings, add them to the exclude text box separated by a pipe ("|"), like `air|change`. Include terms can be piped together the same way; they are combined with a logical "or", so the options list shows strings that contain at least one of the include terms.

Once you have narrowed the options in the dropdown menu enough, select the standard_names you want. When you are happy with your selections, click "Press to save". This creates an entry in the object's `vocab` that maps your variable nickname to the attribute "standard_name", using regular expressions that exactly match each selected standard_name. You can then enter a new variable nickname and repeat the process to create another entry in the vocabulary.

```{code-cell} ipython3
w = cfp.Selector(options=names, nickname_in="temp",
                 exclude_in="air", include_in="temperature")
w.button_pressed()
```

So that the results are reproducible, the rest of the notebook assumes the widget inputs are left unchanged.

Look at the vocabulary:

```{code-cell} ipython3
w.vocab
```

Save the vocabulary for future use:

```{code-cell} ipython3
w.vocab.save("std_name_demo")
```

Open and check out your vocab with:

```{code-cell} ipython3
cfp.Vocab("std_name_demo")
```
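The demo describes include terms as combined with a logical "or" and exclude terms as hiding any option that contains them. The regular expression that `Reg(include=..., exclude=...)` actually builds is not part of this diff; the sketch below is one way to express the described behavior with the standard `re` module, assuming a negative lookahead for the exclude terms and a positive lookahead for the include terms, applied with `re.match` just as `dropdown` applies pandas `str.match`. `filter_pattern` is a hypothetical helper.

```python
import re


def filter_pattern(include: str = "", exclude: str = "") -> str:
    """Build a regex that keeps strings containing any include term and
    drops strings containing any exclude term ("|" separates terms)."""
    pattern = ""
    if exclude:
        # reject the string if any exclude term appears anywhere in it
        pattern += f"^(?!.*(?:{exclude}))"
    if include:
        # require at least one include term somewhere in the string
        pattern += f"(?=.*(?:{include}))"
    return pattern


names = [
    "sea_water_temperature",
    "air_temperature",
    "change_over_time_in_sea_water_temperature",
    "sea_water_salinity",
]
pattern = filter_pattern(include="temperature", exclude="air|change")
kept = [n for n in names if re.match(pattern, n)]
print(pattern)
print(kept)  # ['sea_water_temperature']
```

Because terms match as substrings in this sketch, an exclude term also hides any longer name that happens to contain those characters.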
4 changes: 4 additions & 0 deletions docs/environment.yml
@@ -2,8 +2,12 @@ name: cf-pandas_docs
 dependencies:
   - python=3.8
   # If your docs code examples depend on other packages add them here
+  - bs4
+  - ipywidgets
+  - lxml
   - pandas
   - regex
+  - requests
   # These are needed for the docs themselves
   - jupytext
   - numpydoc
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -11,6 +11,8 @@ Welcome to cf-pandas's documentation!
   :maxdepth: 2

   demo_reg.md
   demo_vocab.md
   demo_widget.md
   api
   GitHub repository <https://github.com/axiom-data-science/cf-pandas>
