jlstevens edited this page Jun 11, 2015 · 13 revisions

Scientific visualization typically requires large amounts of custom coding that obscures the underlying principles of the work and makes it more difficult to reproduce the results. Here we describe how the new HoloViews Python package, combined with the IPython Notebook, provides a rich, interactive interface for flexible and nearly code-free visualization of your results while storing a full record of the process for later reproduction.

HoloViews provides a set of general-purpose data structures designed for interactive use that allow you to pair your data with a small but crucial amount of semantic metadata, which also provides hints as to how you want to represent it visually. Together with a system that completely separates presentation concerns, these structures allow complex visualizations to be built interactively in a declarative fashion.

It also provides powerful containers that allow you to organize this data for analysis, embedding it in whatever multidimensional continuous or discrete space best characterizes it. The resulting workflow allows nearly code-free exploration, analysis, and visualization of your data and results, which leads directly to an exportable recipe for reproducible research.

reproducible, interactive, visualization, notebook

Introduction

The process of scientific investigation is both varied and continuously evolving. Regardless of the domain, research involves stretches of uncertain exploration punctuated by periods where crucial findings are distilled and disseminated. The exploratory phase is characterized by an ongoing research log, whereas the dissemination stage is typically marked by the final, peer-reviewed publication.

Although the core principles of the research process are unchanging, the work of the investigator has been revolutionised by the modern computer. It is now common to interact with vast data sets, rapidly exploring different visualisations and analyses. Although the power of rapid prototyping is often invaluable during the exploration phase, the pace with which ideas may be trialled makes reproducibility more difficult as crucial steps may be lost along the way.

As a result, there is a natural tension between the interactive mode of exploration, where unproductive ideas are often tested and discarded, and the need to preserve important findings in a reproducible manner for publication. In this paper, we present an approach that increases both the level of interactivity and the ability to rapidly prototype ideas while simultaneously increasing reproducibility.

The interactive interpreter

To understand this approach, we need to briefly consider the history of how we interact with computational data. The idea of an interactive programming session originates with the earliest LISP interpreters in the 1960s. Since then, high-level programming languages have only become even more dynamic in nature. In recent years, the Python language has been adopted by researchers due to its concise, readable syntax. Python is well suited to dynamic interaction and offers an interactive, textual interpreter.

A typical interpreter such as the Python prompt is a text-only environment where commands entered by the user are immediately parsed and executed, with the result returned to the user in a suitable representation. This approach offers immediate feedback and works well for data that is naturally expressed in a concise textual form. Unfortunately, it begins to fail when the data cannot be usefully visualised as text. In such instances, a plotting package together with a rich graphical display would be used to present the results outside the environment of the interpreter.
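The limitation described above can be illustrated with a minimal sketch using only the Python standard library. The helper `mandelbrot_counts` and the 40x40 grid size are assumptions chosen for illustration and are not taken from the paper. A short list echoes back in a readable form, whereas the textual representation of even a small two-dimensional grid conveys almost nothing about its structure.

```python
# Small, text-friendly data: the interpreter's echo is all we need.
samples = [0.5, 1.25, 2.0]
print(repr(samples))


def mandelbrot_counts(size=40, max_iter=30):
    """Return a size x size nested list of escape-iteration counts.

    Hypothetical helper for illustration only; the region sampled is
    x in [-2, 1], y in [-1.5, 1.5].
    """
    rows = []
    for j in range(size):
        y = -1.5 + 3.0 * j / (size - 1)
        row = []
        for i in range(size):
            x = -2.0 + 3.0 * i / (size - 1)
            c = complex(x, y)
            z = 0j
            count = 0
            # Iterate z -> z*z + c until it escapes or we give up.
            while abs(z) <= 2.0 and count < max_iter:
                z = z * z + c
                count += 1
            row.append(count)
        rows.append(row)
    return rows


grid = mandelbrot_counts()
text = repr(grid)

# The same echo mechanism now yields thousands of characters of digits.
print(len(text), "characters of text for a", len(grid), "x", len(grid[0]), "grid")
print(text[:60] + "...")
```

The numbers are all there, but the shape of the set is only apparent once the grid is drawn as an image, which is exactly the step that a text-only prompt cannot perform by itself.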

This disjointed approach is a reflection of history; the text-only environments, where interactive interpreters were first employed, appeared long before any rich graphical interfaces and GUI environments. To this day, text-only interpreters are standard due to the relative simplicity of working with text. Other approaches, such as the Mathematica Notebook, overcome some of the limitations of a text-only format but have remained constrained by limited interoperability and a lack of standardised open formats.

introductory_layout_example.png

Example of a composite HoloViews data structure and its display representation, taken from an IPython Notebook session. The array data is a 400x400 NumPy array corresponding to a rendering of part of the Mandelbrot Set. A. The Raster element displays the data, overlaid via the * operator with a horizontal line corresponding to the HLine element. The histogram shown is a Histogram element, displaying the distribution of values in the array. B. A Curve element showing the values across the middle of the Raster image, as indicated by the blue horizontal line. The curve is concatenated to the Overlay in A via the + operation. 🏷️`layout`

Fixing the disconnect between data and representation

While text-based interpreters have failed to overcome the inherent limitations of working with rich data, the web browser has emerged as a ubiquitous means of interactively working with rich media documents. In addition to being universally available, web browsers have the benefit of being built on open standards that remain supported almost indefinitely. Although early versions of the HTML standard only supported passive page viewing, the widespread adoption of HTML5 (and websockets) has made it possible for anyone to engage with complex, dynamic documents in a bi-directional, interactive manner.

The emergence of the web browser as a platform has been exploited by the Python community and the scientific community at large with the development of tools such as the IPython Notebook [1]_ and SAGE MathCloud [2]_. These projects offer interactive computation sessions in a notebook format instead of a traditional text prompt. Although similar to the traditional text-only interpreters, these notebooks allow embedded graphics or other media (such as video) while maintaining a permanent record of useful commands in a rich document that supports interleaved code and exposition.

Despite the greatly improved capabilities of these tools for computational interaction, the spirit of the interactive interpreter has not been restored: there is an ongoing disconnect between data and its representation. This artificial distinction is a lingering consequence of a text-only world and has resulted in a strict split between how we conceptualise 'simple' and 'complex' data. This split can be easily identified by the import of an external plotting package (such as matplotlib [3]_) in order to generate a more useful representation of the data.

Here we introduce HoloViews, a library of simple classes designed to re-establish the link between data and its representation. Although HoloViews is not a plotting package, it is designed to offer useful data structures with rich, customizable visual representations in the IPython Notebook environment. The result is research that is more interactive, more concise, more declarative and more reproducible. An example that will be discussed shortly is presented in Figure :ref:`layout`, which builds a complex visualization as a self-contained example in a single line of code.
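To make the idea of declarative composition concrete, the following is a toy sketch in plain Python of how the * and + operators described in the figure caption could combine simple data-carrying objects into a nested structure. It is not the HoloViews API or its implementation: the classes `Composable`, `Element`, `Overlay` and `Layout` and the helper `describe` are hypothetical stand-ins defined here purely for illustration, and the histogram from the figure is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, List


def _items(obj, container_type):
    """Flatten obj if it is already the given container, else wrap it."""
    return list(obj.items) if isinstance(obj, container_type) else [obj]


class Composable:
    """Hypothetical mixin providing * (overlay) and + (layout)."""

    def __mul__(self, other):
        return Overlay(_items(self, Overlay) + _items(other, Overlay))

    def __add__(self, other):
        return Layout(_items(self, Layout) + _items(other, Layout))


@dataclass
class Element(Composable):
    """Raw data paired with a little semantic metadata."""
    data: Any
    kind: str        # e.g. "Raster", "HLine", "Curve"
    label: str = ""


@dataclass
class Overlay(Composable):
    """Items intended to be drawn on top of one another."""
    items: List[Any] = field(default_factory=list)


@dataclass
class Layout(Composable):
    """Items intended to be drawn side by side."""
    items: List[Any] = field(default_factory=list)


def describe(obj, indent=0):
    """Return an indented text outline of a composed structure."""
    pad = "  " * indent
    if isinstance(obj, Element):
        return f"{pad}{obj.kind}({obj.label})"
    lines = [f"{pad}{type(obj).__name__}"]
    lines += [describe(item, indent + 1) for item in obj.items]
    return "\n".join(lines)


# Stand-in data: a tiny 5x5 grid instead of the 400x400 array in the figure.
arr = [[3, 7, 30, 7, 3] for _ in range(5)]

# One expression builds the whole structure; * binds tighter than +.
figure = (Element(arr, "Raster", "Mandelbrot") * Element(2, "HLine", "y=2")
          + Element(arr[2], "Curve", "Middle row"))

print(describe(figure))
```

The point of the sketch is that the composed object is still just data: it records what belongs together and how, and leaves the question of how to draw it to a separate display system.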
