jlstevens edited this page Jun 6, 2015 · 13 revisions

This is the abstract.

reproducible, interactive, visualization, notebook

Introduction

The process of scientific investigation is both varied and continuously evolving. Regardless of the domain, research involves stretches of uncertain exploration punctuated by periods where crucial findings are distilled and disseminated. The exploratory phase is symbolised by an ongoing research log, whereas the dissemination phase is typically marked by the completed, peer-reviewed publication.

Although the core principles of any research program remain constant, the work of the investigator has been revolutionised by the modern computer. It is now common to interact with vast data sets, rapidly exploring different visualisations and analyses. While rapid prototyping is often invaluable during the exploration phase, the pace at which ideas can be trialled has made reproducibility harder to achieve, because it becomes increasingly likely that crucial steps are lost along the way.

As a result, there is a natural tension between the interactive mode of exploration, where it is desirable to test and discard unproductive ideas quickly, and the need to preserve key findings in a reproducible manner for later publication. In this paper, we present an approach that increases interactivity and the ability to rapidly prototype ideas while simultaneously improving reproducibility.

The interactive interpreter

To understand this approach, we need to briefly consider the history of how we interact with computational data. The idea of an interactive programming session originates with the earliest LISP interpreters in the 1960s. Since then, high-level programming languages have become increasingly dynamic. In recent years, the Python language has been adopted by many researchers for its concise, readable syntax. Python is well suited to dynamic interaction and provides an interactive, textual interpreter.

A typical interpreter such as the Python prompt is a text-only environment. The user enters a command, the interpreter immediately parses and executes it, and the result is returned to the user in a suitable textual representation. This approach offers immediate feedback and works well for data that can naturally be expressed in a concise textual form. It breaks down, however, when the data cannot be usefully visualised as text. In such instances, a plotting package and a rich graphical display are used to present the results outside the environment of the interpreter.
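The read-eval-print cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not the actual implementation of the Python prompt. The names `evaluate` and `repl` are hypothetical, and commands are supplied as a list of strings rather than read from the keyboard.

```python
def evaluate(source, namespace):
    """Parse and execute one command; return its textual representation, or None."""
    try:
        # Expressions (e.g. "1 + 2") yield a value that is echoed back as text.
        code = compile(source, "<prompt>", "eval")
    except SyntaxError:
        # Statements (e.g. assignments) are executed but produce nothing to echo.
        exec(compile(source, "<prompt>", "exec"), namespace)
        return None
    value = eval(code, namespace)
    # Like the interactive prompt, do not echo None.
    return None if value is None else repr(value)


def repl(lines):
    """Run a sequence of commands in a shared namespace, collecting echoed output."""
    namespace = {}
    outputs = []
    for line in lines:
        result = evaluate(line, namespace)
        if result is not None:
            outputs.append(result)
    return outputs


print(repl(["y = 'data'", "y.upper()", "len(y)"]))
```

Every result passes through `repr`, and that is exactly where the text-only model strains. A value with no useful textual form is still reduced to a string.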

This kind of disjointed approach is a reflection of history: the text-only environments in which interactive interpreters were first employed appeared long before rich graphical interfaces and GUI environments. To this day, text-only interpreters remain standard due to the relative simplicity of working with text, whereas richer environments (such as the IPython Qt console) have generally failed to gain traction.

The disconnect between data and its representation

As text-based interpreters failed to scale to workflows involving data that could not be usefully represented as text, the web browser emerged as a ubiquitous and standardized means of interactively working with rich media documents. Although early versions of the HTML standard only supported passive page viewing, the introduction of HTML5 has made it common to engage in bi-directional interaction with complex, dynamic documents.

The emergence of the web browser as a platform has been exploited by the scientific and Python communities with the development of tools such as the IPython Notebook [1]_ and SAGE MathCloud [2]_. These projects offer interactive computation sessions in a notebook format instead of a traditional text prompt. Although similar to traditional text-only interpreters, these notebooks allow graphics and other media to be embedded, while keeping a permanent record of useful commands in a rich, interactive document that mixes code and exposition.
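A notebook decides how to show a value by asking the object for a rich representation. IPython's display machinery looks for methods such as `_repr_html_` on the returned object and falls back to `__repr__` otherwise. The standard text prompt only ever uses `__repr__`. The `Table` class below is a hypothetical example chosen for illustration. It is not part of HoloViews or IPython.

```python
import html


class Table:
    """Hypothetical container that carries its data and its own representations."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Used by text-only interpreters such as the standard Python prompt.
        return f"Table({len(self.rows)} rows)"

    def _repr_html_(self):
        # Picked up by IPython's rich display machinery when rendering in a notebook.
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"


t = Table([(1, "a"), (2, "b")])
print(repr(t))
print(t._repr_html_())
```

The same object yields a terse summary at a text prompt and a rendered table in a notebook, without any separate plotting step.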

Despite the greatly improved capabilities of these tools as rich computational environments, the spirit of the interactive interpreter has not been restored: there remains a disconnect between data and its representation. This artificial distinction is a lingering consequence of a text-only world and has resulted in a stark split between how we conceptualise 'simple' and 'complex' data. The split is easy to see whenever a plotting package such as matplotlib [3]_ is used. Simple data are displayed directly by the interpreter, whereas complex data must first be passed to explicit plotting calls before they can be seen.

Here we introduce HoloViews, a library of simple classes designed to re-establish the link between data and its representation. Although HoloViews is not a plotting package, it provides visual representations of your data within the IPython Notebook environment. The result is research that is more interactive, more concise, more declarative and more reproducible.
