Almost all areas of science are moving to a more data-driven analysis pipeline, where large multidimensional datasets need to be explored and analyzed for possible insights. Exploring large datasets is inherently interactive: the user reacts to aspects of the data, and those reactions in turn help determine future queries.
Unfortunately, many visualization interfaces were designed with the assumption that much, if not all, of the data to be visualized would reside in memory. As both the quantity and quality of the tools used to collect data have improved, datasets have continued to grow in size, and this assumption often no longer holds. Interactivity, however, requires low-latency access, and the latency of fetching data from disk on every interaction with the interface is unacceptable.
To address the challenge of scaling visualization to Big Data, we have implemented a data visualization system called ScalaR that provides a web-based, map-style interface (think Google Maps) for viewing large datasets. We presented a paper about ScalaR at the First Workshop on Big Data Visualization, held recently in Santa Clara, Calif. The rest of this post summarizes the approaches to scaling data visualization that we are taking in ScalaR.