Speaker Travis Oliphant
Affiliation FOCUS Foundation and Continuum Analytics
Host Wenming Ye and Dennis Gannon
Date recorded 18 September 2013
NumPy and, more recently, Pandas have made Python ubiquitous for scientific computing and data analytics. The Python technical stack works very well for a wide variety of problems that fit in a single address space (the RAM of a single computer). For problems that require larger data sets, current approaches use memory-mapped files, MPI, IPython parallel, and/or a standard map-reduce system like Disco (or Hadoop). These techniques typically complicate the software solution significantly, moving it away from the simple array (table)-oriented expression that makes NumPy (Pandas) so powerful and popular. They can also cause significant data movement throughout the memory hierarchy, which is the common bottleneck in data-centric computing today. Blaze is an array/table library for Python that can be used to manage and manipulate very large, disjoint data sets in an array-oriented fashion. It is built on a C++ library (dynd) that provides dynamic, multi-dimensional arrays with flexible data types. It also leverages Numba, an array-oriented Python compiler that translates a subset of Python syntax to LLVM IR and optimized machine code. In this talk I will discuss the design and roadmap of Blaze and Numba. I will also provide an overview and example of web-based visualization with Bokeh, which allows Python developers to easily produce interactive, web-based visualizations, leading into an overview of Wakari, which provides easy access to executable IPython notebooks in the cloud.
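To make the abstract's point about added complexity concrete, here is a minimal sketch of one of the approaches it lists, a memory-mapped file, using NumPy's `np.memmap`. It is not taken from the talk and does not use Blaze; the function names `chunked_mean` and `demo`, the file name, and the chunk size are all hypothetical choices made for illustration.

```python
import os
import tempfile

import numpy as np


def chunked_mean(path, dtype, n_items, chunk_items=1_000_000):
    """Mean of a 1-D on-disk array, touching one chunk at a time.

    Hypothetical helper: the chunking logic is what the in-memory
    version (a single call to .mean()) does not need.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_items,))
    total = 0.0
    for start in range(0, n_items, chunk_items):
        # Only this slice has to be paged into RAM.
        chunk = data[start:start + chunk_items]
        total += float(chunk.sum(dtype=np.float64))
    del data  # drop the mapping before the file is removed
    return total / n_items


def demo():
    """Write a small array to disk, then average it chunk by chunk."""
    n_items = 10_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")  # illustrative file name
        out = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_items,))
        out[:] = np.arange(n_items, dtype=np.float64)
        out.flush()
        del out
        return chunked_mean(path, np.float64, n_items, chunk_items=1024)


if __name__ == "__main__":
    print(demo())
```

With an array that fits in RAM the same computation is the single expression `values.mean()`. The explicit file handling and chunk loop above are a small instance of the extra machinery the abstract describes.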
©2013 Microsoft Corporation. All rights reserved.