Cloud-scale data analytics from Excel
From the familiar interface of Microsoft Excel, Excel DataScope enables researchers to accelerate data-driven decision making. It offers data analytics, machine learning, and information visualization, using Windows Azure for data- and compute-intensive tasks. Its analysis techniques apply to a wide range of data, from web analytics to survey, environmental, or social data.
Complex Data Analytics via a Familiar Excel Interface
We are beginning to see a new class of decision makers who are comfortable working with diverse data sources and an equally diverse set of analytical tools, which they use to manipulate data sets, uncover signals, and extract new insights. These decision makers want to invoke complex models, large-scale machine learning, and data analytics algorithms over their data collections from familiar applications such as Microsoft Excel. They also want access to extremely large data collections that live in the cloud, so that they can sample or extract subsets for analysis or mash them up with their local data sets.
Seamless Access to Cloud Resources on Windows Azure
Excel DataScope is a cloud service that enables data scientists to use the resources of the cloud, via Windows Azure, to explore their largest data sets from familiar client applications. Our project introduces an add-in for Microsoft Excel that creates a research ribbon, giving the average Excel user seamless access to compute and storage on Windows Azure. From Excel, users can share their data with collaborators around the world, discover and download related data sets, or sample from extremely large (terabyte-sized) data sets in the cloud. The research ribbon also presents new data analytics and machine learning algorithms, which run transparently on Windows Azure using dozens or possibly hundreds of CPU cores.
Extensible Analytics Library
The Excel DataScope analytics library is designed to be extensible. It comes with algorithms for basic transforms such as selection, filtering, and value replacement, as well as algorithms that identify hidden associations in data, forecast time series, discover similarities, categorize records, and detect anomalies.
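To make the three basic transforms concrete, here is a minimal sketch of selection, filtering, and value replacement over tabular rows. It is not the Excel DataScope API: the function names (`select_columns`, `filter_rows`, `replace_values`), the row-as-dictionary representation, and the `-999.0` sentinel in the usage example are hypothetical choices made for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical representation: each spreadsheet row is a dict of column -> value.
Row = Dict[str, Any]


def select_columns(rows: Iterable[Row], columns: List[str]) -> List[Row]:
    """Selection: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]


def filter_rows(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Filtering: keep only the rows for which the predicate returns True."""
    return [row for row in rows if predicate(row)]


def replace_values(rows: Iterable[Row], column: str, old: Any, new: Any) -> List[Row]:
    """Value replacement: substitute `new` wherever `column` equals `old`."""
    result = []
    for row in rows:
        updated = dict(row)  # copy so the caller's rows are left unchanged
        if updated.get(column) == old:
            updated[column] = new
        result.append(updated)
    return result


if __name__ == "__main__":
    survey = [
        {"site": "A", "temp_c": 21.5, "status": "ok"},
        {"site": "B", "temp_c": -999.0, "status": "sensor_fault"},
        {"site": "C", "temp_c": 18.0, "status": "ok"},
    ]
    # Replace a (hypothetical) missing-value sentinel, drop those rows, then project.
    cleaned = replace_values(survey, "temp_c", -999.0, None)
    valid = filter_rows(cleaned, lambda r: r["temp_c"] is not None)
    print(select_columns(valid, ["site", "temp_c"]))
```

Chaining small, side-effect-free transforms like this is one common way to keep a library extensible, since each new transform only has to accept and return rows.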
Excel DataScope Features
- Users can upload Excel spreadsheets to the cloud, along with metadata to facilitate discovery, or search for and download spreadsheets of interest.
- Users can sample from extremely large data sets in the cloud and extract a subset of the data into Excel for inspection and manipulation.
- An extensible library of data analytics and machine learning algorithms implemented on Windows Azure allows Excel users to extract insight from their data.
- Users can select an analysis technique or model from the Excel DataScope research ribbon and request remote processing. Our runtime service in Windows Azure scales out the processing, possibly across hundreds of CPU cores.
- With a few mouse clicks, users can select a local application for remote execution in the cloud against cloud-scale data, effectively moving the compute to the data.
- Excel DataScope can create visualizations of the analysis output, and it provides users with an application for analyzing the results by pivoting on selected attributes.
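Sampling from a data set too large to load at once can be illustrated with reservoir sampling (Algorithm R), a standard technique that draws a uniform random sample of k rows in a single pass with O(k) memory, without knowing the total row count in advance. This page does not say how Excel DataScope actually samples, so treat the following as a swapped-in textbook technique; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a stream of unknown length.

    Algorithm R: after seeing i + 1 items, each one is in the reservoir
    with probability k / (i + 1). One pass, O(k) memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seedable for reproducible samples
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over the i + 1 positions seen so far
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    # Stand-in for a very large cloud table: a lazy generator of row labels.
    rows = (f"row-{n}" for n in range(1_000_000))
    print(reservoir_sample(rows, 5, seed=42))
```

Because the input is consumed lazily, the same function works whether rows come from a local list or are streamed from remote storage; only the k sampled rows ever need to be brought into the spreadsheet.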
Analytics Algorithms Performed in the Cloud
Excel DataScope is a technology ramp between Excel on the user’s client machine, the resources that are available in the cloud, and a new class of analytics algorithms that are being implemented in the cloud. An Excel user can simply select an analytics algorithm from the Excel DataScope Research Ribbon without concern for how to move their data to the cloud, how to start up virtual machines in the cloud, or how to scale out the execution of their selected algorithm in the cloud. They simply focus on exploring their data by using a familiar client application.
Excel DataScope is an ongoing research and development project. We envision a future in which a model developer can publish their latest data analysis algorithm or machine learning model to the cloud, and within minutes Excel users around the world can discover it in their Excel research ribbon and begin using it to explore their data collections.
Excel DataScope Posters
View posters from TechFest 2011.
- Cloud Data Analytics from Excel – poster 1
- Cloud Data Analytics from Excel – poster 2