Microsoft Research Data Visualization Apps for Office

Established: February 6, 2013

This project has now been concluded, and the apps have been deprecated. Thank you for your feedback!


At Microsoft Research, we want to let users play with new types of visualizations. We’re doing it with Apps for Office (opens in new tab), a new feature for Office 2013 and Office 365. These let users easily add new functionality to Office.

You can see some of how the system worked in this handy guide (opens in new tab).

This is a Research Deployment
We have released the Visualization Toolkit for us to learn about how users interact with advanced visualization. We return a small amount of anonymized information to learn how our users are working with the application; we keep no personally-identifiable information. This information is stored on secure servers, and is used only for research purposes.

  • Histogram: Microsoft Research Data Visualization Apps for Office

    A histogram shows the distribution of data. It will count and bucket data points, and create a bar graph of the frequency of those points. A histogram can be helpful to see how many data points fall into different categories.

    How to use

    First, add the Histogram from the Office Store (opens in new tab). (Here are the basic instructions for getting started with Apps for Office.)

    Click Sample Data to add a column of sample data to your spreadsheet.

    Make sure you have data in exactly one selected, and click Set Data to set the data in the histogram to the column.

    The slider controls the number of bins: drag it left to make fewer, big bins; drag right to make more, small ones.

    The Color box allows you to select a color for the bars.

    The histogram can view either categorical data (such as text labels) or continuous values. The Category checkbox forces the system to choose one bin for each distinct value in the dataset.

    A Histogram with five bins, showing the distribution of the sample data at left.

  • 2D Histogram: Microsoft Research Data Visualization Apps for Office

    A 2D Histogram, or Density Plot, visualizes the joint distribution of two different axes: it lets you see how much of one thing there is, given another.

    Consider the scenario in the scene below. There are two columns of raw data. On the left side, we have built individual histograms of each one–and they both look a bit like normal curves. Only the 2D histogram reveals that there is a connection between the two dimensions: that high values in one axis go with low values on another.

    A 2D histogram is sort of like a scatterplot, but can show clusters of points. Even if there are a few thousand points, or the points overlap, the density plot can still show them all.

    How to use

    First, add the Histogram 2D App from the Office Store. (Here are the basic instructions for getting started with Apps for Office.)

    Click Sample Data to add two columns of sample data to your spreadsheet. Click the button again to see categorical-vs-numerical data, which can look at the frequency of sales of “apples” compare to “pears”. Click a third time to see sample data for categorical-vs-categorical.

    Make sure you have data in exactly two columns selected, and click Set Data to set the data in the histogram to the column.The two sliders control the number of bins: drag it left to make fewer, big bins; drag right to make more, small ones.The Color box allows you to select a color for the squares in the chart.The histogram can view either categorical data (such as text labels) or continuous values. The Category checkbox forces the system to choose one bin for each distinct value in the dataset.

  • Treemap: Microsoft Research Data Visualization Apps for Office

    A treemap is a hierarchical view of data: it breaks an area into rectangles representing branches of a tree. Each rectangle is broken in turn into subrectangles, representing subbranches. Each subrectangle can be associated with two different numbers, once of which may be set to a color; the other to a size.

    A treemap can make it easy to discover how a whole is broken into parts: where the budget in your organization goes, or where the space on your disk has gone.

    For example, here’s an organization divided into four offices: North, South, West, and East. Each sells volumes of “food”, “office”, and “tech”; each of these, in turn, has subcategories. We’re using the size of each box to represent sales volume, and the color to represent the profit.

    The Treemap was invented by Ben Shneiderman, who maintains a history of different treemaps (opens in new tab).

    Data Formats

    Excel doesn’t naturally have a way to represent hierarchical data: it likes rows and columns. In the Treemap module, you can enter data either as a name list, or as an id/parent set.

    Name List

    The Name List format might be most natural for things like disk space. In a name list, you enter as many columns as you need to represent the deepest part of the hierarchy–and then leave some parts of it blank. Here’s an example from a disk:

    Level 1 Level 2 Level 3 Level 4 Level 5
    C:
    C: Files Personal User MyFile.txt
    C: Temp

    You might also use the Name List to handle data that isn’t completely hierarchical:

     

    Region

    Category Product Volume Profit Each
    West Sprockets Type A 1000 0.5
    West Sprockets Type B 400 -5
    East Ratchets 2000 1.0

    Note the “Volume” and “Profit” columns. We can use them for size and color, respectively.

    Id/parent

    If you have an organization chart stored in a database, then it might look different: you might have records with a person and their manager. Here’s a small organization:

    Name Manager
    David Freese Wilfredo Baber
    Wilfredo Baber Yolanda Hill
    Yolanda Hill

    Since it’s very annoying to switch between the “Name List” and the “Id/Parent” forms, the Treemap application accepts both.

    Using the App

    For basic instructions on starting up an App for Office, see the description on t.

    The easiest way to get started with the app is to generate sample data. on the “Sample Data” tab, press either “Name List” or “ID/Parent”.

    After data is available, it will take three selections to build a complete chart: one to add the names; one to select the column for the size, and one more to select the column for color. Note that these last two columns are optional: it’s a perfectly fine treemap even without size or color.

  • Data Visualization Apps for Office

    A Streamgraph is a more unusual type of graph. It shows the way that a set of numbers has changed over time by their relative area. (See the sidebar for the history of Streamgraphs).

    A streamgraph is based on a rectangle of data: in this example, “Comedy” had a volume of 10 in 2008, and of 12 in 2010.

    Year Comedy Drama Sci-Fi Action
    2008 10 10 6 0
    2009 12 13 3 0
    2010 14 18 10 16
    2011 22 22 9.3 55

    Once you have the app installed, you can press “Generate” to get sample data that is appropriately formatted.

    There are two buttons to specify the chart data. Press “Series Values…” to select the data that represents the values of the dataset, including the column headers. In the table below, those are italicized.

    Press “Horizontal axis labels…” to select the data that represents the column names. In this example, you would select the boldfaced years.

    All of the values in the table should be numbers greater than zero. If the application sees a number less than zero, it will replace it with a zero.

  • Apps for Access

    Our implementations of core visualizations are a testbed to learn about how these tools are used with database tools.

    We provide core visualizations, familiar to users of Excel and other popular charting tools:

    • A bar chart and column chart. These differ only in the orientation of the bars.
    • A line chart and an area chart; both offer stacked or overlapping areas. An area chart is filled; a line chart only draws the top line.
    • A pie chart
    • A scatter chart, or scatter plot.

    For each of these apps, the system expects see one column of X axis labels, and several columns of series data. Press “Generate” to insert sample data on the sheet.

    Year    Sales    Profit
    2001    100    14
    2002    122    12
       …    …

    Click [Series Data…] to select the series: Sales and Profit.

    Click [Label Data…] be used to select the labels: the Year column.

    This image shows a column chart for the sample data above.

    Notes

    For the area chart and line chart, the visualization will attempt to interpret the ‘labels’ column as ascending, numeric X axis values. If that works, then it will render them appropriately spaced; the spacing need not be regular. If any of the values are not numeric, or not ascending, then each label will instead be placed as evenly-spaced categories.

    Charts behave inconsistently with negative Y-axis values.

    Note that in the current version of the charts, the X and Y axes are automaitcal

People

Portrait of Roland Fernandez

Roland Fernandez

Principal Researcher

Portrait of Ricardo Gutierrez

Ricardo Gutierrez

Senior Software Engineer