Creating Data Cubes for ODM Databases

 

1.0      Introduction.. 2

2.0      Computer Requirements. 3

3.0      Building a Data Cube from a Template.. 4

3.1      Setting up the Database.. 4

3.2      Getting the Sample Data Cube Solution.. 9

3.3      Connecting to your Database.. 11

3.4      Browsing the Data Source View.. 15

3.5      Building and Browsing the Dimensions. 21

3.6      Building the Cube.. 31

3.7   Browsing the Cube. 37

4.0    Building a Cube from Scratch. 41

 


1.0 Introduction

This document describes how to create your own data cube from the BearRiverOD sample database using SQL Server 2005 and Visual Studio 2005.

 

A data cube is an OLAP database; OLAP databases are optimized for data mining. Data cubes present simple aggregations (sum, min, or max) which are pre-computed for speed and additional calculations (median) can be computed dynamically. All data cube operations act along dimensions such as time, site, or variable type. Dimensions can have hierarchies for simple filtering with drilldown capability; an example of a hierarchy is the time hierarchy year to day of year.

 

Data cubes are constructed from a relational database and are queried with a specialized query language called MDX. Data cubes can be accessed directly from Excel through PivotTables. Viewing the data in a data cube can be very useful for spotting trends and relationships in your data.

 

For an introduction to using data cubes, see the user manual user manual at http://bwc.berkeley.edu/RussianRiver. That will take you through the process of accessing a pre-built data cube via Excel. To request credentials for the Bear River OD datacube or other related data cubes hosted by the Berkeley Water Center, contact bwc-support@lists.berkeley.edu.

 

To build the sample data cube, you’ll need the sample Bear RiverOD database and the Visual Studio project off my Bear River page: http://research.microsoft.com/~vaningen/Hydrology/BearRiver/default.htm. You can also download a compressed backup of the resulting data cube.

 

Information about the ODM effort and ODM tools is available off the Utah State University ODM site at http://water.usu.edu/cuahsi/odm/default.aspx. You can also use either of the two sample databases off the Utah State site or any other ODM database.

 

 

A special thanks to Tim Whitaker of University of Texas at Austin and Robin Weber of Lawrence Berkeley Lab for being the early users of the BearRiverOD data cube and much help with this tutorial. 

2.0 Computer Requirements

To create a data cube, your Windows computer must have the following software installed:

 

 

To check to see if SQL Analysis Services are installed:

  1. Open your Services window, e.g. by opening Control Panel --- Administrative Tools --- Services.
  2. Search for SQL Analysis Services.

 


3.0 Building a Data Cube from a Template

3.1  Setting up the Database

 

After downloading the sample Bear RiverOD database, you should have a zip file named brodmdatabase.zip containing two files. The BearRiverOD_Data.mdf file is the database; the BearRiverOD_log.LDF file is the associated log file. Typically you unzip these into the C:\ Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data folder.

 

After unzipping the files and moving them into the target directory, you need to attach the restored database.

 

  1. Start SQL Server Management Studio (Start --- All Programs --- Microsoft SQL Server 2005 --- SQL Server Management Studio) and connect to the database engine.

 

2.  Right click on Databases and select Attach.

 

 

  1. In the Attach Databases window, click Add, browse to your restored database file and select it.

 

 

  1. Click OK. If you are restoring into the C: directory, SQL Server should find the associated log file automatically. You will see the log file you are restoring under the database file name in the lower pane. If you don’t see it, you can correct the Current File Path by clicking on the  button and navigating to the log file. When you’re done, you should see a display like the following. You’ll note that on the bwc.berkeley.edu server, we keep the databases on the M: volume and the logs on L: volume.

 

       

 

  1. Click OK and the database should be attached. After refreshing the database list, you should see the BearRiverOD database in the Object Explorer.

 

Have a look at the various tables in the database at your leisure. For data analysis and cube building, the important tables are the Data Values table containing the actual measurements and the various related tables such as Sites or VariableID.

 

 

 

 

As a check on the database restore, you can verify the number of data values are contained in the database. Open a new query window by right clicking on the BearRiverOD database link (this should highlight the link) then left clicking on the New Query button.

 

 

Execute the query shown below. The database should contain 820742 data values. We’ll also use this as an early check on the data cube build.

 

 

You can now close SQL Server Management Studio. Click No when prompted to save your SQL queries.

3.2  Getting the Sample Data Cube Solution

 

After downloading the Visual Studio project, you should have a file named brodmcube.zip. Extract the files contained in that zip file and navigate to the BearRiverOD.sln file.

 

 

 

Open it with Visual Studio; you should be able to do this by double clicking.

 

 

The Solution Explorer is on the right. The folders in the BearRiverOD project include:

 

 

3.3  Connecting to your Database

 

Before you can really browse the BearRiverOD project, you need to connect the data source to your database.

 

  1. Expand Data Sources if necessary.
  2. Right click on Bear River OD.ds, and click Open. 
  3. In the Data Source Designer dialog, click the Edit button. 

 

 

  1. You should see the Connection Manager dialog.
    1. Make sure that Native OLE DB\SQL Native Client is chosen as the Provider. This ensures that the right protocol will be used to connect to your database.
    2. If you are going to build your cube on the same machine with your database, you can specify “localhost” as the Server name. If you want to build a cube on a different machine than the database, you can specify the full path name to the server. For example, bwc is bwc.berkeley.edu.
    3. We use Windows Authentication at bwc.
  2. If you want to use another OD database rather than the recently restored BearRiverOD database, you can use the pull down at the “Select or enter a database name:” menu.
  3. Make sure you can connect to your database by clicking the “Test Connection” button.

 

 

 

If you are using an ODM database other than the Bear River sample database, you may want to rename the Data Source or Project at this point. For example, if your database is “RussianRiver”, you could rename the project to “MyRussianRiver” and the Data Source to “Russian River OD”.

.

 

 

Be sure to say “Yes” when prompted to change the object name as well.

 

 

 

 

Click OK to continue. You can also choose the name of the Analysis Services Database at this point. Select the “Properties” tab off the “Project” menu.

 

 

That will bring up a window that has “Options” and “Target” in bold. Under “Target” is “Server” and “Database”. The first is the server name (you can deploy remotely) and the second is the database name. You can change one or both and then click OK.

 

3.4  Browsing the Data Source View 

Now that you’re connected to your database, you can browse the project.

 

Let’s look at the Data Source View. Expand Data Source Views, and open Bear River OD.dsv.

 

 

 

 

You now see a layout of tables and their relationships. The DataValues is at the center. To build a cube, you need a “star” or “fully normalized” database schema. The table containing the actual measurements forms the center of the star. Each dimension in your cube is built out of a table with a foreign key relationship with the DataValues table. The sample cube you are about to build has Variables, Methods, Sites, Qualifiers and a Local Time Line dimension.

 

The OD database schema does not always use foreign keys and does not always enforce constraints on those keys. To compensate for that, a view on the DataValues table is used. To see this, right click on the “DataValues” folder in the Tables view and select “Edit Named Query”.

 

 

 

 

 

The named query ensures that a value of 0 will be used whenever the QualifierID or MethodID is NULL. The named query also ignores the columns of the DataValues table that are not used in the cube. This is not necessary as there is no penalty for including columns that are not used; it can make the cube easier to use by eliminating the unused attributes from dimension menus.

 

That same 0 foreign key must be added to the Qualifiers and Methods tables. This is done by a similar Named Query. For example, the Qualifiers table is defined by:

 

SELECT     QualifierID, QualifierCode, QualifierDescription

FROM Qualifiers

UNION

SELECT     0 AS qualifierid, '' AS qualifiercode,

      'None' AS Qualifierdescription

 

Again, note that these named queries would not be necessary if there were no NULLs in the DataValues table.

 

The ODM database also does not have a Local Time Line table. The project handles that by creating that table. The time dimension also has a number of hierarchies. To build these, we need distinct columns in the LocalTimeLine table for each component of each hierarchy. This makes the named query fairly lengthy, although not complicated. You can see the full query by editing the named query. Let’s look at a subset:

 

SELECT     LocalDateTime,

DATEPART(yy, LocalDateTime) AS yy,

DATEPART(mm, LocalDateTime) AS mm,

DATEPART(dd, LocalDateTime) AS dd,

DATEPART(hh, LocalDateTime) AS hh,

DATEPART(mi, LocalDateTime) AS mi,

DATEPART(dayofyear, LocalDateTime) AS doy,

CONVERT(varchar(4), DATEPART(yy, LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear, LocalDateTime)), 3) AS yydoy,

CONVERT(varchar(4), DATEPART(yy, LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear, LocalDateTime)), 3) + '-' + RIGHT('0' + CONVERT(varchar(2), DATEPART(hh, LocalDateTime)), 2) AS yydoyhh,

CONVERT(varchar(4), DATEPART(yy, LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear, LocalDateTime)), 3) + '-' + RIGHT('0' + CONVERT(varchar(2), DATEPART(hh, LocalDateTime)), 2) + ':' + RIGHT('0' + CONVERT(varchar(2), DATEPART(mi, LocalDateTime)), 2) AS yydoyhhmi

FROM         DataValues

GROUP BY LocalDateTime

 

The first several elements of the query allow browsing by year (yy), month (mm), day of month (dd), hour of day (hh), minute of hour (mi), and day of year (doy). This lets you aggregate the data across those attributes. For example, you can plot the average of all values by day of year to see seasonal variations, the maximum of all values by hour of day to see diurnal cycles, or compute the overall average, minimum, and maximum of any data value across all time.  

 

As we’ll see later, the Local Time Line dimension also has a “Year to Day” hierarchy which allows you to drill down through the Year, to Day of Year, to hour of that day, to minute within that hour. To build that hierarchy, the LocalTimeLine table needs columns for year (yy), year + day of year (yydoy), year + day of year + hour of day (yydoyhh), and year + day of year + hour of day + minute of hour (yydoymi). Similar columns are needed for each such hierarchy in the time dimension. The BearRiver OD project constructs “Year to Day”, “Year to Week”, and “Year to Month” hierarchies. You’ll see the additional columns such as yymmddhhmi (year + month of year + day of month + hour of day + minute of hour) in the named query. Other bwc cubes have included Water Year (starting 1 October) and MODIS week (8 day satellite traversal) time hierarchies.

 

The Variables table in the BearRiverOD database is also quite large. The database contains data downloaded from the USGS NWIS site and that site has a LOT of variables. To make our cube simpler, a named query is used to include only the variable names that are actually used in the DataValues table.

 

 

SELECT     VariableID, VariableCode, VariableName, VariableUnitsID, SampleMedium, ValueType, IsRegular, TimeSupport, TimeUnitsID, DataType, GeneralCategory, NoDataValue

FROM         Variables

WHERE     (VariableID IN

                          (SELECT     VariableID

                            FROM       DataValues

                            GROUP BY VariableID))

 


Note that each of the above named queries could be replaced with views in your database. The resulting view could be used directly in the project and no named queries would be necessary. The Bear River OD project uses Named Queries to avoid changing the sample database in any way.

 

If you are using an ODM database other than the BearRiverOD, you should make sure that the various named queries are behaving correctly. Below the main tabs in Visual Studio, you’ll see a refresh button for the data source view.  Click the Refresh button to update the view with respect to your local database.

 

 

If the schema of your new database matches the schema of the original database (the one that the VS Solution was originally created for), then you should see a dialog indicating that no changes were found.

 

 

 

Click OK to continue. At this point, if you’re using a database other than the Bear River sample database, you’ll also want to change the name of the Data Source View. Renaming the Data Source View is just like renaming the Data Source. You right click on the Data Source View and select rename.

 

3.5  Building and Browsing the Dimensions

 

Next, build and browse the Dimensions. Let’s start with the simple Variables dimension. In the Solution Explorer, expand the Dimensions and open Variables.

 

 

 

The Variables.dim tab should open with the Dimensions Structure tab as the active (visible) tab. You’ll see an Attributes pane, a Hierarchies and Levels pane, and a Data Source View pane. 

 

 

 

Building a cube is done by processing. Dimensions can be processed before processing the rest of the cube or processed with the cube. Pre-processing a dimension allows you to view the dimension and make sure it’s correct before processing the rest of the cube. To process a dimension:

  1. Click the Process button  just above the Attributes pane.

 

 

  1. You may be prompted to build and deploy the project.  If so, click Yes.