Creating Data Cubes for ODM Databases
3.0 Building a Data Cube from a Template
3.2 Getting the Sample Data Cube
Solution
3.3 Connecting to your Database
3.4 Browsing the Data Source View
3.5 Building and Browsing the Dimensions
4.0 Building a Cube from Scratch
This
document describes how to create your own data cube from the BearRiverOD sample
database using SQL Server 2005 and Visual Studio 2005.
A data cube
is an OLAP database; OLAP databases are optimized for data mining. Data cubes
present simple aggregations (sum, min, or max) which are pre-computed
for speed and additional calculations (median) can be computed
dynamically. All data cube operations act along dimensions such as time,
site, or variable type. Dimensions can have hierarchies for simple
filtering with drilldown capability; an example of a hierarchy is the time hierarchy
year to day of year.
Data
cubes are constructed from a relational database and are queried with a
specialized query language called MDX. Data cubes can be accessed directly from
Excel through PivotTables. Viewing the data in a data cube can be very useful
for spotting trends and relationships in your data.
For an
introduction to using data cubes, see the user manual user manual at http://bwc.berkeley.edu/RussianRiver.
That will take you through the process of accessing a pre-built data cube via
Excel. To request credentials for the Bear River OD datacube or other related
data cubes hosted by the Berkeley Water Center, contact
bwc-support@lists.berkeley.edu.
To build
the sample data cube, you’ll need the sample Bear
RiverOD database and the Visual
Studio project off my Bear River page: http://research.microsoft.com/~vaningen/Hydrology/BearRiver/default.htm.
You can also download a compressed
backup of the resulting data cube.
Information
about the ODM effort and ODM tools is available off the Utah State University
ODM site at http://water.usu.edu/cuahsi/odm/default.aspx.
You can also use either of the two sample databases off the Utah State site or
any other ODM database.
A special
thanks to Tim Whitaker of University of Texas at Austin and Robin Weber of
Lawrence Berkeley Lab for being the early users of the BearRiverOD data cube
and much help with this tutorial.
To create a
data cube, your Windows computer must have the following software installed:
To check to
see if SQL Analysis Services are installed:

After
downloading the sample Bear
RiverOD database, you should have a zip file named brodmdatabase.zip
containing two files. The BearRiverOD_Data.mdf file is the database; the
BearRiverOD_log.LDF file is the associated log file. Typically you unzip these
into the C:\ Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data folder.
After
unzipping the files and moving them into the target directory, you need to
attach the restored database.

2. Right click on Databases and select
Attach.



Have a look
at the various tables in the database at your leisure. For data analysis and
cube building, the important tables are the Data Values table containing the
actual measurements and the various related tables such as Sites or VariableID.

As a check
on the database restore, you can verify the number of data values are contained
in the database. Open a new query window by right clicking on the BearRiverOD
database link (this should highlight the link) then left clicking on the New
Query button.

Execute the
query shown below. The database should contain 820742 data values. We’ll
also use this as an early check on the data cube build.

You can now
close SQL Server Management Studio. Click No when prompted to save your SQL
queries.
After
downloading the Visual
Studio project, you should have a file named brodmcube.zip. Extract
the files contained in that zip file and navigate to the BearRiverOD.sln file.

Open it
with Visual Studio; you should be able to do this by double clicking.

The
Solution Explorer is on the right. The folders in the BearRiverOD project
include:
Before you
can really browse the BearRiverOD project, you need to connect the data source
to your database.


If you are
using an ODM database other than the Bear River sample database, you may want
to rename the Data Source or Project at this point. For example, if your
database is “RussianRiver”, you could rename the project to
“MyRussianRiver” and the Data Source to “Russian River
OD”.
.

Be sure to
say “Yes” when prompted to change the object name as well.

Click OK to
continue. You can also choose the name of the Analysis Services Database at this
point. Select the “Properties” tab off the “Project”
menu.

That
will bring up a window that has “Options” and “Target”
in bold. Under “Target” is “Server” and
“Database”. The first is the server name (you can deploy remotely)
and the second is the database name. You can change one or both and then click
OK.

Now that you’re
connected to your database, you can browse the project.
Let’s
look at the Data Source View. Expand Data Source Views, and open Bear River
OD.dsv.

You now see
a layout of tables and their relationships. The DataValues is at the center. To
build a cube, you need a “star” or “fully normalized”
database schema. The table containing the actual measurements forms the center
of the star. Each dimension in your cube is built out of a table with a foreign
key relationship with the DataValues table. The sample cube you are about to
build has Variables, Methods, Sites, Qualifiers and a Local Time Line
dimension.
The OD
database schema does not always use foreign keys and does not always enforce
constraints on those keys. To compensate for that, a view on the DataValues
table is used. To see this, right click on the “DataValues” folder
in the Tables view and select “Edit Named Query”.


The named
query ensures that a value of 0 will be used whenever the QualifierID or
MethodID is NULL. The named query also ignores the columns of the DataValues table
that are not used in the cube. This is not necessary as there is no penalty for
including columns that are not used; it can make the cube easier to use by
eliminating the unused attributes from dimension menus.
That same 0
foreign key must be added to the Qualifiers and Methods tables. This is done by
a similar Named Query. For example, the Qualifiers table is defined by:
SELECT QualifierID,
QualifierCode, QualifierDescription
FROM Qualifiers
UNION
SELECT 0 AS qualifierid, ''
AS qualifiercode,
'None' AS
Qualifierdescription
Again, note
that these named queries would not be necessary if there were no NULLs in the
DataValues table.
The ODM
database also does not have a Local Time Line table. The project handles that
by creating that table. The time dimension also has a number of hierarchies. To
build these, we need distinct columns in the LocalTimeLine table for each
component of each hierarchy. This makes the named query fairly lengthy,
although not complicated. You can see the full query by editing the named
query. Let’s look at a subset:
SELECT LocalDateTime,
DATEPART(yy,
LocalDateTime) AS yy,
DATEPART(mm,
LocalDateTime) AS mm,
DATEPART(dd,
LocalDateTime) AS dd,
DATEPART(hh,
LocalDateTime) AS hh,
DATEPART(mi,
LocalDateTime) AS mi,
DATEPART(dayofyear,
LocalDateTime) AS doy,
CONVERT(varchar(4), DATEPART(yy,
LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear,
LocalDateTime)), 3) AS yydoy,
CONVERT(varchar(4), DATEPART(yy,
LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear,
LocalDateTime)), 3) + '-' + RIGHT('0' + CONVERT(varchar(2), DATEPART(hh,
LocalDateTime)), 2) AS yydoyhh,
CONVERT(varchar(4), DATEPART(yy,
LocalDateTime)) + '-' + RIGHT('00' + CONVERT(varchar(3), DATEPART(dayofyear,
LocalDateTime)), 3) + '-' + RIGHT('0' + CONVERT(varchar(2), DATEPART(hh,
LocalDateTime)), 2) + ':' + RIGHT('0' + CONVERT(varchar(2), DATEPART(mi,
LocalDateTime)), 2) AS yydoyhhmi
FROM
DataValues
GROUP
BY LocalDateTime
The first several
elements of the query allow browsing by year (yy), month (mm), day of month
(dd), hour of day (hh), minute of hour (mi), and day of year (doy). This lets
you aggregate the data across those attributes. For example, you can plot the
average of all values by day of year to see seasonal variations, the maximum of
all values by hour of day to see diurnal cycles, or compute the overall
average, minimum, and maximum of any data value across all time.
As
we’ll see later, the Local Time Line dimension also has a “Year to
Day” hierarchy which allows you to drill down through the Year, to Day of
Year, to hour of that day, to minute within that hour. To build that hierarchy,
the LocalTimeLine table needs columns for year (yy), year + day of year
(yydoy), year + day of year + hour of day (yydoyhh), and year + day of year +
hour of day + minute of hour (yydoymi). Similar columns are needed for each
such hierarchy in the time dimension. The BearRiver OD project constructs
“Year to Day”, “Year to Week”, and “Year to
Month” hierarchies. You’ll see the additional columns such as
yymmddhhmi (year + month of year + day of month + hour of day + minute of hour)
in the named query. Other bwc cubes have included Water Year (starting 1
October) and MODIS week (8 day satellite traversal) time hierarchies.
The
Variables table in the BearRiverOD database is also quite large. The database
contains data downloaded from the USGS NWIS site and that site has a LOT of
variables. To make our cube simpler, a named query is used to include only the
variable names that are actually used in the DataValues table.
SELECT VariableID,
VariableCode, VariableName, VariableUnitsID, SampleMedium, ValueType,
IsRegular, TimeSupport, TimeUnitsID, DataType, GeneralCategory, NoDataValue
FROM
Variables
WHERE (VariableID IN
(SELECT
VariableID
FROM
DataValues
GROUP BY VariableID))
Note that each of the above named queries could be replaced with views in your
database. The resulting view could be used directly in the project and no named
queries would be necessary. The Bear River OD project uses Named Queries to
avoid changing the sample database in any way.
If you are
using an ODM database other than the BearRiverOD, you should make sure that the
various named queries are behaving correctly. Below the main tabs in Visual
Studio, you’ll see a refresh button for the data source view. Click the Refresh button to update the
view with respect to your local database.

If the
schema of your new database matches the schema of the original database (the
one that the VS Solution was originally created for), then you should see a
dialog indicating that no changes were found.

Click OK to
continue. At this point, if you’re using a database other than the Bear
River sample database, you’ll also want to change the name of the Data
Source View. Renaming the Data Source View is just like renaming the Data
Source. You right click on the Data Source View and select rename.
Next, build
and browse the Dimensions. Let’s start with the simple Variables
dimension. In the Solution Explorer, expand the Dimensions and open Variables.

The
Variables.dim tab should open with the Dimensions Structure tab as the active
(visible) tab. You’ll see an Attributes pane, a Hierarchies and Levels
pane, and a Data Source View pane.

Building a
cube is done by processing.
Dimensions can be processed before processing the rest of the cube or processed
with the cube. Pre-processing a dimension allows you to view the dimension and
make sure it’s correct before processing the rest of the cube. To process
a dimension:
