Model Management
People
Current and recent collaborators include:
The Vision
The goal of model management is to develop a generic infrastructure that offers an order-of-magnitude productivity improvement to builders of model-driven applications, such as database tools, application design tools, message translators, and customizable commercial applications. Today's model-driven applications include much object-at-a-time programming on relational schemas, DTDs, web-site structures, E/R diagrams, UML models, etc. The main ideas behind model management are that (1) such object-at-a-time programming can be abstracted as high-level operations on models (i.e., schemas) and mappings between models, and (2) these operations can be made independent of the data model and application of interest, that is, generic. The key operations of model management are:
Our vision paper [3] for model management motivates the problem, gives an example application scenario, and discusses the main structures and operations. The first full implementation of model management is described in [5].
An Example
To see how model management operations might be used, consider the problem of populating a data warehouse. Suppose we have a model S1 of a data source and a mapping map1W from S1 to a model W of a data warehouse. Now we are given a model S2 of a second data source which is known to be similar to S1. We can integrate S2 into the data warehouse as follows (see the figure below):
1. Match S2 and S1, yielding a mapping map21;
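The scenario can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: models are reduced to sets of element names, a mapping is a set of element pairs, the toy `match` pairs elements by case-insensitive name equality only, and the second step is assumed to be a composition of map21 with map1W. The element names and the `W.` prefix are hypothetical.

```python
from typing import Set, Tuple

# A model is represented here as a set of element names (an assumption for
# illustration; real models also carry types, constraints, and structure).
Model = Set[str]
# A mapping is a set of (source element, target element) pairs.
Mapping = Set[Tuple[str, str]]


def match(source: Model, target: Model) -> Mapping:
    """Toy Match: pair elements whose names are equal, ignoring case."""
    by_name = {t.lower(): t for t in target}
    return {(s, by_name[s.lower()]) for s in source if s.lower() in by_name}


def compose(first: Mapping, second: Mapping) -> Mapping:
    """Relational composition: (a, c) if (a, b) is in first and (b, c) in second."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


# Hypothetical schemas for two similar sources and a warehouse mapping.
S1 = {"CustomerId", "Name", "City"}
S2 = {"customerid", "name", "Phone"}
map1W = {("CustomerId", "W.CustKey"), ("Name", "W.CustName"), ("City", "W.City")}

map21 = match(S2, S1)          # step 1: elements of S2 that also occur in S1
map2W = compose(map21, map1W)  # assumed step 2: reuse map1W for those elements

print(sorted(map2W))
```

In this sketch the element `Phone`, which has no counterpart in S1, is left unmapped; handling such elements would need further work beyond these two steps.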
Step (1) characterizes those parts of S2 that are the same as S1. Step (2) reuses map1W by applying it to those parts of S2 that are the same as S1. In [2], we described scenarios like this in detail, to show how to use model management operations to solve some practical data warehousing problems.
Matching Models
A model management implementation needs mappings. So the first operation of interest is the one that generates mappings, namely Match. We started by assembling a survey of research literature on schema matching [6]. Using this survey as guidance, we integrated some existing approaches and new ones into a hybrid algorithm, called Cupid [4]. Cupid uses properties of individual elements, linguistic information about names, structural similarities, key and referential constraints, an external thesaurus, and context-dependent matching. We have found Cupid to be immediately useful to some tools, even without the other model management operations. For example, a graphical tool for defining a mapping between XML message types can use Cupid to produce a draft mapping, which the user can then review and refine.
Formal Semantics
Since Model Management is meant to be generic, we need a data-model-independent way to characterize its semantics. We are using category theory for this purpose. A category is an abstract mathematical structure consisting of a set of uninterpreted objects and morphisms (i.e., transformations) between them. We apply it to Model Management by representing the models (or schemas) of a data model as a category and the mappings between models as morphisms between objects of a schema category. We used this categorical approach in a study of the effect of schema integration on integrity constraints. Category theory facilitates reasoning about structure. Using it, we can answer such questions as "how can new models be constructed from existing ones?" and "how can structures be decomposed or mapped into more elementary components?"
The Future
The development of model management is at an early stage. Other topics worth investigating are the semantics and implementation of Merge and Compose, properties of translations from standard data models into a generic one, and the application of model management to other practical problems.
References
1. Alagic, S. and P. A. Bernstein, "A Model Theory for Generic Schema Management," DBPL '01.
2. Bernstein, P. A. and E. Rahm, "Data Warehouse Scenarios for Model Management," ER2000 Conference Proceedings, Springer-Verlag, pp. 1-15.
3. Bernstein, P. A., "Applying Model Management to Classical Meta Data Problems," Proc. CIDR 2003, pp. 209-220.
4. Madhavan, J., P. A. Bernstein, and E. Rahm, "Generic Schema Matching Using Cupid," VLDB '01. Extended version: MSR-TR-2001-17.
5. Melnik, S., E. Rahm, and P. A. Bernstein, "Rondo: A Programming Platform for Generic Model Management," Proc. SIGMOD 2003, pp. 193-204.
6. Rahm, E. and P. A. Bernstein, "On Matching Schemas Automatically," VLDB Journal 10, 4 (Dec. 2001). The original publication is available on LINK at http://link.springer.de.
Last updated: Aug. 8, 2003