Generic Schema Matching With Cupid

MSR-TR-2001-58

Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past solutions, showing that a rich range of techniques is available. We then propose a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches. Some of our innovations are the integrated use of linguistic and structural matching, context-dependent matching of shared types, and a bias toward leaf structure where much of the schema content resides. After describing our algorithm, we present experimental results that compare Cupid to two other schema matching systems. This is an extended version of a paper published at the 27th VLDB Conference.