Instance-Based Clustering for Databases

Matthew Merzbacher, Wesley W. Chu


We present a method for automatically clustering similar attribute values in a database system spanning multiple domains. The method constructs a value abstraction hierarchy for each attribute using rules derived from the database instance. Each rule has a confidence and a popularity, which combine to express the rule's "usefulness." Attribute values are clustered if they serve as premises for rules with the same consequence. Applying the algorithm iteratively yields a hierarchy of clusters. The algorithm can be improved by allowing a domain expert to direct the clustering process.
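The clustering criterion (group attribute values whose rules share a consequence) can be illustrated with a small sketch. This is not the authors' algorithm; it is a minimal illustration under stated assumptions: rules take the form `attr = a -> target = b`, confidence is the fraction of tuples with premise value `a` that also have consequence `b`, popularity is the fraction of all tuples matching the premise, and usefulness is their product. The function names and the `min_usefulness` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def derive_rules(rows, attr, target):
    """Derive rules 'attr = a -> target = b' from the database instance.

    Returns {a: [(b, confidence, popularity), ...]}. The confidence and
    popularity definitions here are assumptions made for illustration.
    """
    n = len(rows)
    premise_counts = Counter(r[attr] for r in rows)
    pair_counts = Counter((r[attr], r[target]) for r in rows)
    rules = defaultdict(list)
    for (a, b), count in pair_counts.items():
        confidence = count / premise_counts[a]  # P(target = b | attr = a)
        popularity = premise_counts[a] / n      # share of tuples matching the premise
        rules[a].append((b, confidence, popularity))
    return rules


def cluster_values(rows, attr, target, min_usefulness=0.1):
    """Cluster values of `attr` that are premises of rules with the same consequence.

    Each value is assigned to the consequence of its most useful rule, where
    usefulness is assumed to be confidence * popularity. Values whose best
    rule falls below `min_usefulness` are left unclustered.
    """
    rules = derive_rules(rows, attr, target)
    clusters = defaultdict(set)
    for a, candidates in rules.items():
        b, conf, pop = max(candidates, key=lambda c: c[1] * c[2])
        if conf * pop >= min_usefulness:
            clusters[b].add(a)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical instance: ports clustered by the region they imply.
    rows = [
        {"port": "Oakland", "region": "West"},
        {"port": "Seattle", "region": "West"},
        {"port": "Boston", "region": "East"},
        {"port": "Norfolk", "region": "East"},
        {"port": "Oakland", "region": "West"},
        {"port": "Boston", "region": "East"},
    ]
    print(cluster_values(rows, "port", "region"))
```

Raising `min_usefulness` drops values that appear too rarely to support a useful rule. To build a hierarchy in the spirit of the abstract, one could replace each value by its cluster label and repeat the procedure against a different target attribute; that iteration step is likewise an assumption of this sketch.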
