A. Varde, E. Rundensteiner, M. Maniruzzaman, R. D. Sisson, Worcester Polytechnic Institute, Worcester, MA
Summary: Data mining consists of finding interesting patterns in large datasets to guide decisions. This paper describes a technique, AutoDomainMine, that performs data mining guided by basic domain knowledge in order to discover more advanced knowledge. The data being mined comes primarily from heat treating experiments. It comprises the input conditions of quenching experiments and the resulting graphs, namely heat transfer curves, which plot heat transfer coefficients versus part temperature. Since heat transfer coefficients characterize quenching experiments, estimating a heat transfer curve from the input conditions assists in decision-making. For instance, the estimated curves could serve as input to simulation tools for analysis. The AutoDomainMine approach integrates two data mining techniques, clustering and classification, into a single learning strategy. It first clusters the graphical results of existing experiments, i.e., the heat transfer curves, and then uses classification to learn the clustering criteria, i.e., the input conditions that characterize each cluster. The learned criteria are used to design cluster representatives that help to classify unseen data. These representatives serve as the basis for estimating the results of new experiments given their input conditions. The AutoDomainMine approach gives higher accuracy than state-of-the-art techniques such as similarity searching. It is being further enhanced by distance metric learning, to capture the semantics of graphs, and by designing domain-specific cluster representatives that act as better classifiers for estimation.
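The cluster-then-classify strategy summarized above can be illustrated with a minimal Python sketch. It is not the AutoDomainMine implementation. The k-means routine, the attribute-value voting classifier, all function names, and the toy curve values are assumptions made purely for illustration, and the mean curve of each cluster stands in for a cluster representative.

```python
from collections import Counter, defaultdict

def euclidean(a, b):
    """Point-wise Euclidean distance between two sampled curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mean_curve(curves):
    """Point-wise average of a list of equally sampled curves."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]

def kmeans(curves, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [curves[0]]
    while len(centers) < k:
        centers.append(max(curves, key=lambda c: min(euclidean(c, m) for m in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: euclidean(c, centers[j])) for c in curves]
        # Keep the old center if a cluster happens to receive no curves.
        centers = [mean_curve([c for c, l in zip(curves, labels) if l == j] or [centers[j]])
                   for j in range(k)]
    return labels, centers

def learn_criteria(conditions, labels):
    """Count, for each (attribute, value) pair, the clusters it occurs in."""
    votes = defaultdict(Counter)
    for cond, label in zip(conditions, labels):
        for attr, value in cond.items():
            votes[(attr, value)][label] += 1
    return votes

def classify(votes, cond):
    """Pick the cluster that receives the most votes from the given conditions."""
    tally = Counter()
    for attr, value in cond.items():
        tally.update(votes.get((attr, value), {}))
    return tally.most_common(1)[0][0] if tally else None

def estimate(votes, centers, cond):
    """Estimate a curve for new conditions as the predicted cluster's mean curve."""
    label = classify(votes, cond)
    return None if label is None else centers[label]

# Made-up toy data: (input conditions, curve sampled at four temperatures).
experiments = [
    ({"quenchant": "oil", "agitation": "low"}, [100.0, 400.0, 300.0, 100.0]),
    ({"quenchant": "oil", "agitation": "high"}, [120.0, 440.0, 320.0, 120.0]),
    ({"quenchant": "water", "agitation": "low"}, [500.0, 1500.0, 1000.0, 300.0]),
    ({"quenchant": "water", "agitation": "high"}, [540.0, 1600.0, 1100.0, 340.0]),
]
conditions = [cond for cond, _ in experiments]
curves = [curve for _, curve in experiments]

labels, centers = kmeans(curves, k=2)        # step 1: cluster the curves
votes = learn_criteria(conditions, labels)   # step 2: learn the clustering criteria
print(estimate(votes, centers, {"quenchant": "water", "agitation": "low"}))
```

In this toy setup the quenchant alone separates the two clusters, so a new experiment with a previously unseen agitation level is still assigned to the cluster of its quenchant. A classifier such as a decision tree would express the learned criteria more explicitly than the vote counts used here.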