Using a Data Metric for Offering Preprocessing Advice in Data Mining Applications
Published: August 1998
Book title: Proceedings of the 13th European Conference on Artificial Intelligence (ECAI '98), Brighton, UK, August 23-28, 1998
This paper describes research carried out in a project in which a methodology for providing user support for KDD processes plays a central role. Although methodologically we aim to support the whole process of applying inductive learning techniques, this paper focusses on one part of that process: the support of data preprocessing for KDD. In our experience, preprocessing is possibly the most time-consuming part of Data Mining applications, and it is almost always included in an application of machine learning. We give some insight into the metadata we calculate from a dataset as part of the method for user support, and focus on how this metadata can be used to guide the definition of preprocessing in combination with top-down task decomposition. DCT (Data Characterisation Tool) is implemented in a software environment (Clementine). Some examples are given that resulted from running the UGM/DCT (User Guidance Module combined with DCT) on the data. Finally, we provide some examples based on UCI datasets and consider the improvements we made with respect to other approaches, as well as what we gained from using this extension to our User Guidance Module (UGM) for user support.
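To make the idea of metadata-driven preprocessing advice concrete, the following is a minimal sketch in Python. It is not the DCT or UGM implementation, and it does not use Clementine. The function names `characterise` and `advise`, the three metadata measures, and the thresholds `max_missing` and `max_symbols` are all hypothetical choices made for illustration only.

```python
def characterise(rows, columns):
    """Compute simple per-attribute metadata for a list of row tuples.

    Hypothetical measures (assumptions, not the DCT's actual metrics):
    fraction of missing values, number of distinct values, and whether
    every present value is numeric. Missing values are encoded as None.
    """
    meta = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        present = [v for v in values if v is not None]
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        meta[name] = {
            "missing_fraction": 1 - len(present) / len(values) if values else 0.0,
            "distinct": len(set(present)),
            "numeric": numeric,
        }
    return meta


def advise(meta, max_missing=0.2, max_symbols=10):
    """Map metadata to preprocessing suggestions using illustrative rules."""
    advice = []
    for name, m in meta.items():
        if m["missing_fraction"] > max_missing:
            advice.append((name, "handle missing values"))
        if m["distinct"] <= 1:
            advice.append((name, "drop constant attribute"))
        elif not m["numeric"] and m["distinct"] > max_symbols:
            advice.append((name, "group symbolic values"))
    return advice


# Toy data invented for this sketch; it is not taken from a UCI dataset.
columns = ["age", "colour", "id_code"]
rows = [
    (23, "red", "a"),
    (None, "red", "b"),
    (41, "red", "c"),
    (None, "red", "d"),
]
meta = characterise(rows, columns)
suggestions = advise(meta, max_symbols=3)
for attribute, suggestion in suggestions:
    print(attribute, "->", suggestion)
```

In this sketch the metadata is computed once and the advice rules only read it, so new rules can be added without touching the raw data again. A real system would presumably use richer measures and tie each suggestion to a step in the task decomposition.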