GMD - German National Research Center for Information Technology Schloss Birlinghoven, Sankt-Augustin, D-53754 Germany email@example.com http://allanon.gmd.de/and/
Abstract. Data mining methods are designed to reveal significant relationships and regularities in data collections. For spatially referenced data, analysis by means of data mining can be aptly complemented by visual exploration of the data presented on maps as well as by cartographic visualization of the results of data mining procedures. We propose an integrated environment for exploratory analysis of spatial data that equips an analyst with a variety of data mining tools and provides automated mapping of source data and data mining results. The environment is built on the basis of two existing systems, Kepler for data mining and Descartes for automated knowledge-based visualization. Importantly, the open architecture of Kepler allows new data mining tools to be incorporated, and the knowledge-based architecture of Descartes allows appropriate presentation methods to be selected automatically according to the characteristics of data mining results. The paper presents example scenarios of data analysis and describes the architecture of the integrated system.
D.J. Hand, J.N. Kok, M.R. Berthold (Eds.): IDA'99, LNCS 1642, pp. 149–160, 1999. © Springer-Verlag Berlin Heidelberg 1999
Gennady Andrienko and Natalia Andrienko
The notion of Knowledge Discovery in Databases (KDD) denotes the task of revealing significant relationships and regularities in data by means of algorithms collectively called "data mining". The KDD process is an iterative execution of the following steps:
1. Data selection and preprocessing, such as checking for errors, removing outliers, handling missing values, and transformation of formats.
2. Data transformations, for example, discretization of variables or production of derived variables.
3. Selection of a data mining method and adjustment of its parameters.
4. Data mining, i.e. application of the selected method.
5. Interpretation and evaluation of the results.
In this process the data mining phase takes no more than 20% of the total workload. However, this phase is much better supported, both methodologically and by software, than all the others. This is not surprising, because performing the other steps is a matter of art rather than a routine that allows automation. Lately some efforts in the KDD field have been directed towards intelligent support of the data mining process, in particular, assistance in the selection of an analysis method depending on data characteristics [2,4].
A particular case of KDD is knowledge extraction from spatially referenced data, i.e. data referring to geographic objects, locations, or parts of a territory division. In the analysis of such data it is very important to account for the spatial component (relative positions, adjacency, distances, directions etc.). However, information about spatial relationships is very difficult to represent in the discrete, symbolic form required by data mining methods. There are known works on spatial clustering and on the use of spatial predicates, but they are characterized by highly complex data descriptions and large computational expense.
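The five KDD steps listed in the introduction can be illustrated by a minimal Python sketch. It is not taken from Kepler or Descartes: all function names, field names, and the sample district records are hypothetical, and a simple majority-class rule learner stands in for a real data mining method. The sketch only shows how the steps chain together and how step 3 (parameter adjustment) makes the process iterative.

```python
from collections import Counter, defaultdict
from statistics import median


def preprocess(records, field, max_ratio=10.0):
    """Step 1: drop records with a missing value or an extreme outlier."""
    present = [r for r in records if r.get(field) is not None]
    mid = median(r[field] for r in present)
    return [r for r in present if r[field] <= max_ratio * mid]


def discretize(records, field, n_bins):
    """Step 2: derive an equal-interval class index ('bin') from a numeric field."""
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # guard against a zero-width range
    return [dict(r, bin=min(int((r[field] - low) / width), n_bins - 1))
            for r in records]


def mine_rules(records, target):
    """Step 4: map each class index to its most frequent target value."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["bin"]][r[target]] += 1
    return {b: c.most_common(1)[0][0] for b, c in counts.items()}


def evaluate(records, rules, target):
    """Step 5: share of records whose target value matches the mined rule."""
    hits = sum(1 for r in records if rules[r["bin"]] == r[target])
    return hits / len(records)


def kdd(records, field, target, bin_options=(2, 3, 4)):
    """Repeat steps 2-5 for each candidate parameter (step 3); keep the best."""
    clean = preprocess(records, field)
    best = None
    for n_bins in bin_options:
        binned = discretize(clean, field, n_bins)
        rules = mine_rules(binned, target)
        score = evaluate(binned, rules, target)
        if best is None or score > best[0]:
            best = (score, n_bins, rules)
    return best


# Hypothetical sample data: one record per district.
districts = [
    {"name": "A", "income": 10.0, "label": "low"},
    {"name": "B", "income": 12.0, "label": "low"},
    {"name": "C", "income": 20.0, "label": "mid"},
    {"name": "D", "income": 22.0, "label": "mid"},
    {"name": "E", "income": 30.0, "label": "high"},
    {"name": "F", "income": None, "label": "mid"},    # missing value
    {"name": "G", "income": 900.0, "label": "high"},  # outlier
]

if __name__ == "__main__":
    print(kdd(districts, "income", "label"))
```

Note that the sketch scores the rules on the same records they were mined from, which is adequate for showing the control flow but not for a real evaluation. It also ignores the spatial component entirely, which is exactly the limitation discussed above.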
Integrated Environment for Knowledge Discovery
For the case of analysis of spatially referenced data we propose to integrate traditional data mining instruments with automated cartographic visualization and tools for interactive manipulation of graphical displays. The essence of the idea is that an analyst can view both source data and results of data mining in the form of maps that convey spatial information to a human in a natural way. This offers at least a partial solution to the challenges caused by spatially referenced data: the analyst can...