Today, vast amounts of data are generated, compiled, and kept in information repositories such as databases and data warehouses. Present information technology is mature and powerful enough to retain any amount of data in an orderly manner. Nevertheless, discovering applicable patterns, tendencies, principles, relationships, and deviations in such large amounts of data, and making significant forecasts from it, remains one of the primary challenges of the information era. This paper deals with the data mining process, more specifically with knowledge discovery.
Keywords: knowledge discovery, data mining, components of the knowledge discovery process
Modern business relies heavily on data. Data has become a strategic asset for companies, which must stay viable and competitive, and likewise for scientific institutions, government, and society as a whole. Data comes from everywhere, ranging from purchases in local shops to satellite images, sensors, and the Internet. For instance, Amazon collects and stores a huge amount of information about the customers who visit its site, and NASA's satellites generate terabytes of data every hour. This data takes the form not only of traditional numbers and text, but also of sounds, images, and videos. Database and data warehouse technologies (including transactional, scientific and engineering, legacy, spatial, time-series, text, and object-oriented databases), computer hardware and software, and automated data collection tools are nowadays so mature and powerful that they can store any amount of data in an organized and efficient way. Storage alone, however, is not enough: the data must be transformed, aggregated, analyzed, and synthesized in order to discover crucial and interesting patterns. Knowledge discovery (KD) in databases is therefore a promising solution, and this paper focuses on explaining how knowledge discovery is organized and how it contributes to organizations. The article is organized as follows. Section 2 defines the KD concept, Section 3 describes the process of knowledge discovery, and Section 4 provides an overview of all components of the knowledge discovery process. Finally, Section 5 provides conclusions and Section 6 lists the references.
Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data, and is often described as drawing knowledge out of the input data. Knowledge discovery evolved out of the data mining domain and is closely related to it in terms of both methodology and terminology. The most well-known branch of data mining is knowledge discovery, also known as Knowledge Discovery in Databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further discovery. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts hold considerable value for risk management and for the business, and are key to the evaluation and evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as process flows (e.g. data flows, control flows, and call maps), architecture, database schemas, and business rules, terms, and processes.
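To make the idea of "automatically searching large volumes of data for patterns" more concrete, the following is a minimal Python sketch; it is an illustration only and not a method taken from this paper. It counts how often pairs of items are bought together and keeps the pairs whose support (the share of transactions containing both items) meets a threshold, which is the basic counting step behind frequent-itemset and association-rule mining. The function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing
    both items) is at least min_support, sorted by descending support.

    Hypothetical helper for illustration; brute force, not a scalable algorithm.
    """
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort items so each unordered pair is counted
        # at most once per transaction and always in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    # Keep only pairs frequent enough to count as a "pattern".
    patterns = {pair: c / n for pair, c in counts.items() if c / n >= min_support}
    # Highest support first; ties broken alphabetically for stable output.
    return dict(sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])))


if __name__ == "__main__":
    # Hypothetical purchase data from a local shop (illustrative only).
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["beer", "bread"],
        ["butter", "milk"],
    ]
    for pair, support in frequent_pairs(baskets, min_support=0.5).items():
        print(pair, support)
```

With these four baskets and a threshold of 0.5, the sketch reports two patterns, bread with milk and butter with milk, each with support 0.5. A real KDD system would typically rely on a scalable algorithm such as Apriori or FP-Growth instead of enumerating every pair in every transaction; the brute-force form is used here only for readability.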
Knowledge discovery process
The knowledge discovery process is illustrated in the figure below.
In the first step, a problem description evolves from issues, concerns, and general objectives. This in turn leads to a problem specification with quantifiable measures for later testing and evaluation.
Resourcing
The second stage of the KD process focuses on the creation of a suitable dataset on...