arXiv:1211.5481v1 [astro-ph.IM] 23 Nov 2012
1 Department of Physics, University Federico II, Via Cinthia 6, I-80126 Napoli, Italy
2 Department of Computer Engineering and Systems, University Federico II, Via Claudio 21, I-80125 Napoli, Italy
3 INAF, Astronomical Observatory of Capodimonte, Via Moiariello 16, I-80131 Napoli, Italy
4 Visiting Associate, California Institute of Technology, Pasadena, CA 91125, USA
Abstract. We present a multi-purpose genetic algorithm, designed and implemented with GPGPU/CUDA parallel computing technology. The model was derived from a multi-core CPU serial implementation, named GAME, which has already been scientifically tested and validated on massive astrophysical data classification problems through DAMEWARE, a web application resource specialized in data mining based on Machine Learning paradigms. Since genetic algorithms are inherently parallel, the GPGPU computing paradigm makes it possible to exploit the internal training features of the model, permitting a strong optimization in terms of processing performance and scalability.
Keywords: genetic algorithms, GPU programming, data mining
⋆ corresponding author, firstname.lastname@example.org
Cavuoti et al.
Computing has started to change how science is done, enabling new scientific advances through new kinds of experiments. These experiments are also generating new kinds of data of exponentially increasing complexity and volume. Using, exploiting and sharing these data effectively is a huge challenge. The harder problem for the future is the heterogeneity of platforms, data and applications, rather than simply the scale of the deployed resources. Current platforms require scientists to overcome computing barriers between them and the data.
The present paper concerns the design and development of a multi-purpose genetic algorithm implemented with the GPGPU/CUDA parallel computing technology. The model belongs to the supervised machine learning paradigm and deals with both regression and classification problems on massive scientific data sets. It was derived from the original serial implementation, named GAME (Genetic Algorithm Model Experiment), deployed on the DAME Program hybrid distributed infrastructure and made available through the DAMEWARE data mining (DM) web application. In that environment the GAME model has been scientifically tested and validated on massive astrophysical data sets with successful results. Genetic algorithms are derived from Darwin's law of evolution and are intrinsically parallel, both in their evolutionary learning rule and in the way they process data patterns. The parallel computing paradigm can therefore exploit the internal training features of the model, permitting a strong optimization in terms of processing performance.
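The intrinsic parallelism mentioned above can be illustrated with a minimal Python sketch. It is not the GAME implementation: the toy objective `TARGET` and the functions `fitness`, `evaluate` and `evolve` are hypothetical names introduced only for this example, and the selection, crossover and mutation rules are generic choices. The point it shows is that the fitness of each chromosome depends on that chromosome alone, so the whole population can be scored by independent workers. A thread pool is used here only to make that independence explicit (in CPython it gives no real speed-up for this workload); on a GPU one possible mapping, assumed here for illustration, would assign each chromosome to its own thread.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum, for illustration only


def fitness(chromosome):
    """Negative squared error to TARGET; uses this chromosome only."""
    return -sum((g - t) ** 2 for g, t in zip(chromosome, TARGET))


def evaluate(population, workers=4):
    """Score every chromosome; the calls are independent of each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))  # map preserves order


def evolve(pop_size=40, generations=60, mutation_sigma=0.1, seed=0):
    """Toy genetic algorithm: truncation selection, one-point crossover,
    Gaussian mutation. Parents survive unchanged (elitism)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-3, 3) for _ in TARGET]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = evaluate(population)  # the parallelizable step
        ranked = [c for _, c in sorted(zip(scores, population),
                                       key=lambda p: p[0], reverse=True)]
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(TARGET))  # one-point crossover
            child = [g + rng.gauss(0, mutation_sigma)
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        population = parents + children
    scores = evaluate(population)
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]
```

Calling `evolve()` returns the best chromosome found and its fitness. Because the parents are carried over unchanged, the best fitness never decreases from one generation to the next.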
Data Mining based on Machine Learning and parallel computing
Let us start from a fundamental assumption: we live in a contemporary world submerged by a tsunami of data. These are data of many kinds: tables, images, graphs, whether observed, simulated, calculated by statistics or acquired by different types of monitoring systems. The recent explosion of the World Wide Web and of other high-performance Information and Communication Technology (ICT) resources is rapidly contributing to the proliferation of such enormous information repositories. Machine learning (ML) is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. These data form the so-called Knowledge Base (KB): a...