Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Aditya Jadhav, Mahesh Kukreja
E-mail: aditya.jadhav27@gmail.com & mr_mahesh_in@yahoo.co.in

Abstract: In the information industry, huge amounts of data are widely available, and there is an imminent need to turn such data into useful information. Data mining meets this need through the exploration and analysis, by automatic or semi-automatic means, of large quantities of data. On a single system with few processors, both the processing speed and the size of the data that can be processed at one time are restricted. Both limits can be raised if data mining is carried out in parallel across coordinated systems connected in a LAN. The problem with this solution is that a LAN is not elastic: the number of systems over which the work is distributed cannot be changed to match the size of the data to be processed. Our main aim is to distribute the data to be analyzed across various nodes in the cloud. For optimum data distribution and efficient data mining as per the user's requirements, various algorithms must be implemented.
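The idea of distributing data across nodes and mining each part in parallel can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Hadoop: worker threads stand in for cloud nodes, and the function names (partition, mine_chunk, mine_in_parallel), the sample transactions, and the choice of item-support counting as the mining task are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records into num_nodes chunks in round-robin order."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def mine_chunk(chunk):
    """Per-node step: count how many local transactions contain each item."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def mine_in_parallel(records, num_nodes):
    """Distribute records, mine each chunk concurrently, merge partial counts."""
    chunks = partition(records, num_nodes)
    # Threads are a stand-in for separate nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(mine_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add up per-node counts
    return total


# Hypothetical sample data, not taken from the paper.
transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs", "milk"],
    ["eggs"],
]
support = mine_in_parallel(transactions, num_nodes=2)
```

Because the merge step only adds up per-node counts, the result does not depend on how many nodes the data is split across, which is the property that lets the node count vary with the data size.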

3. Elasticity: Computing resources can be rapidly increased or decreased as needed, and released for other uses when they are no longer required.

4. Pay as you go: Users pay only for the resources actually used and only for the time they are used.
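Elasticity and pay-as-you-go billing can be made concrete with a small arithmetic sketch. All figures below (per-node capacity, hourly rate, job sizes and durations) are hypothetical assumptions chosen for illustration and do not come from the paper or from any provider's price list.

```python
import math


def nodes_needed(data_size_gb, capacity_per_node_gb):
    """Elasticity: size the cluster to the data, with at least one node."""
    return max(1, math.ceil(data_size_gb / capacity_per_node_gb))


def pay_as_you_go_cost(node_hours, rate_per_node_hour):
    """Pay as you go: charge only for the node-hours actually consumed."""
    return node_hours * rate_per_node_hour


CAPACITY_GB = 50          # assumed data volume one node can handle
RATE_PER_NODE_HOUR = 0.10  # assumed price per node-hour

# Hypothetical jobs: (data size in GB, running time in hours)
jobs = [(40, 2.0), (400, 3.0), (10, 0.5)]

# Elastic cluster: each job gets only as many nodes as its data requires.
elastic_node_hours = sum(
    nodes_needed(size, CAPACITY_GB) * hours for size, hours in jobs
)

# Fixed cluster (as in a LAN): always sized for the largest job.
fixed_nodes = max(nodes_needed(size, CAPACITY_GB) for size, _ in jobs)
fixed_node_hours = fixed_nodes * sum(hours for _, hours in jobs)

elastic_cost = pay_as_you_go_cost(elastic_node_hours, RATE_PER_NODE_HOUR)
fixed_cost = pay_as_you_go_cost(fixed_node_hours, RATE_PER_NODE_HOUR)
```

Under these assumed numbers the elastic cluster consumes fewer node-hours than the fixed one, because small jobs no longer occupy nodes provisioned for the largest job.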

1.2 Virtualization

In computing, virtualization is the creation of a virtual (rather than actual) version of something, such as a hardware platform, an operating system, a storage device, or network resources. Virtualization can be viewed as part of an overall trend in enterprise IT that includes autonomic computing, a scenario in which the IT environment will be able to manage itself based on perceived activity, and utility computing, in which computer processing power is seen as

ISSN (Print): 2278-5140, Volume-1, Issue-2, 2012
