Spam Analysis: Analysis Of Naïve Bayes

• Spam Detector: Spam detection searches the text of discussion forums for opinions that are unexpected or not useful for the issue under discussion. Such reviews are written solely to advertise or to deceive customers. Machine learning approaches to spam detection have been widely studied and compared [46]. This module looks for spam reviews and separates the relevant reviews from the irrelevant ones. Because words are repeated many times in fake reviews, our work detects keyword stuffing by checking the review sentences against the keywords used by spammers. A threshold is also set, which checks to find if …
This will be done by using the automatic POS Tagger.
• Opinions categorizer: The next step is to categorize these opinions as positive or negative using Naïve Bayes. The Naïve Bayes classifier is a well-known probabilistic classifier that is widely applied to text. It also serves as the foundation on which approaches that incorporate unlabelled data are built. Learning a generative model means estimating its parameters using labeled training data only. The algorithm then uses the estimated parameters to classify a new document by calculating which class is most likely to have generated it. The positive and negative probabilities are computed from the nouns (features) using the Naïve Bayes classifier [47].
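The keyword-stuffing check described for the Spam Detector module can be illustrated with a minimal sketch. The source does not give the spammer keyword list or say exactly what the threshold tests, so `SPAM_KEYWORDS`, `SPAM_THRESHOLD`, and the density rule (fraction of tokens that are spam keywords) are all assumptions made for illustration.

```python
import re

# Hypothetical keyword list and threshold; neither value comes from the source.
SPAM_KEYWORDS = {"buy", "cheap", "discount", "offer", "free"}
SPAM_THRESHOLD = 0.3  # assumed meaning: fraction of tokens that are spam keywords


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_density(review, keywords=SPAM_KEYWORDS):
    """Return the fraction of tokens in the review that are spam keywords."""
    tokens = tokenize(review)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in keywords)
    return hits / len(tokens)


def is_spam(review, threshold=SPAM_THRESHOLD):
    """Flag a review when its keyword density reaches the threshold."""
    return keyword_density(review) >= threshold


def filter_reviews(reviews):
    """Keep only the reviews that are not flagged as spam."""
    return [review for review in reviews if not is_spam(review)]


# Made-up example reviews.
reviews = [
    "Buy cheap TV now, cheap offer, free discount, buy buy buy",
    "The picture quality of this television is excellent",
]
print(filter_reviews(reviews))
```

A density rule is only one way to read "threshold"; a raw count of repeated keywords per review would fit the description equally well.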

The algorithm for Naive Bayes Classifier is given in Table 1.2:

Input: Sentences {s1 + s2 + s3 + ... + sn} divided into a list of words (tokens) words = {w1 + w2 + w3 + ... + wn}, where i = 1, 2, 3, ..., n

Database: Naive Table Td
Positive words: {pw1 + pw2 + pw3 + ... + pwn}
Negative words: {nw1 + nw2 + nw3 + ... + nw …
1.4.

Fig. 1.4. Detailed architecture
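The Naïve Bayes categorizer described earlier (estimate parameters from labeled data, then pick the class most likely to have generated a new document) can be sketched as follows. This is a generic multinomial Naïve Bayes with add-one (Laplace) smoothing, not a reproduction of the steps in Table 1.2, which are not fully available here. The function names and the toy training data are hypothetical, and the noun-only feature selection via POS tagging is omitted.

```python
import math
from collections import Counter


def train_naive_bayes(labeled_sentences):
    """Estimate class counts and per-class word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for tokens, label in labeled_sentences:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: share of training sentences carrying this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # skip words never seen during training
            count = word_counts[label][token]
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (not taken from the LG LED television corpus).
training = [
    (["picture", "excellent"], "positive"),
    (["sound", "great"], "positive"),
    (["remote", "poor"], "negative"),
    (["screen", "bad"], "negative"),
]
model = train_naive_bayes(training)
print(classify(["sound", "excellent"], model))
print(classify(["screen", "poor"], model))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.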

1.3 Results and Analysis
The following section describes the data set used in our experiments and the results obtained.

1.3.1 Dataset Description

The customer review dataset of a product is used for our analysis. The reviews are collected from various social networking sites such as www.facebook.com, www.amazon.com, and www.sitejabber.com. Opinions may contain complete sentences as reviews, short comments, or star ratings with a date and time. LG LED television product reviews are used in our work. These opinions are divided into individual sentences. The dataset used in the proposed system is shown in Table 1.3.
Table 1.3. Corpus Details

S No.  Corpus                      LG LED Television
1      Opinions                    150
2      Total Sentences             460
3      Positive Sentences          252
4      Negative Sentences          143
5      Total Opinion as sentences  395
6      Percentage                  85.86%
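The figures in Table 1.3 can be cross-checked with a few lines of arithmetic. The sketch below assumes that row 5 is the sum of the positive and negative sentences and that the percentage in row 6 is the share of opinion sentences among all sentences; the table does not state either relationship explicitly. Under that reading the share comes to about 85.87%, so the reported 85.86% appears to be truncated rather than rounded.

```python
# Values copied from Table 1.3.
positive_sentences = 252
negative_sentences = 143
total_sentences = 460

# Assumed relationships: opinions = positive + negative; percentage = opinions / total.
opinion_sentences = positive_sentences + negative_sentences
opinion_share = 100 * opinion_sentences / total_sentences

print(opinion_sentences)        # 395
print(f"{opinion_share:.2f}%")  # 85.87%
```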

1.3.2 Evaluation

The performance of the system is evaluated on the basis of Precision, Recall and F-Measure [48]. Precision is the fraction of extracted reviews that are relevant. Recall is the fraction of relevant reviews that are extracted. F-Measure combines the two into a single measure of the overall accuracy of the results.
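The three measures can be computed from simple counts, as in the sketch below. It assumes the balanced F-Measure (the harmonic mean of Precision and Recall), since the text does not say which variant is used. The function name and the example counts are hypothetical and are not results from the experiments.

```python
def precision_recall_f_measure(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and the balanced F-measure from raw counts."""
    extracted = true_positives + false_positives  # everything the system returned
    relevant = true_positives + false_negatives   # everything it should have returned
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up counts, for illustration only.
p, r, f = precision_recall_f_measure(
    true_positives=80, false_positives=20, false_negatives=10
)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```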
