Semi-Supervised K-Means Clustering for Outlier Detection in Mammogram Classification
K. Thangavel¹, A. Kaja Mohideen²
¹Department of Computer Science, Periyar University, Salem, India

Abstract— Detecting outliers and selecting relevant features are among the most important steps before classification. In this paper, a novel semi-supervised k-means clustering method is proposed for outlier detection in mammogram classification. First, shape features are extracted from the digital mammograms, and k-means clustering is applied to these features, with the number of clusters equal to the number of classes. The resulting clusters are then compared with the original classes; wrongly clustered instances are identified as outliers and removed from the feature space. A novel Genetic Association Rule Miner (GARM) is applied to this reduced feature set to construct association rules for classification. The performance is analyzed with rough set theory using Receiver Operating Characteristic (ROC) curve analysis. Mammogram images from MIAS (Mammographic Image Analysis Society) and DDSM (Digital Database for Screening Mammography) were used to evaluate the
Keywords: Mammogram; k-Means Clustering; Shape Features; Outlier Detection.
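The outlier-removal step summarized in the abstract (k-means with as many clusters as classes, then discarding instances whose cluster disagrees with their class) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `detect_outliers`, seeding each centroid with its class mean (as in seeded k-means), and the use of Euclidean distance are all assumptions, since the text only states that k equals the number of classes and that wrongly clustered instances are treated as outliers.

```python
import numpy as np


def detect_outliers(X, y, n_iter=100):
    """Return a boolean mask marking instances whose cluster disagrees with their class.

    Hypothetical helper. Assumption (not stated in the paper): each centroid is
    seeded with the mean of one class, so cluster j is taken to stand for classes[j].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # k = number of classes
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])

    for _ in range(n_iter):
        # Assign every instance to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Standard Lloyd update; an empty cluster keeps its previous centroid.
        new = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(len(classes))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new

    # Wrongly clustered instances: the class their cluster stands for differs from their label.
    return classes[assign] != y


# Toy 2-D "shape features" (illustrative values only, not data from the paper).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
              [0.15, 0.1]])
y = np.array(["benign"] * 3 + ["malignant"] * 4)

mask = detect_outliers(X, y)
X_reduced, y_reduced = X[~mask], y[~mask]  # reduced feature space passed on to the classifier
print(mask)  # only the last instance (index 6) is flagged
```

Seeding each centroid with a class mean ties cluster j directly to class j, so no separate cluster-to-class matching step is needed. With random initialization, each cluster would first have to be mapped to a class (for example, by majority vote) before the comparison.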

Mammography is currently the most effective imaging modality for breast cancer screening. Computer-aided diagnosis (CAD) systems, which have shown promising performance, have been developed to help radiologists detect mammographic lesions [1]. Various CAD algorithms have been proposed for characterizing microcalcifications (MCs), an important indicator of malignancy [2-4]. These algorithms extract image features from regions of interest (ROIs) and estimate the probability of malignancy for a given MC cluster.

A variety of computer-extracted features and classification schemes have been used to automatically discriminate between benign and malignant MC clusters. Most of these studies have followed one of two approaches. The first approach is based on computer-extracted morphology (shape) features of individual MCs or of MC clusters [5-9], since morphology is one of the most important clinical factors in breast cancer diagnosis. CAD schemes that use radiologists' ratings of MC morphology have also been proposed [10-11]. The second approach employs texture features extracted from ROIs containing MC clusters [12-16]. Some studies have compared morphological and textural features, but their results differ depending on the features investigated, the classifiers used, and the datasets analyzed. Combinations of morphological and textural features have also been studied, with promising results in breast cancer diagnosis [7, 16]. In this paper, only shape features are considered.

978-1-4244-9008-0/10/$26.00 ©2010 IEEE

One of the most important steps in a classification task is extracting features capable of distinguishing between classes, and considerable effort has been spent on extracting appropriate features from microcalcification clusters [17, 18]. To reduce complexity and improve classifier performance, redundant and irrelevant features are removed from the original feature set. Rough set theory, proposed by Pawlak [19, 20], has proven in recent years to be an effective tool for feature selection, rule extraction, and knowledge discovery from categorical data.

In this paper, a novel semi-supervised k-means clustering method is implemented to detect irrelevant instances (outliers). First, the mammogram images are preprocessed to remove film artifacts and noise; the suspicious regions are then segmented from the enhanced image, and shape features are extracted from the segmented regions. The outliers in this feature set are then identified and removed with semi-supervised k-means clustering. These...