# Machine Learning Week 6

Topics: Machine learning, Principal component analysis, K-means clustering Pages: 17 (4020 words) Published: June 22, 2013
Programming Exercise 7: K-means Clustering and Principal Component Analysis Machine Learning

Introduction
In this exercise, you will implement the K-means clustering algorithm and apply it to compress an image. In the second part, you will use principal component analysis to find a low-dimensional representation of face images. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics. To get started with the exercise, you will need to download the starter code and unzip its contents to the directory where you wish to complete the exercise. If needed, use the cd command in Octave to change to this directory before starting this exercise.

Files included in this exercise
ex7.m - Octave/Matlab script for the first exercise on K-means
ex7_pca.m - Octave/Matlab script for the second exercise on PCA
ex7data1.mat - Example Dataset for PCA
ex7data2.mat - Example Dataset for K-means
ex7faces.mat - Faces Dataset
bird_small.png - Example Image
displayData.m - Displays 2D data stored in a matrix
drawLine.m - Draws a line over an existing figure
plotDataPoints.m - Plots data points colored by cluster assignment
plotProgresskMeans.m - Plots each step of K-means as it proceeds
runkMeans.m - Runs the K-means algorithm
[*] pca.m - Perform principal component analysis
[*] projectData.m - Projects a data set into a lower dimensional space
[*] recoverData.m - Recovers the original data from the projection
[*] findClosestCentroids.m - Find closest centroids (used in K-means)
[*] computeCentroids.m - Compute centroid means (used in K-means)
[*] kMeansInitCentroids.m - Initialization for K-means centroids

* indicates files you will need to complete

Throughout the first part of the exercise, you will be using the script ex7.m; for the second part, you will use ex7_pca.m. These scripts set up the dataset for the problems and make calls to functions that you will write. You are only required to modify functions in other files, by following the instructions in this assignment.

Where to get help
We strongly encourage using the online Q&A Forum to discuss exercises with other students. However, do not look at any source code written by others or share your source code with others. If you run into network errors using the submit script, you can also use an online form for submitting your solutions. To use this alternative submission interface, run the submitWeb script to generate a submission file (e.g., submit_ex7_part2.txt). You can then submit this file through the web submission form on the programming exercises page (go to the programming exercises page, then select the exercise you are submitting for). If you have no problems submitting through the standard submission system using the submit script, you do not need to use this alternative submission interface.

1 K-means Clustering

In this exercise, you will implement the K-means algorithm and use it for image compression. You will first start on an example 2D dataset that will help you gain an intuition of how the K-means algorithm works. After that, you will use the K-means algorithm for image compression by reducing the number of colors that occur in an image to only those that are most common in that image. You will be using ex7.m for this part of the exercise.

1.1 Implementing K-means

The K-means algorithm is a method to automatically cluster similar data examples together. Concretely, you are given a training set $\{x^{(1)}, \ldots, x^{(m)}\}$ (where $x^{(i)} \in \mathbb{R}^n$), and want to group the data into a few cohesive "clusters". The intuition behind K-means is an iterative procedure that starts by guessing the initial centroids, and then refines this guess by repeatedly assigning examples to their closest centroids and then recomputing the centroids based on the assignments. The K-means algorithm is as follows:

    % Initialize centroids
    centroids = kMeansInitCentroids(X, K);
    for...
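The assign-then-recompute procedure described above can be sketched as a short, self-contained Octave/Matlab script. This is only an illustration under stated assumptions, not the course's starter code or a reference solution: the toy dataset, the iteration cap `max_iters`, the choice of two data points as initial centroids, and the rule of leaving an empty cluster's centroid unchanged are all made up for the example.

```octave
% Minimal K-means sketch on a toy 2D dataset (illustrative values only).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];  % m-by-n data matrix, one example per row
K = 2;                                     % number of clusters
max_iters = 5;                             % hypothetical iteration cap
centroids = X([1 4], :);                   % assumption: seed with two data points

m = size(X, 1);
for iter = 1:max_iters
  % Assignment step: squared Euclidean distance from every example
  % to every centroid, then pick the closest centroid per example.
  dists = zeros(m, K);
  for k = 1:K
    diffs = bsxfun(@minus, X, centroids(k, :));
    dists(:, k) = sum(diffs .^ 2, 2);
  end
  [~, idx] = min(dists, [], 2);

  % Update step: move each centroid to the mean of its assigned examples.
  % A centroid with no assigned examples is left unchanged (assumption).
  for k = 1:K
    members = X(idx == k, :);
    if ~isempty(members)
      centroids(k, :) = mean(members, 1);
    end
  end
end

disp(centroids);
disp(idx');
```

On this toy data the first three examples end up in cluster 1 and the last three in cluster 2, with centroids at roughly (0.33, 0.33) and (10.33, 10.33). A fixed iteration count keeps the sketch simple; stopping when the assignments no longer change would be a reasonable alternative.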