# Tic-Tac-Toe - Data Mining

My goal is to find the probability of winning a game of tic-tac-toe given that you make the first move. To form a hypothesis based on this goal, I first have to state some conditions and facts about the game. They are: 1) There are 362,880 (9!) ways of placing O’s and X’s on the board. 2) When X makes the first move, X wins 131,184 games, O wins 77,904 games, and 46,080 games are tied (Source: http://en.wikipedia.org/wiki/Tic-tac-toe).

After eliminating rotations and/or reflections of other outcomes, there are only 138 unique outcomes: X wins 91 times, O wins 44 times, and 3 are ties (Source: http://en.wikipedia.org/wiki/Tic-tac-toe). The concept to be learned is a win for X. There are 8 possible ways of creating three X’s in a row. Based on this, my hypothesis states:

## Hypothesis
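The game counts quoted from Wikipedia can be reproduced by brute force. What follows is only a minimal sketch, not part of the original analysis: the function names `winner` and `count_games` and the string encoding of the board (reusing the dataset's `x`, `o`, `b` symbols) are my own assumptions. It plays out every legal game with X moving first and stops a game as soon as one side completes a line.

```python
from functools import lru_cache

# The 8 ways of getting three in a row, as index triples into a 9-character board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'x' or 'o' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "b" and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def count_games(board, player):
    """Count (x_wins, o_wins, draws) over all games that continue from `board`."""
    won = winner(board)
    if won == "x":
        return (1, 0, 0)
    if won == "o":
        return (0, 1, 0)
    if "b" not in board:
        return (0, 0, 1)  # full board, nobody won
    totals = [0, 0, 0]
    next_player = "o" if player == "x" else "x"
    for i, square in enumerate(board):
        if square == "b":
            child = board[:i] + player + board[i + 1:]
            for k, value in enumerate(count_games(child, next_player)):
                totals[k] += value
    return tuple(totals)


x_wins, o_wins, draws = count_games("b" * 9, "x")
print(x_wins, o_wins, draws, x_wins + o_wins + draws)
```

The three counts add up to 255,168 complete games, which is smaller than 362,880 because many games end before the board is full.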

“If X makes the first move, then the probability that the player with X wins is 60% or above.”

## Null Hypothesis

“If X makes the first move, then the probability that the player with X wins is less than 60%.”

## Data Collection and Preparation

To prove or refute the hypothesis, data has to be collected. As we all know, this step requires a great amount of time and effort. Also, in order to build an effective model, a data mining algorithm must be presented with a few hundred or a few thousand relevant records. As mentioned above, there are thousands of winning combinations. I collected a dataset of 958 instances, which encodes the set of possible board configurations at the end of a tic-tac-toe game given that X makes the first move. The dataset of tic-tac-toe endgame boards was taken from the UCI Machine Learning Repository website (Source: http://archive.ics.uci.edu/ml/datasets/Tic-Tac-Toe+Endgame).

The data was in a comma-delimited text file without attribute names at the top. After downloading the data in CSV format, I converted it into an Excel spreadsheet using the options available in MS Excel. Since the attribute names were missing, I had to read the data description file provided on the same website to relate the values to their attributes. Before converting the file into an Excel spreadsheet, I added the attribute names above the values, in the same format as the values. Since abbreviations are used for the attributes, I am providing a detailed description of each attribute in the table below:

| Serial No | Attribute Abbreviation | Attribute Description | Value |
|---|---|---|---|
| 1 | T-L | Top-Left-Square | (x, o, b) |
| 2 | T-M | Top-Middle-Square | (x, o, b) |
| 3 | T-R | Top-Right-Square | (x, o, b) |
| 4 | M-L | Middle-Left-Square | (x, o, b) |
| 5 | M-M | Middle-Middle-Square | (x, o, b) |
| 6 | M-R | Middle-Right-Square | (x, o, b) |
| 7 | B-L | Bottom-Left-Square | (x, o, b) |
| 8 | B-M | Bottom-Middle-Square | (x, o, b) |
| 9 | B-R | Bottom-Right-Square | (x, o, b) |
| 10 | Class | Class (Output) | Positive, Negative |

(x = square taken by player X, o = square taken by player O, b = blank)
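The preparation described above was done by hand in Excel. The same steps, attaching the attribute names to a header-less comma-delimited file and checking that every value is one of the expected symbols, could also be sketched in code. This is an illustration under assumptions, not the procedure actually used: `load_endgames`, `find_invalid`, and the two hand-written `SAMPLE` rows are hypothetical, and class labels are compared case-insensitively because the table writes them capitalized.

```python
import csv
import io

# Attribute abbreviations from the table, in file column order.
COLUMNS = ["T-L", "T-M", "T-R", "M-L", "M-M", "M-R", "B-L", "B-M", "B-R", "Class"]
SQUARE_VALUES = {"x", "o", "b"}
CLASS_VALUES = {"positive", "negative"}


def load_endgames(handle):
    """Read header-less comma-delimited rows and label them with COLUMNS."""
    rows = []
    for line_no, record in enumerate(csv.reader(handle), start=1):
        if not record:
            continue  # skip blank lines
        if len(record) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(record)}"
            )
        rows.append(dict(zip(COLUMNS, (value.strip() for value in record))))
    return rows


def find_invalid(rows):
    """Return (row index, attribute, value) for every missing or unexpected value."""
    problems = []
    for index, row in enumerate(rows):
        for name in COLUMNS:
            allowed = CLASS_VALUES if name == "Class" else SQUARE_VALUES
            if row[name].lower() not in allowed:
                problems.append((index, name, row[name]))
    return problems


# Two hand-written rows for illustration only; they are not quoted from the dataset.
SAMPLE = "x,x,x,x,o,o,x,o,o,positive\nx,o,x,x,o,o,o,x,x,negative\n"
rows = load_endgames(io.StringIO(SAMPLE))
print(rows[0]["T-L"], rows[0]["Class"])
print(find_invalid(rows))
```

To run it on the real data, pass an open file handle for the downloaded file in place of the `io.StringIO` sample.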

I scanned the data for missing values but could not find any. Looking at the data, I can say that we have a solid set of data to mine. This dataset will help us to prove or refute the hypothesis.

## Data Mining
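Before handing the data to a mining tool, a quick tally of the class attribute shows how the records split between wins for X and everything else, and how that split compares with the 60% threshold in the hypothesis. The sketch below is a simple proportion check under my own assumptions, not the ESX model: the names `positive_share` and `supports_hypothesis` are hypothetical, and the labels passed in are made-up examples rather than results from the dataset.

```python
from collections import Counter


def positive_share(labels):
    """Fraction of records labelled positive (a win for X)."""
    counts = Counter(label.strip().lower() for label in labels)
    total = counts["positive"] + counts["negative"]
    if total == 0:
        raise ValueError("no labelled records")
    return counts["positive"] / total


def supports_hypothesis(labels, threshold=0.60):
    """True if the positive share reaches the threshold stated in the hypothesis."""
    return positive_share(labels) >= threshold


# Made-up labels for illustration only.
example = ["Positive", "Positive", "Positive", "Negative"]
print(positive_share(example), supports_hypothesis(example))
```

A share computed this way describes the proportion of endgame board configurations in the file, which is not necessarily the same as the proportion of games won.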

There are many data mining techniques and tools available in the market. Choosing a technique requires taking many factors into consideration. A formal statement as per our textbook reads: Given a set of data containing attributes and values to be mined together with information about the nature of the data and the problem to be solved, determine an appropriate data mining technique. To choose the technique, I looked at the attributes, all of which hold categorical data, and noted that we already have a predefined output. The appropriate technique for this kind of dataset is therefore “Supervised Learning”.

I used the ESX model provided by iDA for data mining. I chose ESX over a neural network because of the categorical nature of the data. I am just going to walk through the process of supervised learning using iDA. I opened the Excel spreadsheet containing the data, then I added two rows below the attribute names, one representing the nature of the data and the other representing whether it...
