Data Mining Apriori Algorithm

  • Topic: Machine learning, Nearest neighbor search, K-nearest neighbor algorithm
  • Pages: 14 (3501 words)
  • Published : December 28, 2011
Recommended Systems using Collaborative Filtering and Classification Algorithms in Data Mining

Dhwani Shah
2008A7PS097G

Mentor – Mrs. Shubhangi Gawali



BITS – Pilani, K.K. Birla Goa

INDEX
  1. Introduction to Recommended Systems
  2. Problem Statement
  3. Apriori Algorithm Pseudo Code
  4. Apriori Algorithm Example
  5. Classification
  6. Classification Techniques
  7. k-NN Algorithm
  8. Determine a Good Value of k
  9. References


1. Introduction to Recommended Systems
Recommended Systems (more commonly called recommender systems) are a type of information filtering system that attempts to recommend information items (movies, TV programs, shows or episodes, video on demand, music, books, news, images, web pages, scientific literature such as research papers, etc.) that are likely to be of interest to the user. Recommendations can be based on the demographics of the users, overall top-selling items, or the past buying habits of users as a predictor of future purchases.

Collaborative Filtering (CF)
Collaborative filtering is widely regarded as one of the most successful recommendation techniques to date. The basic idea of CF-based algorithms is to provide item recommendations or predictions based on the opinions of other like-minded users. These opinions can be obtained explicitly from the users or inferred through implicit measures. Collaborative filtering techniques collect data, build profiles from it, and determine the relationships among the profiles according to similarity models. The categories of data in the profiles can include user preferences, user behavior patterns, or item properties.

Everyday Examples of Collaborative Filtering:
  • Bestseller lists
  • Top 40 music lists
  • The “recent returns” shelf at the library
  • Many weblogs
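
The basic idea of CF described above (predicting from the opinions of like-minded users via a similarity model) can be illustrated with a minimal sketch. This is not the method used in the report: the ratings data, the choice of cosine similarity, and the function names `cosine_similarity` and `predict` are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical explicit ratings (1-5 stars): user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0  # no shared items, so no evidence of similarity
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += sim
    return num / den if den else None

# Alice has not rated "D"; the prediction leans toward Bob's rating
# because his profile matches hers more closely than Carol's does.
print(predict("alice", "D", ratings))
```

In this sketch the prediction for an unrated item is pulled toward the ratings of the most similar users, which is the "like-minded users" intuition in its simplest form. Other similarity models (for example Pearson correlation) could be swapped in without changing the structure.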

Challenges of Collaborative Filtering:
  • Lack of information degrades the recommendation results. Because relationships are mined from existing ratings, new items that have not yet been rated or labeled may be left out of the recommendation process.
  • Collaborative filtering does not handle extreme cases well. If user profiles are small or users have unique tastes, reliable similarity decisions cannot be established.
  • If new user information has to be included in the recommendation process in real time, data latency increases the waiting time for the query result. The complexity of the recommendation computation directly affects how long the user waits.
  • Synchronization of profile updates is another issue, particularly when hundreds of users query the system within a very short time period.
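
The first two challenges above can be made concrete with a small sketch. It assumes ratings stored as a user-to-items dictionary; the catalog, the data, the helper names `unrated_items` and `can_compare`, and the `min_common` threshold are hypothetical choices for illustration, not part of the report.

```python
# Hypothetical item catalog and ratings: user -> {item: rating}.
catalog = ["A", "B", "C", "D", "E"]
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2},
    "dave":  {"D": 5},  # small profile with an unusual taste
}

def unrated_items(catalog, ratings):
    """Items nobody has rated yet; pure CF has no basis to recommend them."""
    rated = {item for prefs in ratings.values() for item in prefs}
    return sorted(set(catalog) - rated)

def can_compare(u, v, min_common=2):
    """True if two profiles share enough rated items for a similarity
    score to be meaningful. The threshold of 2 is an arbitrary assumption."""
    return len(set(u) & set(v)) >= min_common

print(unrated_items(catalog, ratings))                   # item "E" has no ratings
print(can_compare(ratings["alice"], ratings["bob"]))     # two items in common
print(can_compare(ratings["alice"], ratings["dave"]))    # nothing in common
```

A real system would typically fall back to another strategy in these cases, such as the demographic or top-selling recommendations mentioned in the introduction.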


Explicit vs. Implicit Data Collection
In order to make any recommendations, the system has to collect data. The ultimate goal of collecting the data is to get an idea of user preferences, which can later be used to predict future user preferences. There are two ways to gather the data. The first is to ask for explicit ratings from a user, typically on a concrete rating scale (such as rating a movie from one to five stars). The second is to gather data implicitly while the user is in the domain of the system, that is, to log the actions of a user on the site.

Explicit data is easy to work with. Presumably, the ratings that a user provides can be interpreted directly as the user's preferences, which makes it easier to extrapolate from the data to predict future ratings. The drawback is that explicit data puts the responsibility of data collection on the user, who may not want to take the time to enter ratings.

Implicit data, on the other hand, is easy to collect in large quantities without any extra effort on the part of the user. Unfortunately, it is much more difficult to work with, since user behavior must first be converted into user preferences.

Of course, these two methods of gathering data are not mutually exclusive. A combination of the two has the potential for the best overall results: one could gain the advantages of explicit...
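
As a rough sketch of how logged behavior might be converted into preferences and combined with explicit ratings: the action names, the weights in `ACTION_WEIGHTS`, the cap at 5.0, and the rule that explicit ratings override implicit scores are all assumptions made for illustration, not something the report specifies.

```python
from collections import defaultdict

# Hypothetical weights expressing how strongly each logged action
# suggests that the user likes the item.
ACTION_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

def implicit_scores(log, max_score=5.0):
    """Turn (user, item, action) log entries into preference scores.

    Scores accumulate per (user, item) pair and are capped at max_score
    so they sit on the same 1-5 scale as explicit star ratings.
    Unknown actions contribute nothing.
    """
    scores = defaultdict(float)
    for user, item, action in log:
        scores[(user, item)] += ACTION_WEIGHTS.get(action, 0.0)
    return {key: min(value, max_score) for key, value in scores.items()}

def combine(explicit, implicit):
    """Merge both sources, letting an explicit rating override the
    implicit score for the same (user, item) pair."""
    merged = dict(implicit)
    merged.update(explicit)
    return merged

log = [
    ("u1", "A", "view"),
    ("u1", "A", "view"),
    ("u1", "B", "purchase"),
    ("u2", "A", "add_to_cart"),
    ("u2", "A", "purchase"),
]
explicit = {("u1", "A"): 4.0}  # u1 gave item A four stars

preferences = combine(explicit, implicit_scores(log))
print(preferences)
```

Letting explicit ratings take precedence reflects the point made above that explicit data can be read directly as preference, while implicit scores fill in the pairs the user never rated.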