Data Mining Project on IMDB Website

Only available on StudyMode
  • Published : April 22, 2011
The Internet Movie Database (IMDb) is an online database of information about movies, television shows, stars and related topics. For our project we chose the movie listings from 2008 to 2011. We extracted the Movie, Director, Star, Image Url and Studio fields from the IMDb website using a tool named Mozenda. After extraction, the data was analyzed so that, for a particular star, the movies, directors and studios the star has worked with can be shown. A Graphical User Interface (GUI) was developed for this: when the user selects a star, his/her respective movies, directors and studios are displayed. A graph of the extracted data is also shown, built with a tool named NodeXL. The graph has stars and movies as its nodes, and an edge between a star and a movie indicates that the star has worked in that movie.
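
The star and movie graph described above can be illustrated with a short sketch. It is written in Python rather than produced with NodeXL, and the sample records, the field names and the helpers `build_star_movie_edges` and `neighbours` are hypothetical placeholders, not data or code from the project.

```python
from collections import defaultdict

# Hypothetical placeholder records; the project used rows extracted from IMDb.
records = [
    {"star": "Star A", "movie": "Movie 1"},
    {"star": "Star A", "movie": "Movie 2"},
    {"star": "Star B", "movie": "Movie 1"},
    {"star": "Star A", "movie": "Movie 1"},  # duplicate row, yields a single edge
]


def build_star_movie_edges(rows):
    """Return the sorted, de-duplicated (star, movie) edges."""
    return sorted({(row["star"], row["movie"]) for row in rows})


def neighbours(edges):
    """Map each node to the set of nodes it shares an edge with (undirected)."""
    adjacency = defaultdict(set)
    for star, movie in edges:
        adjacency[star].add(movie)
        adjacency[movie].add(star)
    return adjacency


edges = build_star_movie_edges(records)
adjacency = neighbours(edges)
for star, movie in edges:
    print(f"{star} -- {movie}")
```

Because each edge is stored in both directions in `adjacency`, looking up a movie gives the stars who worked in it, and looking up a star gives his/her movies.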


Mozenda was used to extract the web data. The url of the page is entered in the Mozenda agent builder, and the page loads inside it. One can then navigate to the pages from which data is to be extracted. We chose to extract data from January 2008 to April 2011, so the url for January 2008’s webpage was entered. Once that page has loaded, "Start new Agent from this page" is clicked in the agent builder. Because the same set of fields (movie name, director, image, studio) has to be extracted for each movie, "Create list of items" is clicked. The names of the first two movies on the webpage are selected, a dialog box appears, and a field name such as Movie is entered. The same procedure is repeated for Director, Studio and Image Url. Because the same type of data has to be extracted from multiple pages, "Add list pager" is clicked and then the link to the next month is clicked. The software now knows which data to extract from each page. To run the extraction, "Test agent" is clicked and then "Start" is selected in the agent builder. The software extracts the Movie name, Director, Studio and Image url for each movie from every page from January 2008 to April 2011. Once the data is extracted, it is saved. It can be exported to the desktop as an Excel sheet or sent to an email address.
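
The list pager walks one page per month from January 2008 to April 2011. The sketch below, in Python, only enumerates that range to show how many pages are involved. The function `month_range` and the `URL_TEMPLATE` pattern are assumptions made for illustration. They are not the actual IMDb address or anything Mozenda exposes.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from the start month to the end month, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Hypothetical URL pattern, used only to show that each month has its own page.
URL_TEMPLATE = "https://example.invalid/movies/{year}-{month:02d}"

pages = list(month_range(2008, 1, 2011, 4))
urls = [URL_TEMPLATE.format(year=y, month=m) for y, m in pages]
print(len(pages), "monthly pages from", pages[0], "to", pages[-1])
```

Three full years plus four months of 2011 gives 40 pages, which is the number of times the pager has to follow the next-month link, plus the starting page.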


The extracted data is in the Excel sheet. It is also imported into Microsoft Office Access by clicking the External Data tab at the top of the Access window, then clicking Excel and giving the path of the Excel sheet. Visual Basic is used for the graphical user interface. Press Alt+F11 in the Excel sheet to open the Visual Basic editor. References is selected from the Tools menu and the Microsoft ActiveX Data Objects 2.8 Library is added. A connection to the Microsoft Office Access database is made, and an SQL query is used to extract the data from it. The GUI has a combo box (drop-down box) listing the stars, a submit button, and three list boxes for Movie, Director and Studio. When the user selects a star from the drop-down list and clicks the submit button, a query is fired to fetch the movie names, directors and studios for the selected star, and the results are displayed in their respective list boxes.
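
The lookup behind the submit button can be sketched as follows. This is a minimal Python illustration that uses an in-memory SQLite database as a stand-in for the Access database and ADO connection. The table name `movies`, its columns, the sample rows and the function `lookup_star` are all assumptions, since the project's actual schema and SQL are not shown.

```python
import sqlite3

# In-memory SQLite database standing in for the Access database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie TEXT, director TEXT, star TEXT, studio TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        # Hypothetical placeholder rows, not data extracted from IMDb.
        ("Movie 1", "Director X", "Star A", "Studio P"),
        ("Movie 2", "Director Y", "Star A", "Studio P"),
        ("Movie 1", "Director X", "Star B", "Studio P"),
    ],
)


def lookup_star(connection, star):
    """Return the distinct movies, directors and studios for one star."""
    result = {}
    # The column names come from this fixed tuple, never from user input;
    # the star name itself is passed as a bound parameter.
    for column in ("movie", "director", "studio"):
        rows = connection.execute(
            f"SELECT DISTINCT {column} FROM movies WHERE star = ? ORDER BY {column}",
            (star,),
        ).fetchall()
        result[column] = [row[0] for row in rows]
    return result


selection = lookup_star(conn, "Star A")
for column, values in selection.items():
    print(column, values)
```

Running one query per list box keeps each list free of duplicates, so a studio the star has worked with several times appears only once. Passing the selected star as a bound parameter rather than concatenating it into the SQL string avoids problems with names that contain quotes.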

Screen Shot 1: Graphical User Interface (GUI)
When the user selects the star and clicks on the submit button, the respective movies, directors and studios the star has worked with are displayed.

'The following code executes when the submit button is clicked

Private Sub cmd_submit_Click()

Dim Con As ADODB.Connection 'Creating reference to connection object
Dim rs1 As ADODB.Recordset 'Creating reference to recordsets object
Dim rs2 As ADODB.Recordset

Dim rs3 As ADODB.Recordset...