# Getting and Processing Data from Twitter

COVER SHEET

2010 Fall Semester

Instructor: CLASTER William

Class: Mathematics for Information Technology EA

Report: NodeXL Assignment

APM 1st year student 12410179 HOANG Nguyen Phong

Due Date: January 25, 2011 16:00

HOANG Nguyen Phong 12410179

Page 1

Mathematics for Information Technology EA

I. Getting and Processing Data from Twitter

First, using the “import tool” in the NodeXL plug-in for Microsoft Excel, I searched for people whose tweets contain the keyword “Julian Assange”, and ticked all three boxes, “Follows”, “Replies-to”, and “Mentions relation”, with a limit of 100 people, so that I could get a more strongly connected graph. However, in the first few searches I could not get a reasonable graph as I expected, for two reasons: the limit on the number of people was small, and I had ticked the box “I don’t have a twitter account”. The resulting graph was very weakly connected, with only 4 or 5 edges and more than 90 isolated vertices, so I was not able to use the “Graph Metrics” tool to analyze these graphs. I then registered a Twitter account so that I could import data for more than 100 people. This time I again used the keyword “Julian Assange”, but with the limit raised to 300 people. I got 300 people whose tweets contain “Julian Assange”, and the graph was more strongly connected. Then I calculated all the metrics with the “Graph Metrics” tool, and removed the vertices whose “in-degree” and “out-degree” are both zero, so that information can be examined and deduced more easily from the graph.
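The degree filtering described above can also be sketched outside NodeXL. The following is a minimal Python illustration, not NodeXL's own implementation; the user names, the edge list, and the function names `degree_table` and `drop_isolated` are hypothetical and only show how in-degree and out-degree are counted and how vertices with zero in both are removed.

```python
from collections import Counter

def degree_table(vertices, edges):
    """Return {vertex: (in_degree, out_degree)} for a directed edge list."""
    in_deg = Counter(target for _, target in edges)
    out_deg = Counter(source for source, _ in edges)
    return {v: (in_deg[v], out_deg[v]) for v in vertices}

def drop_isolated(vertices, edges):
    """Keep only vertices with at least one incoming or outgoing edge."""
    table = degree_table(vertices, edges)
    return [v for v in vertices if table[v] != (0, 0)]

# Hypothetical data: an edge (a, b) means "a follows, replies to, or mentions b".
vertices = ["alice", "bob", "carol", "dave", "erin"]
edges = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]

print(degree_table(vertices, edges)["bob"])  # (2, 1)
print(drop_isolated(vertices, edges))        # ['alice', 'bob', 'carol']
```

Here “dave” and “erin” have no edges at all, so they are dropped, in the same way that the isolated vertices were removed from the Twitter graph.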


II. Examining the Graph

For the graph, I used the “Harel-Koren Fast Multiscale” layout algorithm and, under “Layout Options”, ticked the box “Move the graph’s smaller components into boxes at the bottom of the graph” with “Maximum size of the components to move” set to 6. In the resulting graph, shown below, all the significant components appear clearly, and all the smaller components, which have at most 6 vertices, are moved to the bottom of the graph.
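The idea behind this layout option can be illustrated with a short Python sketch. It is not NodeXL's code: the function names `connected_components` and `split_by_size` and the toy graph are hypothetical, and treating the size threshold as inclusive is an assumption about how the setting behaves.

```python
from collections import defaultdict

def connected_components(vertices, edges):
    """Group vertices into components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search from `start`
            v = stack.pop()
            component.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(component))
    return components

def split_by_size(components, max_size_to_move=6):
    """Separate the large components from the small ones moved to the bottom."""
    main = [c for c in components if len(c) > max_size_to_move]
    moved = [c for c in components if len(c) <= max_size_to_move]
    return main, moved

# Hypothetical graph: a chain of 7 vertices plus a separate pair.
vertices = list(range(1, 10))
edges = [(i, i + 1) for i in range(1, 7)] + [(8, 9)]

main, moved = split_by_size(connected_components(vertices, edges))
print(main)   # [[1, 2, 3, 4, 5, 6, 7]]
print(moved)  # [[8, 9]]
```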


Then, according to the “Betweenness Centrality” values obtained from the “Graph Metrics” tool, I labeled the “important” vertices in the graph as below:

Based on the labeled graph above and the results of the examination with the “Graph Metrics” tool in NodeXL, we can deduce many things: the important people or well-known organizations, the things relevant to the keyword “Julian Assange”, and some interesting information on their pages.

1. The Twitter user with the nickname “bloggerheads” is the most important person in this set of people, because his “Betweenness Centrality” score is 430, which is the largest in the graph. He also has the largest “In-Degree”, which is 12.

Going to his page, I found that his field of expertise is online communities and their impact on search engine results, but he is also interested in the media and politics. Therefore, he has his own technical means of obtaining a great deal of information about communications, media and politics, which leads to the connection with the keyword “Julian Assange”, who revealed many diplomatic secrets of the United States of America in the media last year. This is the reason why many Twitter users follow him in order to get the latest information.

2. The second most important vertex is not a person; it is the account of an organization, with a “Betweenness Centrality” score of 284 and an “In-Degree” of 7.

Browsing their page, I found that they are disabled people in the UK who want to reveal things that are hidden by the government or an invisible force. Their page is for disabled people and by disabled people, so it tweets a lot about “Julian Assange” and his...
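To make the “Betweenness Centrality” score used in this ranking more concrete, here is a minimal Python sketch based on Brandes' algorithm for unweighted graphs. It is only an illustration under stated assumptions: edges are treated as undirected and scores are not normalized, which may differ from NodeXL's conventions, and the function name `betweenness_centrality` and the toy graph are hypothetical rather than taken from the Twitter data.

```python
from collections import defaultdict, deque

def betweenness_centrality(vertices, edges):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    score = {v: 0.0 for v in vertices}
    for s in vertices:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(int)
        sigma[s] = 1
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve.
    return {v: score[v] / 2 for v in vertices}

# Hypothetical graph: "hub" links a, b and c; d hangs off c.
vertices = ["hub", "a", "b", "c", "d"]
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("c", "d")]

scores = betweenness_centrality(vertices, edges)
print(scores["hub"], scores["c"])       # 5.0 3.0
print(max(scores, key=scores.get))      # hub
```

In this toy graph “hub” lies on the shortest paths between five pairs of vertices, so it gets the highest score, in the same sense that the highest-scoring account sits between many other users in the Twitter graph.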
