Analyzing social networks by computing PageRank, betweenness and closeness centrality, degree, and similar measures through programming requires a lot of coding, which is time consuming, and the graphical representation of such large datasets is a challenge. Generating network statistics and metrics and creating visualizations of network graphs is made easy by NodeXL, a tool provided by Microsoft as a small add-in for the familiar environment of Microsoft Excel. In this part of the project we explored the features of NodeXL to find the node with the highest PageRank, form communities with the k-degree algorithm, and reduce the graph size using the degree values of each node.
Wiki-Vote Dataset:
This dataset has 103689 edges and 7115 vertices. The graphical representation of the dataset using NodeXL is:
The graph shows that many nodes have few or no connections to the other nodes. Our first objective is to reduce the graph. The reduced graph keeps the edges and vertices with degree higher than 2, with in-degree >= 1 and out-degree >= 1 for each vertex. This increases the connectivity between the vertices in the graph.
Reducing the graph:
As this is a directed graph, some nodes might have no edges directed to them (zero in-degree) or directed from them (zero out-degree). Here the in-degree of a node is the number of votes it received, and the out-degree is the number of votes it cast for other nodes. We used NodeXL to find the nodes with zero in-degree or zero out-degree. In the NodeXL menu of the Excel sheet:
* Select the type of graph as directed from the drop-down menu.
* Select Graph Metrics in the graph menu.
* Select in-degree, out-degree, etc. in the menu. You can calculate the overall graph metrics, which can be used later for further analysis, or just the in-degree and out-degree.
* After calculating in-degree and out-degree, we used an Excel formula to set the visibility of the vertices with zero in-degree or zero out-degree to “Skip”. Vertices with visibility “Skip” are omitted from the graphical representation. Formula used for the visibility column: =IF(OR([@[In-Degree]]=0,[@[Out-Degree]]=0),"Skip","")
* Now the vertices are skipped. To skip the edges as well, we used VLOOKUP in Excel against the vertices worksheet and changed the visibility of an edge to “Skip” if either of its vertices has visibility “Skip”. Formulas: =VLOOKUP([@[Vertex 1]],Vertices[[Vertex]:[Visibility]],7,FALSE) (looks up the vertices in “Fromnodeid”) and =VLOOKUP([@[Vertex 2]],Vertices[[Vertex]:[Visibility]],7,FALSE) (looks up the vertices in “Tonodeid”).
* The nodes decreased from 7115 to 1214 and the edges from 103689 to 37242. The graph vertices and edges are drawn by clicking Show Graph in the NodeXL graph menu.
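The reduction steps above can also be sketched outside Excel. The following is a minimal Python sketch, not the procedure used in the project: it assumes the edge list is available as (voter, candidate) pairs, and the function name `reduce_graph` and the toy data are hypothetical. Like the Excel formulas, it computes the degrees once on the original graph and does not repeat the filtering on the reduced graph.

```python
from collections import Counter


def reduce_graph(edges):
    """Drop vertices with zero in-degree or zero out-degree, plus their edges.

    `edges` is a list of (source, target) pairs of a directed graph.
    Degrees are computed once, on the original graph (single pass).
    """
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    vertices = set(out_degree) | set(in_degree)

    # Mirrors the Visibility formula: "Skip" when either degree is zero.
    skipped = {v for v in vertices if in_degree[v] == 0 or out_degree[v] == 0}

    # Mirrors the VLOOKUP step: skip an edge if either endpoint is skipped.
    kept_edges = [(s, d) for s, d in edges if s not in skipped and d not in skipped]
    return vertices - skipped, kept_edges


# Hypothetical toy edge list, not taken from the Wiki-Vote data.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 1), (2, 5)]
kept_vertices, kept_edges = reduce_graph(toy_edges)
print(sorted(kept_vertices), kept_edges)
```

In the toy graph, vertex 4 never receives a vote and vertex 5 never casts one, so both are dropped together with the edges that touch them.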
* NodeXL allows us to visualize the graph in different layouts, but the other layouts are more cluttered than the Harel-Koren Fast Multiscale and Fruchterman-Reingold layouts. We used Harel-Koren Fast Multiscale in our project. Different layouts drawn by NodeXL are:
* Fruchterman-Reingold
* Horizontal Sine Wave
* Vertical Sine Wave
* Polar Absolute
* Grouping the vertices can be done in 3 different ways.
* Group by Vertex Attribute
* Group by Connected component
* Group by clusters
* Group by clusters can be done with 3 different algorithms.
* We formed communities in 2 different ways. One is by grouping by vertex attribute. This option in NodeXL allows us to use any column in the vertices worksheet (degree, in-degree, out-degree, betweenness and closeness centrality, PageRank, etc.). Numbers are specified in “start new group at these values” to divide the vertices into groups. Group metrics are calculated from the graph metrics in the analysis tab.
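Grouping by a numeric vertex attribute amounts to binning the attribute values at the chosen boundaries. The following Python sketch illustrates the idea only and is not NodeXL's implementation: the function name `group_by_attribute` and the toy in-degree values are hypothetical, and it assumes that a value equal to a boundary starts the new group.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_attribute(values, boundaries):
    """Bin vertices by a numeric attribute.

    `values` maps vertex -> attribute value (for example in-degree).
    `boundaries` lists the values at which a new group starts.
    Assumption: a value equal to a boundary belongs to the new group.
    """
    boundaries = sorted(boundaries)
    groups = defaultdict(list)
    for vertex, value in values.items():
        # Group index = number of boundaries that are <= value.
        groups[bisect_right(boundaries, value)].append(vertex)
    return dict(groups)


# Hypothetical in-degree values for five vertices.
toy_in_degree = {"a": 0, "b": 3, "c": 10, "d": 25, "e": 20}
groups = group_by_attribute(toy_in_degree, [5, 20])
print(groups)
```

With boundaries 5 and 20, the toy vertices fall into three groups: values below 5, values from 5 up to but not including 20, and values of 20 or more.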
* The column “Collapsed?” in the Groups...