Social Network Analysis

Click a node to see its neighbours. Select size strategy below.

This email data was collected by US federal investigators in the wake of the Enron collapse. It is a useful real data set that has been studied by many researchers.

When looking at email data from an organisation, a common task is to try to work out individual roles and organisational hierarchies just from the metadata: who emailed whom, and in which direction.

Sub-teams within the organisation stand out as clusters of connected nodes in the chart, but identifying important individuals requires a little more analysis.

Weighted Links

Clicking on Email Volumes scales each link's width by the number of emails sent between the two individuals. This highlights a prominent individual - Bill Williams - in a group on the edge of the chart. Bill is a prolific emailer, and probably the manager of a team. The team don't seem to be engaging with him though, as hardly any of them write back.
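As a rough illustration of the idea, the sketch below counts emails per pair of people, scales each count linearly into a link width, and compares how many emails one person sent with how many they received. It is plain Python, not KeyLines code. The records, the names, the width range and the helper names (link_volumes, link_width, sent_received) are all hypothetical and are not taken from the Enron data.

```python
from collections import Counter

# Hypothetical (sender, recipient) records, invented for illustration only.
emails = [
    ("bill", "ann"), ("bill", "ann"), ("bill", "bob"),
    ("bill", "cat"), ("bill", "tim"), ("tim", "bill"),
    ("ann", "bill"),
]


def link_volumes(records):
    """Count emails exchanged between each pair, ignoring direction."""
    volumes = Counter()
    for sender, recipient in records:
        volumes[frozenset((sender, recipient))] += 1
    return volumes


def link_width(count, max_count, min_width=1.0, max_width=10.0):
    """Scale an email count linearly into a drawing width (assumed range)."""
    if max_count <= 0:
        return min_width
    return min_width + (max_width - min_width) * count / max_count


def sent_received(records, person):
    """Return (emails sent, emails received) for one person."""
    sent = sum(1 for sender, _ in records if sender == person)
    received = sum(1 for _, recipient in records if recipient == person)
    return sent, received


volumes = link_volumes(emails)
busiest = max(volumes.values())
widths = {pair: link_width(n, busiest) for pair, n in volumes.items()}
bill_sent, bill_received = sent_received(emails, "bill")
```

A large gap between the sent and received counts is the kind of signal described above: someone who writes often but rarely gets a reply.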

You can follow a link from Bill to Timothy, who is more central. This is probably Bill's boss. Following the thicker links from Timothy, we see that his boss is probably either John, Louise or Kevin.

Centrality Analysis

KeyLines has some useful network algorithms that can be used to score the email accounts in different ways. We can use these scores to modify the size of nodes, highlighting influential individuals.
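One simple way to turn scores into node sizes is min-max scaling, sketched below in plain Python. This is only an assumption about how such a mapping could work, not a description of what KeyLines does. The function name scores_to_sizes and the size range are hypothetical.

```python
def scores_to_sizes(scores, min_size=1.0, max_size=4.0):
    """Map raw scores linearly onto [min_size, max_size]."""
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All nodes scored the same, so draw them all at the smallest size.
        return {node: min_size for node in scores}
    span = highest - lowest
    return {
        node: min_size + (max_size - min_size) * (score - lowest) / span
        for node, score in scores.items()
    }


# Hypothetical scores for three nodes.
sizes = scores_to_sizes({"a": 0.0, "b": 5.0, "c": 10.0})
```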

See the Graph Centrality documentation for more detail on different centrality measures.

Degrees counts the number of directly connected nodes each node has.
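A minimal sketch of degree on an undirected graph, in plain Python rather than KeyLines code. The edge list and the function name degree are hypothetical.

```python
# Hypothetical undirected links between four accounts.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]


def degree(edge_list):
    """Count the distinct neighbours of every node."""
    neighbours = {}
    for u, v in edge_list:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    return {node: len(linked) for node, linked in neighbours.items()}


degrees = degree(edges)
```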

Closeness measures how many steps each node is from every other node in the graph. It helps identify 'broadcasters', or people who have good influence over the network.
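One common formulation of closeness, sketched below, divides the number of reachable nodes by the sum of shortest-path distances to them, so nodes that are few steps from everyone score highest. This is an assumption about the normalisation; the KeyLines implementation may differ. The graph and function names are hypothetical.

```python
from collections import deque

# Hypothetical undirected chain: a - b - c - d - e.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def bfs_distances(adj, source):
    """Return the number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness(adj):
    """Reachable node count divided by total distance, per node."""
    scores = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


closeness_scores = closeness(adjacency)
```

In this chain the middle node c scores highest, because no other node is more than two steps away from it.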

Betweenness counts the number of shortest paths that pass through each node, identifying important 'bridges' in the network. In sociology, this has been shown to correlate with seniority in organisations.
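The sketch below computes unnormalised betweenness on an unweighted, undirected graph using Brandes' algorithm, a standard technique chosen here for illustration; the source does not say which algorithm KeyLines uses. The graph, two triangles joined by a single link, is hypothetical.

```python
from collections import deque

# Hypothetical graph: triangles a-b-c and d-e-f, joined by the link c-d.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    scores = {node: 0.0 for node in adj}
    for source in adj:
        order = []                           # nodes in order of discovery
        preds = {node: [] for node in adj}   # predecessors on shortest paths
        sigma = {node: 0 for node in adj}    # number of shortest paths
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {node: score / 2 for node, score in scores.items()}


betweenness_scores = betweenness(adjacency)
```

Here c and d score highest, since every shortest path between the two triangles has to cross the link that joins them.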

Eigenvector centrality is a measure of influence that takes into account the number of links each person has and the number of links their connections have, and so on throughout the network. It is an effective measure of influence in social networks and malware propagation.
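Eigenvector centrality can be approximated by power iteration: each node repeatedly takes on the sum of its neighbours' scores, and the scores are rescaled after every pass. The sketch below adds each node's own score on every pass (iterating with A + I rather than A), a common trick to keep the iteration from oscillating on some graphs. It is an illustration under those assumptions, not the KeyLines implementation, and the graph and names are hypothetical.

```python
import math

# Hypothetical graph: triangle a-b-c, plus d attached only to a.
adjacency = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration on (A + I), with unit-length rescaling each pass."""
    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(scores[nb] for nb in adj[node])
            for node in adj
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        change = sum(abs(updated[node] - scores[node]) for node in adj)
        scores = updated
        if change < tol:
            break
    return scores


eigen_scores = eigenvector_centrality(adjacency)
```

Although d is linked to the best-connected node, it ends up with the lowest score because it has only one connection. Meanwhile b and c reinforce each other through the triangle.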

PageRank identifies important nodes by counting incoming links and weighting each one according to the relative score of the node it comes from. It helps identify nodes which are indirectly influential in the network.
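A minimal sketch of PageRank by power iteration on a directed graph follows. The damping factor of 0.85 is the conventional default, and nodes with no outgoing links have their score spread evenly over all nodes. Both are assumptions, as the source does not describe the KeyLines settings. The graph and names are hypothetical; a link points from sender to recipient.

```python
# Hypothetical directed links: sender -> list of recipients.
out_links = {
    "ann": ["boss"],
    "bob": ["boss"],
    "cat": ["ann"],
    "boss": [],
}


def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power iteration; score from dangling nodes is shared evenly."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links[node])
        updated = {
            node: (1 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for node in nodes:
            for target in links[node]:
                updated[target] += damping * rank[node] / len(links[node])
        change = sum(abs(updated[node] - rank[node]) for node in nodes)
        rank = updated
        if change < tol:
            break
    return rank


ranks = pagerank(out_links)
```

In this toy graph boss ranks highest, and ann outranks bob even though both send one email to boss, because ann also receives one.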