Identification of Key Nodes in Microblog Networks

ETRI Journal.
2016.
Feb,
38(1):
52-61

- Received : August 14, 2015
- Accepted : October 14, 2015
- Published : February 01, 2016

Download

PDF

e-PUB

PubReader

PPT

Export by style

Article

Metrics

Cited by

TagCloud

A microblog is a service typically offered by online social networks, such as Twitter and Facebook. From the perspective of information dissemination, we define the concept behind a spreading matrix. A new WeiboRank algorithm for identification of key nodes in microblog networks is proposed, taking into account parameters such as a user’s direct appeal, a user’s influence region, and a user’s global influence power. To investigate how measures for ranking influential users in a network correlate, we compare the relative influence ranks of the top 20 microblog users of a university network. The proposed algorithm is compared with other algorithms — PageRank, Betweeness Centrality, Closeness Centrality, Out-degree — using a new tweets propagation model — the Ignorants-Spreaders-Rejecters model. Comparison results show that key nodes obtained from the WeiboRank algorithm have a wider transmission range and better influence.
influence
refers to a user’s function on information spread.
Key nodes perform major roles in terms of a network’s structure and functions. Therefore, the mining and ranking of key nodes are important research topics — ones that can provide useful guidance for e-commerce and that can affect public sentiment.
Prior to the process of passing information within a microblog network (for example, when there is need to broadcast a message, such as in product marketing or event planning), the choice of a set of appropriate nodes helps both a given piece of information spread and prolongs the duration of such a spread. On the other hand, when there is a need to stop the spread of information, selective immunization of key nodes as well as monitoring and control can prove effective in this regard.
How to identify the key nodes of OSNs has become an interesting problem for researchers. To identify influential nodes on Twitter, Kwak and others
[1]
ranked users according to the number of followers and found that the result of this was similar to that obtained by using the PageRank algorithm. They also studied the topological characteristics of Twitter and its power as a new medium of information sharing.
Cha and others
[2]
presented an in-depth comparison of three measures of influence —
indegree
, retweets, and mentions — using a large amount of data collected from Twitter, and observed that popular users who have a high degree of indegree are not necessarily influential in terms of spawning retweets or mentions. Computational models are presented for quantifying social influence, and the corresponding algorithms of social influence analysis are applied to different social network data, such as Twitter, Weibo, and Slashdot
[3]
.
Zhang
[4]
provided a formal definition for the notion of
social influence locality
and developed two instantiation functions based on pairwise influence and structural diversity. Users’ behaviors are mainly influenced by close friends in their ego networks.
Walisa and Wichian
[5]
defined a number of factors for leadership in social networks and showed that those individuals who are central to social networks serve as opinion leaders. A fuzzy data mining algorithm to find association rules was proposed for analyzing and dividing message posts into quantitative values from which some interesting sequential patterns were obtained.
A new approach based on social network analysis and text mining has been presented
[6]
,
[7]
; it is capable of detecting opinion leaders in online communities. Cho and others
[8]
used a social network approach and threshold model to find effective opinion leaders in the diffusion of technological innovation. Li and Du
[9]
proposed an ontology-based opinion-leader identification framework for word-of-mouth marketing in online social blogs using the information retrieved from blog content, authors, readers, and their relationships. Unfortunately, all of these identification methods do not consider information diffusion rules in OSNs.
Kitsak and others
[10]
researched a method for ranking influential nodes within a network from a new perspective and found that the most efficient
spreaders
are those located in the inner core of a network, as identified by
k
-shell decomposition analysis. However, it should be noticed that the
k
-shell method assigns many nodes with the same
k
-core value even though they perform entirely differently in the spreading process, and the method itself did not serve well the tree diagram, regular network, and BA scale-free network.
Liu and others
[11]
presented an improved parameterless method to generate a ranking list to evaluate the spreading influence of any given node; the list could identify the spreading influence of nodes more accurately than those generated from degree, closeness centrality,
k
-shell, or mixed degree decomposition methods. However, this improved method only considered the impact of excess degree when decomposing a network and supposed nodes of the same level to have the same number of neighbors in the outer layer — a supposition that ultimately affects the accuracy of any ranking.
Kang and others
[12]
proposed a new notion of diffusion centrality in which semantic aspects of a graph as well as a diffusion model of how a diffusive property is spreading are used to characterize the centrality of vertices. The experiments, using real YouTube data, showed the diffusion centrality produced better quality results. However, the implementation method had higher computational complexity. Gao and others
[13]
proposed a local structural centrality measure to rank the spreading ability of nodes in complex networks, which considers both the number and the topological connections of the neighbors of a node. But, it was found to be only suitable for undirected networks and had some limitations for directed networks, such as microblog networks.
In this study, we select microblog networks from a number of OSNs as our research object and propose a new WeiboRank algorithm to rank the importance of each given node within these networks. It combines local metrics with global metrics based on information dissemination characteristics in microblog networks. The algorithm can effectively identify key nodes in microblog networks. To evaluate the effectiveness of the proposed algorithm, a new Ignorants-Spreaders-Rejecters (ISR) model is proposed, which is more consistent with the actual model of information propagation dynamics.
This paper is organized as follows. In Sections II and III, we briefly review traditional identification algorithms of key nodes, including the centrality algorithm based on social network analysis and the PageRank algorithm based on WEB search. In Section IV, we introduce the WeiboRank algorithm and the transmission mechanism based on the newly proposed ISR model. The model is used to evaluate and compare the performance of our method with other ranking algorithms, in Section V. Experimental results and discussions are also presented. We conclude this paper in Section VI.
k
, the DC of
k
is calculated as follows:
N
is the total number of network vertices and
d
(
k
) is the degree value of vertex
k
.
A high degree of centrality would indicate a node has a large number of connections with other nodes. Because users within a microblog network participate in two-directional relationships, in terms of representing this in a directed network, the degree of centrality can be expressed in two different ways
[16]
— in-degree (IDC) and out-degree (ODC). In terms of our method, in-degree is considered to represent the number of followers a user has, and out-degree the number of friends.
N
nodes, the CC of node
k
is defined as
d_{ki}
refers to the length of the shortest path from node
k
to node
i
in the same network. Nodes with a smaller total distance
i
, needs a different node,
k
, to reach a further different node,
j
, via the shortest path. Specifically, let
g_{ij}
be the total number of shortest paths between vertices
i
and
j
, and
g_{ij}
(
k
) the total number of these shortest paths that pass through node
k
. Then the BC of node
k
is given by
N
is the total number of vertices within the network. The normalized index of BC is defined as
A
, the user’s PageRank value is calculated based on the following two assumptions
[17]
: (a) the number assumption (that is, the more followers a user receives, the more important that user is) and (b) the quality assumption (that is, important users who follow a user, will pass on more weights to that user). Under the two assumptions above, the steps for the calculation using the PageRank algorithm are as follows:
The details of each round of PageRank calculations are as follows:
Mathematically, the PageRank algorithm can be expressed as follows:
p
_{1}
,
p
_{2}
, …,
p_{N}
are the network users,
N
is the total number of users,
M
(
p_{i}
) is the set of users that link to
p_{i}
,
L
(
p_{j}
) is the number of outbound links of user
p_{j}
, and
d
is the damping factor, which is usually set to 0.85.
Note that in the PageRank algorithm, a PageRank value is evenly distributed to outbound links; thus, the algorithm ignores the importance of the user itself.
Flow chart of WeiboRank algorithm.
following
and
followed
relationships among users. Because of such a network, information can then spread across the network.
In
Fig. 2
, the directions of the arrows indicate the relationships between users. That is to say that user C follows user B, and user B follows user A; equally, user C is a follower of user B, and user B is a follower of user A. After user A initiates and transmits a message,
M
, the message (called tweet) will appear on the microblog homepage of user B who is a follower of user A. After user B reads the tweet and retweets it, the tweet will appear again on the microblog homepage of user C. As such, the corresponding tweet-spread link is given by
M
(A) →
M
(B) →
M
(C). It can be seen that the path of microblog information is in the opposite direction to that which depicts the sense of
following
.
Simple directed relationship between users.
To characterize the relationships between the nodes in microblog networks, let
B
be an
N
×
N
matrix whose (
i
,
j
)th element,
b_{ij}
, is defined as follows:
Simple example of relationship network.
In summary, an information-spread matrix for a network can be obtained by taking the transpose of a corresponding relationship matrix for the same network. The actual spread of information is a subset of the information-spread matrix based on some transmission mechanism.
N
nodes in a microblog network and each node represents one registered microblog user. Depending on how users deal with microblog information, users can be divided into
ignorants
,
spreaders
, or
rejecters
. Ignorants are those who do not know of a piece of information; that is, they have not read a specific tweet. Spreaders are those who initiated the tweet, or who have read the tweet and then retweeted the read message. Rejecters are those who are not interested in the tweet; that is, who have read and not retweet the message. Spreaders and rejecters combined can be called
readers
. The initial condition in this model is to set all users to ignorants; that is, to grant them “
I
” status. At some time, a node posts a tweet and changes its status to that of a spreader; that is, they now have “
S
” status. According to the model shown in
Fig. 4
, this tweet will spread within the microblog network until the network has no spreaders. This status is called terminal status.
ISR spreading model.
Spread-Mechanism Analysis.
When a spreader posts a tweet, all of its followers can receive it in real time. Upon reading the tweet, those followers who find it interesting and want to share it with others retweet the same message. This process can be interpreted as that where the ignorant node meets the spreader node and becomes a spreader at a probability of
α
.
Those followers who read the tweet through following the tweet initiator and find it uninteresting will choose not to retweet it. This process can be interpreted as that where the ignorant node meets the spreader node and becomes a rejecter at a probability of
δ
. On the other hand, there are those followers that have not yet logged into their accounts and thus will have missed the tweet. As such, we have
α
+
δ
≤ 1.
After a user posts or retweets a tweet that has been browsed, the user is considered to have lost their associated retweet value. Because of this loss upon retweeting a post, the node is then deemed a rejector. This process is considered as follows: when a spreader posts or retweets a tweet which has been read by its followers, it becomes a rejector at a probability of
β
.
Complete topology of tweet spread network.
Below is a simple example illustrating how a WeiboRank value is calculated.
Figure 6
shows the transmission network for one tweet. The observed node is the node number 335. The value for direct capability of influence is the number of its followers, which is three, as shown in the figure. Its region of influence
R
is 17. The average information load for node 335 is (1 × 3 + 2 × 13 + 3 × 1)/17=1.88. Therefore, node 335 has a WR value of 5.65.
Transmission network for one tweet.
Corresponding relationship between CC and DC.
Corresponding relationship between CC and BC.

Firstly, the institutional microblog accounts within higher education communities have the dominant influence. Specifically, these institutions include the library; student communities; enrollment platforms; university alumni; Open Lectures Room program; and University BBS, MBA Education, Arts departments and colleges — all of which whose tweets have overwhelming fan bases and play key roles in the spreading of information. It is rather ironic that the universities’ own official microblogs (for example, FDU microblog account), which have been officially verified, do not have the same level of influence among the students and teaching staff as the previously mentioned institutions, although the official microblogs are perhaps extremely popular among fans outside of the respective universities. As such, the roles of the official microblogs are diminished significantly.
Secondly, smaller-scale, more specialized colleges should pay more attention to the influence of their own microblog accounts, which often represent some of their more well-known “star” teaching staff.
Amongst the Sina Weibo users from the Shanghai University of Foreign Language, a user named ‘Jiang Zhu Zi’ is a verified user, whose real identity is the deputy dean of the Law School of the university; the user named ‘Happy Wu Ying’ is also a Sina Weibo–verified user, whose real identity is an associate professor of the College of Journalism and Communication of the same university. It can be seen that the smaller, more specialized colleges have the more frequent interactions amongst their college staff and students. Stories and wisdom from “star” teaching staff are more likely to spread within communities that share a common background (for example, student communities and classes). In this way, the accumulation of reputation by “star” teaching staff might produce a so-called Matthew effect. In contrast, national and comprehensive universities are of a larger scale; thus, departmental differences are more apparent. This limits, to certain degree, the scope of influence by the “star” staff at such institutions. Consequently, their power of influence is significantly limited.
τ
,
[18]
to show the degree of difference between them. The closer the value of
τ
is to +1 or −1, the stronger is the likely correlation.
Table 2
shows the obtained
τ
values for the correlation between WR and BC, CC, ODC, or PageRank measure using the user data set as described previously. The results show that the lists of top 5 user ranking by the WeiboRank and PageRank algorithm have a strong correlation and a weak correlation by the WeiboRank and CC measure, respectively. The performance in
Table 2
shows that the
τ
-value for the best-case scenario is 0.6.

Based on an ISR microblog spread model with identical parameters (
α
= 0.3,
β
= 0.8,
δ
= 0.5),
Fig. 9
shows the corresponding curves of the density of propagation range, selecting the top
k
users who initiated tweets. Because the curves for Out-Degree and PageRank are almost on top of each other for the top
k
users under consideration, we omit one of them. From
Fig. 9
, overall, it can be seen that the WR, BC, and PageRank algorithms are clearly superior to the CC and ODC algorithms in terms of the density of scope of spread. The value calculated from the former group of algorithms is much bigger than that from the latter group. For the top ten users, the WeiboRank algorithm is the best, for it produces the largest density value. From the top ten to top twenty users, the Weibo Rank, Betweeness Centrality, and PageRank algorithms produce similar results. In actual applications,
k
-values are typically not big,
k
≤ 10 in general. Therefore, the WeiboRank algorithm has some very good benefits.
The WeiboRank method is very suitable when it is necessary to reach the largest amount of nodes in a short time.
Figure 9
shows that the inclusion of the top 10 nodes is generally enough in terms of propagation scope density. This is because a further increase in the considered number of top nodes does not increase the total number of nodes reached in a significant manner. The PageRank method only considers the first level of followers expressed by the in-degree. The WeiboRank algorithm extends the analysis to include more levels, trying to reach other significant nodes; that is, the PageRank algorithm focuses only on the first set of connections. The WeiboRank algorithm takes into account all the nodes of the network, providing a larger vision of all connections.
Comparing performance of five methods.
k
nodes based on the ISR model. For up to the top 10 users, the WeiboRank algorithm produced the largest propagation range density value.
In microblog networks, the mining of key nodes, which are typically institutional microblog accounts, field experts, and well-known public figures with the capability of detonating “an information explosion,” can be applied to the design of strategies for advertisement campaigns and marketing. This is in line with the “individual rule” — one of the three key factors stipulated by Gladwell in the setting off of a fashion explosion.
This work was supported by the National Nature Science Foundation of China (No. 61373084).
Corresponding Author lujingshiep@163.com
Jing Lu received both her BS degree in electrical engineering and her MS degree in information and communication engineering from China University of Mining and Technology, Xuzhou, China, in 2001 and 2004, respectively. She is currently a PhD candidate at the School of Communication and Information Engineering, Shanghai University, China. She has been an associate professor at the Shanghai University of Electric Power. Her current research interests include complex networks, data mining, and machine learning.
wanwg@staff.shu.edu.cn
Wanggen Wan received his PhD degree in information engineering from Xidian University, China, in 1992. From 1991 to 1992, he was a visiting scholar with the Computer Engineering Department, former Minsk Radioengineering Institute, Belarus, the USSR. He was a postdoctoral research fellow with the Information and Control Engineering Department of Xi’an Jiao-Tong University, China, from 1993 to 1995. From 1995 to 1997, he was an associate professor with the Electronic and Information Engineering Department of Shanghai University china, and was promoted to professor in 1998. He was a visiting scholar with the Electrical and Electronic Engineering Department of Hong Kong University of Science and Technology, Clear Water Bay, China, from 1998 to 1999. He was a visiting professor and section head at the Multimedia Innovation Center of Hong Kong Polytechnic University, Hung Hom, China, from 2000 to 2004. Since 2004, he has been with the School of Communication and Information Engineering, Shanghai University, China, where he is currently a professor, dean of the International Office, director of the Institute of Smart City, and program leader of the Circuit and Systems Department. He is a fellow of the IET and an IEEE senior member. He has been co-chairman for many international conferences since 2008. His research interests include data mining, signal processing, and computer graphics. He is a coauthor of approximately 200 academic papers.

I. Introduction

Key nodes in complex networks are those nodes that have more influence on network structure and features compared to other nodes. For example, in a malicious attack on a scale-free network, attacks on even a small number of key nodes will likely result in the collapse of the whole network.
The mining of key nodes is an important research topic and one that is deeply rooted in the field of information science; together with link prediction and information recommendation, they form the three key research areas in the field of network information mining. In recent years, as studies on network science move from the macro-level to the micro-level, the ordering and mining of key nodes become hot topics. There are many methods designed to assess the importance of network nodes; however, they are all derived from graph theory. In particular, data mining is based on graph theory in the form of finding the most critical nodes and edges within a network.
In research into online social networks (OSNs), nodes that are found to have the most influence in such networks are called opinion leaders. Consequently, influence analysis of these so-called opinion leaders is an important area in OSN research. With respect to the spread of viruses, rumors, or opinions, it is very important to find these relatively more influential nodes.
A microblog is a relatively new type of service typically offered by OSNs. Utilizing WEB 2.0 technologies, a microblog’s nodes are formed by individual beings and information between nodes is exchanged in a point-to-point transmission fashion on the web. It is a product designed primarily for the exchanging and spreading of information. A microblog is an Internet application with a strong property of “We media.” It changes people’s habits of information acquisition, and becomes an information sharing platform that combines many Internet properties such as instantaneity, convenience, and openness. Within a microblog platform, every user is a broadcaster, listener, and messenger. Using computers, mobile devices, and so on, individual users are able to broadcast whatever they see and feel, at whatever time and place, in the form of a brief message of no more than 140 words
[1]
, a sentence, a photo, a short video, or even a sign of emotion.
In microblog networks, key nodes have the power to influence. According to the Merriam-Webster Dictionary, influence is defined as ‘‘the power or capacity of causing an effect in indirect or intangible ways... .’’ In sociology, its definition is still vague
[2]
. In this paper,
II. Centrality Metrics

Centrality theory
[14]
, in statistics, is used to estimate the statistical parameters of the centralization degree of samples. Within the scope of computer networks, it is given a new purpose; that is, it can be applied to complex networks, such as microblog networks. In our method, a social network is represented by a graph. Entities are represented as network nodes and interactions between entities are represented as edges. In an abstract network structure, in terms of nodes and edges, the research on centrality is of both theoretical and practical value.
- 1. Degree Centrality (DC)

The DC
[15]
of a vertex is a simple local measure based on the notion of neighborhood. This measure is useful in the case of static graphs, for situations when we are interested in finding nodes that have the most direct connections to other nodes. Given a vertex,
(1) DC(k)= d(k) N−1 ,

where
- 2. Closeness Centrality (CC)

CC
[15]
is a global index that measures the steps it takes for a vertex to contact all other vertices in a network. For instance, assuming a network with
(2) CC(k)= N−1 ∑ i=1 N d ki k≠i,

where
∑ i=1 N d ki

are considered relatively more important. The shorter the average distance of a node to other nodes, the more direct and efficient is the node’s influence on other nodes. This is because there exist fewer intermediate nodes with which to relay its messages. We normally think of nodes with high CC scores as being well-positioned to obtain timely information as it arrives.
- 3. Betweenness Centrality (BC)

BC
[15]
is a graph analysis technique based on shortest-path enumeration for identifying key individuals in large-scale interaction networks. BC is defined as the share of times that a node,
(3) BC * (k)= ∑ i=1 N ∑ j=1 N g ij (k) g ij i≠j≠k,

where
(4) BC(k)= BC * (k) (N−1)(N−2)/2 .

The BC of a vertex measures the control that the vertex has over its communication links and its influence on the other nodes. As such, the BC index can be used to identify critical vertices. A higher BC value indicates that the vertex in question can reach other vertices on relatively shorter paths, or that the vertex lies on a significant fraction of shortest paths connecting pairs of other vertices.
III. PageRank Algorithm

The PageRank algorithm was originally proposed by Larry Page, one of the founders of Google and was used exclusively by Google’s search engine. It measures the importance of website pages relative to other website pages. The number of links to a given website page plays an important role.
The main idea behind the PageRank algorithm is that “more important websites are likely to receive more links from other websites”
[17]
. The PageRank algorithm has two underlying assumptions — a number assumption and a quality assumption. In this paper, the PageRank algorithm is used for microblog networks. Microblog users follow others or are followed. For a given user in the network,
- 1) Initialization step: form a relationship network through “follower–friend” by users — every user has the same PageRank value.
- 2) After certain rounds of recursive calculations, every user will get a final PageRank value. All users will receive theirrenewPageRank values after each round of calculations.

- 1) Each user distributes its PageRank value evenly to the outbound links that it follows. In this way, each outbound link will receive the same weight.
- 2) Each user updates its PageRank value by adding all the weights from inbound links of followers that it receives. After every user in the network updates its PageRank value, that round of calculations is completed.

(5) PR( p i )= 1−d N +d ∑ p j ∈M( p j ) PR( p j ) L( p j ) ,

where
IV. WeiboRank Algorithm

A flow chart for the WeiboRank algorithm is shown in
Fig. 1
. After constructing a spreading matrix, we will calculate the WR value of nodes in the microblog networks. Then we will evaluate the performance of the WeiboRank algorithm by ranking nodes based on ISR model.
PPT Slide

Lager Image

- 1. Spreading Matrix

In a computer-based analysis of microblog networks, the first problem encountered is how to represent a network. A microblog user-relationship network represents the
PPT Slide

Lager Image

(6) b ij ={ 1 message flow from vertex i to j, 0 otherwise.

So, the spreading matrix corresponding to
Fig. 3
is given as
B=[ 0 0 0 1 1 0 1 0 1 0 0 0 1 0 1 0 ].

PPT Slide

Lager Image

- 2. ISR Model

Assume there are
PPT Slide

Lager Image

- 3. WeiboRank Value of Node

Before our description of the WeiboRank (WR) algorithm, we define the following parameters:
- 1) Direct capability of influence,F1(v):Figure 5shows the complete topology of a microblog network. In fact, it represents the following relationships amongst users. Uservis at the center of the network. According to the distance of information travel, we collect all nodes that are a distance of one or less from the center and form a concentric circle; namely, theN1layer. It is clear that the number of nodes represents the number of followers that uservhas, which is denoted byF1(v). The value ofF1(v) indicates the direct capability of influence of uservfrom the local metric point of view. For instance, if uservhas three followers, then theN1layer will contain three nodes, and its value of direct capability of influenceF1(v) is equal to three.
- 2) Region of influence,R: In general, the more nodes connected directly or indirectly tov, the wider the region of influence along the information path that includes nodev. Using the same method as in 1), we collect all nodes that are within a distance of two, and form anN2layer. It can be seen that all the nodes in theN2layer form the collection of all the followers of the users inF1(v), denoted byF2(v). This process continues until the biggest concentric circle,NMlayer, is formed, in which all nodes areleafnodes. The further the information-spread path travels, the more profound the influence of nodev. Supposing that a tweet has spread across the whole of a network and all follower nodes have had a chance to read the sent or retweeted tweet from userv, we may define the region of influence of uservbyR=∑i=1MFi(v).The valueRrepresents the number of nodes that the tweet information sent from uservhas reached. InFig. 5, there are eight nodes in layerN2; hence, the value ofF2(v) is equal to eight. The region of influence of uservin this instance is thusR= 3 + 8 + 15 = 26.
- 3) A user’s WeiboRank value is defined as the product of the value of direct capability of influence and the average information load. Assuming a tweet is sent from userv, its WR value is as follows:

(7) WR(v)= F 1 (v)⋅ ∑ j=1 R d vj /R,

- where nodejis the node that the tweet can reach according to network-spreading matrixB, anddvjis the distance between nodesvandj. Note that∑j=1Rdvj/Rcan be considered as the average information load, which measures a node’s power of influence from the global perspective of the whole network. The higher the WeiboRank value is, the greater is the user’s influence; hence, its position is deemed to be more critical.

PPT Slide

Lager Image

PPT Slide

Lager Image

V. Experiment and Results Analysis

- 1. Measurement of Centrality

To investigate the effectiveness of the proposed WeiboRank algorithm, we need to access a complete, actual data set of some particular community of microblog users; for example, at a university. A Weibo search tool is limited in terms of the number of users obtained and the presented data might not be necessarily true. As such, a Weibo API is used in conjunction with the search tool.
The Sina Weibo application programming interface (API) is an open interface, just like Twitter API. We take the following steps to obtain a university user community at both the institutional and the individual level. Firstly, we select ten institutional users relevant to the university by means of the Sina Weibo search tool. Secondly, we find all of the followers of the ten institutional users by Sina API functions. Finally, amongst these followers, we simultaneously select those users who belong to more than three fan groups. Experiments (to be presented below) show that the choice of three fan groups is an appropriate number considering the total number of available users.
We use Python language to acquire 3,131 valid Sina Weibo users from Fudan University (FDU) and 2,098 users from Shanghai International Studies University (SISU) through Sina Weibo API functions. These users include both institutional and individual users. The complete user data sample was collected in December 2012. From the graph of FDU (SISU) microblog network, the node number is 3,131 (2,098) and the edge number is 47,376 (46,729). The centrality metrics of all 3,131 nodes of the FDU microblog network are calculated.
Figure 7
reveals the correspondence between CC and DC. The CC value of nodes with large DC value is relatively large, while the distribution of CC of nodes with a small DC value is wide. This indicates that those nodes with a small DC value do not always have a small influence on other nodes in the microblog network. The corresponding relationship between CC and BC, as shown in
Fig. 8
, is similar to the relationship between CC and DC.
PPT Slide

Lager Image

PPT Slide

Lager Image

- 2. Top Ranking by WeiboRank Method

For each user in the microblog network, we calculate its WeiboRank value and rank the results in terms of influence.
Table 1
gives the user screen names of the top 15 most important key nodes in both the FDU and the SISU microblog networks. The corresponding analysis of these ranked lists is as follows.
Screen name of top 15 users by WeiboRank value.

Ranking | Screen name in FDU | Screen name in SISU |
---|---|---|

1 | Fudan library official microblog | SISU employment guidance center |

2 | FDU star forum | SISU BBS |

3 | FDU student union | SISU alumni |

4 | Fudan UTV | SISU thinking forum |

5 | Fudan BBS | SISU GSU |

6 | Foreign affairs office | SISU student union |

7 | FDU press | SISU young alumni community |

8 | Shanghai FDU alumni association | Jiang Zhu Zi |

9 | FDU germanic school | SISU MBA education center |

10 | FDU youth league committee | SISU young news center |

11 | FDU MTA project | Happy Wu Ying |

12 | FDU international politics department | ISUS |

13 | FDU MPAcc project | SISU JC express |

14 | FDU alumni association | SISU overseas cooperation school |

15 | FDU graduate student enrollment | SISU students’ association union |

- 3. Comparison of Relative Influence Ranks

To find the effectiveness of the WeiboRank algorithm, we compare it with existing well-known measures in terms of relative influence ranks. Rather than comparing different relative influence ranks directly, we use Kendall’s coefficient,
Kendall τ rank correlation coefficients.

Correlation | Top 5 | Top 10 | Top 15 | Top 20 |
---|---|---|---|---|

WR vs. PageRank | 0.6000 | 0.1111 | 0.2488 | 0.3219 |

WR vs. BC | −0.4000 | −0.2444 | 0.1429 | 0.1263 |

WR vs. CC | −0.4000 | −0.4222 | 0.0286 | −0.0737 |

WR vs. ODC | −0.4000 | −0.5556 | −0.1238 | −0.1053 |

- 4. Comparison Based on ISR Model

In microblog networks, we consider the scope of spread as the criterion for key nodes. Specifically, after a user posts a tweet, how many times the tweet was read in the process of information spread is a feasible criterion. In a microblog network, it is of interest to know how many users, in the end, have read the tweet. In the whole process of tweet spreading, the number of spreaders first increases, then decreases, and then reaches zero when the tweet no longer continues to be spread. At this time, the microblog network reaches a terminal state and has only ignorants and rejectors. The density of the propagation range can be represented by the density of rejectors in the ISR model. Taking a density value of 0.6, as an example, it means that 60% of microblog users have read the tweet in the end.
Table 3
shows the userIDs of the top 20 microblogs using five key-node mining algorithms — PageRank, Betweeness Certrality, Closeness Centrality, Out-degree, and WeiboRank. The values in
Table 3
represent the userIDs of microblog users selected from different ranking algorithms. Once a user has successfully registered for a microblog service, their registered account is assigned an ID unique to the whole network; that is, every microblog account has its own unique ID identifying the owner/user. A user’s screen name can be changed, but not its ID number associated with the account.
Top 20 userIDs using five mining algorithms.

Ranking | PageRank | BC | CC | ODC | WR |
---|---|---|---|---|---|

1 | 2094660043 | 1700066312 | 2009286791 | 1700066312 | 2094660043 |

2 | 2062166941 | 2052377882 | 2092671335 | 1729332983 | 1961500343 |

3 | 1979720530 | 1997205417 | 2806360034 | 1666928707 | 1949791100 |

4 | 1944846581 | 2091852760 | 2381807622 | 2005733113 | 1928683063 |

5 | 2052377882 | 1802587971 | 2253168707 | 1997205417 | 1869954915 |

6 | 1961500343 | 1949418640 | 2032492724 | 1802587971 | 2062166941 |

7 | 2499036843 | 1961500343 | 2125816775 | 1847229792 | 1748944243 |

8 | 1896361524 | 2430380332 | 1933329405 | 1949418640 | 2052377882 |

9 | 1869954915 | 1727640434 | 1780917364 | 2761453384 | 1984915141 |

10 | 1690054162 | 2005733113 | 2042719623 | 2870088300 | 1979720530 |

11 | 1874428952 | 1874428952 | 2938303623 | 1441630795 | 2113303782 |

12 | 1949791100 | 1869954915 | 1700066312 | 2586501737 | 1944846581 |

13 | 1802587971 | 2016427591 | 1729332983 | 1884015017 | 1976281511 |

14 | 1764257785 | 1979720530 | 1802587971 | 2605300910 | 1700066312 |

15 | 1960631320 | 1668524297 | 1666928707 | 1760040537 | 2168710354 |

16 | 1833155991 | 1760040537 | 1949418640 | 1704767831 | 1895633571 |

17 | 1925326447 | 1704767831 | 2761453384 | 1727640434 | 1941191692 |

18 | 1732449142 | 2062166941 | 2605300910 | 2091852760 | 1690054162 |

19 | 1621865890 | 1915525114 | 2430380332 | 1901940004 | 1833155991 |

20 | 1987326232 | 1885375684 | 1727640434 | 1006856393 | 2271643410 |

PPT Slide

Lager Image

VI. Conclusion

This paper presented a new algorithm to identify key nodes in microblog networks. The proposed WeiboRank algorithm combines local metrics with global metrics based on tweet propagation characteristics. To simulate the actual spreading process of tweets, a new transmission dynamics model known as an ISR model was presented. Using an actual data set from the Sina Weibo network, we compared the WeiboRank method with existing well-known methods for ranking a user’s relative influence. The results show that there is a strong correlation between the top five users by the WeiboRank and PageRank algorithms. Comparison results included the spreading ability of the top
BIO

Kwak H.
“What is Twitter, a Social Network or a News Media?”
Int. Conf. World Wide Web
Raleigh, NC, USA
Apr. 26-30, 2010
591 -
600

Cha M.
“Measuring User Influence in Twitter: The Million Follower Fallacy,”
Int. AAAI Conf. Weblogs Soc. Media
Washington, DC, USA
May 23–26, 2010
10 -
17

Sun J.
,
Tang J.
“Models and Algorithms for Social Influence Analysis,”
ACM Int. Conf. Web Search Data Mining
Rome, Italy
Feb. 4–8, 2013
775 -
776

Zhang J.
“Social Influence Locality for Modeling Retweeting Behaviors,”
Int. Joint Conf. Artif. Intell.
Beijing, China
Aug. 3–9, 2013
2761 -
2767

Walisa R.
,
Wichian P.
“Applying Mining Fuzzy Sequential Patterns Technique to Predict the Leadership in Social Networks,”
Int. Conf. ICT Knowl. Eng.
Bangkok, Thailand
Jan. 12–13, 2012
134 -
137

Song K.
“Detecting Opinion Leader Dynamically in Chinese News Comments,”
Lecture Notes Comput. Sci.
Wuhan, China
Sept. 14–16, 2011
197 -
209

Freimut B.
,
Carolin K.
“Detecting Opinion Leaders and Trends in Online Communities,”
Int. Conf. Digit. Soc.
St. Maarten, Netherlands
Feb. 10–16, 2010
124 -
129

Cho Y.
,
Hwang J.
,
Lee D.
2012
“Identification of Effective Opinion Leaders in the Diffusion of Technological Innovation: A Social Network Approach,”
Technol. Forecasting Soc. Change
79
(1)
97 -
106
** DOI : 10.1016/j.techfore.2011.06.003**

Li F.
,
Du T.C.
2011
“Who is Talking? An Ontology-Based Opinion Leader Identification Framework for Word-of-Mouth Marketing in Online Social Blogs,”
Decision Support Syst.
51
(1)
190 -
197
** DOI : 10.1016/j.dss.2010.12.007**

Kitsak M.
2010
“Identification of Influential Spreaders in Complex Networks,”
Nature Physics
6
(11)
888 -
893
** DOI : 10.1038/nphys1746**

Liu J.-G.
,
Ren Z.-M.
,
Guo Q.
2013
“Ranking the Spreading Influence in Complex Networks,”
Physica A: Statistical Mechanics Appl.
392
(18)
4154 -
4159
** DOI : 10.1016/j.physa.2013.04.037**

Kang C.
“Diffusion Centrality in Social Networks,”
IEEE/ACM Int. Conf. Adv. Soc. Netw. Anal. Mining
Istanbul, Turkey
Aug. 26–29, 2012
558 -
564

Gao S.
2014
“Ranking the Spreading Ability of Nodes in Complex Networks Based on Local Structure,”
Physica A: Statistical Mechanics Appl.
403
130 -
147
** DOI : 10.1016/j.physa.2014.02.032**

Friedkin N.E.
1991
“Theoretical Foundations for Centrality Measures,”
American J. Sociology
96
(6)
1478 -
1504
** DOI : 10.1086/229694**

Borgatti S.P.
2005
“Centrality and Network Flow,”
Soc. Netw.
27
(1)
55 -
71
** DOI : 10.1016/j.socnet.2004.11.008**

Mislove A.
“Measurement and Analysis of Online Social Networks,”
ACM SIGCOMM Internet Meas. Conf.
San Diego, CA, USA
Oct. 24–26, 2007
29 -
42

Kandiah V.
,
Shepelyansky D.L.
2012
“PageRank Model of Opinion Formation on Social Networks,”
Physica A: Statistical Mechanics Appl.
391
(22)
5779 -
5793
** DOI : 10.1016/j.physa.2012.06.047**

Kendall M.G.
1938
“A New Measure of Rank Correlation,”
Biometrika
30
(1–2)
81 -
93
** DOI : 10.1093/biomet/30.1-2.81**

Citing 'Identification of Key Nodes in Microblog Networks
'

@article{ HJTODO_2016_v38n1_52}
,title={Identification of Key Nodes in Microblog Networks}
,volume={1}
, url={http://dx.doi.org/10.4218/etrij.16.0115.0732}, DOI={10.4218/etrij.16.0115.0732}
, number= {1}
, journal={ETRI Journal}
, publisher={Electronics and Telecommunications Research Institute}
, author={Lu, Jing
and
Wan, Wanggen}
, year={2016}
, month={Feb}