Quantifying Influence in Social Networks and News Media
Journal of Information and Communication Convergence Engineering. 2012. Jun, 10(2): 135-140
Copyright ©2012, The Korean Institute of Information and Communication Engineering
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : February 21, 2012
  • Accepted : April 09, 2012
  • Published : June 30, 2012
About the Authors
Hongwon Yun
hwyun@silla.ac.kr

Abstract
Massive numbers of users of social networks share various types of information such as opinions, news, and ideas in real time. As a new form of social network, Twitter is a particularly useful information source. Studying influence can help us better understand the role of social networks. The popularity of social networks like Twitter is primarily measured by the number of followers. The number of followers in Twitter and the number of users exposed in the news media are important factors in measuring influence. We chose Twitter and the New York Times as representative media to analyze the influence and present an empirical analysis of these datasets. The correlation between the number of followers in Twitter and the number of users exposed in the New York Times is moderately high. The correlation between the number of users exposed in the New York Times and the number of sections in which they appear was found to be very high. We measure the normalized influence score using our proposed expression based on the two correlation coefficients.
I. INTRODUCTION
Currently, one of the most notable micro-blogging services is Twitter. Twitter is used by many people as a tool for spreading their ideas, knowledge, or opinions to others. Users in Twitter are usually dubbed “twitterers,” and they can publish “tweets” through a variety of media to others. Twitter has gained huge popularity and growing interest from researchers [1-4]. In recent years, researchers have increasingly focused on whether or not influential people have the power to influence a large number of others in social networks [5-10]. Recently, Twitter users and applications have been measuring a twitterer’s influence by the number of followers the twitterer has [11-13]. The popularity of a twitterer depends on the number of followers [6, 9]. The question is, are people with many followers on Twitter influential people in their community? How many times have they received coverage in the news media? We are interested in identifying people who are influential in both social networks and the news media.
In this paper, we quantify the influence on Twitter and the news media based on influential twitterers and present an empirical analysis. To do this work, we gathered top users’ data from Twitter based on the number of followers. We then searched news articles in the online news media using keywords that were collected from top users’ names from Twitter. These two datasets gathered from Twitter and the news media were analyzed to find a good indicator of influence. To investigate whether influential twitterers were also influential in the news media, we evaluated the correlation coefficient. Using these values, the influence scores were measured by our proposed approach in order to obtain the value of influence on Twitter and in the news media.
The rest of this paper is organized as follows. The datasets were prepared for the purpose of this study. We provide an overview of the collected data and show the preliminaries in section II. Section III elaborates on the analysis of data and the methodology for estimating user influence. In section IV, we present the quantifying influence score and the empirical results. Finally, we summarize our research and suggest some directions for future work in section V.
II. DATA PREPARATION AND PRELIMINARIES
For the purpose of this study, a set of Twitter data was prepared on January 27, 2012. We collected the global top 100 users on Twitter. The global top 100 users have the largest number of followers worldwide. For the Twitter collections, we have removed all the organizations to select only individuals as influential twitterers. The top twitterers were ranked by the number of followers. After removing all organizations from top 100 users in Twitter, 80 individuals remained. We use the 80 individuals as our basic dataset in this study. In order to obtain the total number of news articles published that included each name in the global top 80 twitterers, we chose the New York Times (NYT) as our sample. We gathered news articles dating from January 28, 2011 to January 27, 2012 through the NYT’s search page.
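The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` class, its `is_organization` flag (assumed to be labeled by hand), and the sample names are hypothetical and do not come from the original dataset.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    followers: int
    is_organization: bool  # hypothetical flag, assumed to be labeled by hand


def top_individuals(accounts, limit=100):
    """Take the `limit` most-followed accounts, drop organizations,
    and rank the remaining individuals by follower count (rank 1 = most)."""
    top = sorted(accounts, key=lambda a: a.followers, reverse=True)[:limit]
    individuals = [a for a in top if not a.is_organization]
    return [(rank, a.name) for rank, a in enumerate(individuals, start=1)]


# Made-up sample: with limit=3 the cut is applied before organizations are removed,
# mirroring "top 100 users, then remove organizations".
sample = [
    Account("person_a", 900, False),
    Account("org_x", 800, True),
    Account("person_b", 700, False),
    Account("person_c", 100, False),
]
ranking = top_individuals(sample, limit=3)
print(ranking)
```

Note that the cut to the top `limit` accounts happens before organizations are dropped, which is why the study ends up with 80 individuals rather than 100.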
Fig. 1. Rapid growth of the top twitterers’ followers.
As an example, it may be worthwhile to consider how many followers Lady Gaga lost or gained. Statistics on the number of followers can help us to discover a twitterer’s influence. Fig. 1 shows the rapidly increasing number of the most popular twitterer’s followers. For example, the number of Lady Gaga’s followers was 15,685,045 on November 11, 2011, and 3 months later, on February 10, 2012, it was 19,030,903. Over those 3 months, the number of her followers increased by 3,345,858, as shown in Fig. 1a. Fig. 1b shows that a similar increase appeared in the case of President Barack Obama of the United States.
The top 5 users on Twitter ranked by the number of followers are shown in Table 1, together with the number of accounts each follows and the number of tweets each has posted. Even though all of the top 5 users have a large number of followers, they follow few others and have posted few tweets, so the number of followers differs greatly from the other two counts. Fig. 2 shows the top Twitter users ranked by the cumulative number of followers, following, and tweets. As shown in this figure, the number of followers accounts for almost the entire cumulative distribution.
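The follower figures quoted above for Lady Gaga can be checked with a short calculation. The sketch below uses only the two counts and dates given in the text; the derived quantities (relative growth and average daily gain) are illustrative additions, not figures reported in the paper.

```python
from datetime import date

# Follower counts and dates as quoted in the text.
start_day, start_followers = date(2011, 11, 11), 15_685_045
end_day, end_followers = date(2012, 2, 10), 19_030_903

gained = end_followers - start_followers      # absolute increase
days = (end_day - start_day).days             # length of the observation window
relative_growth = gained / start_followers    # increase as a fraction of the start
per_day = gained / days                       # average followers gained per day

print(f"gained {gained:,} followers over {days} days")
print(f"relative growth: {relative_growth:.1%}, about {per_day:,.0f} per day")
```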
Table 1. Top 5 twitterers ranked by the number of followers
Fig. 2. Top twitterers ranked by the cumulative number of followers, following, and tweets in Twitter.
III. DATASET ANALYSIS
To understand the features of the datasets from Twitter and the NYT, we first analyze the datasets.
- A. Basic Analysis of Twitter Data
Fig. 3 displays the number following against the number of followers in Twitter; the inset shows in detail the number following for users with around 5,000,000 followers. Barack Obama is ranked 8th by the number of followers and follows the most accounts, 683,249. On the other hand, Marshall Bruce Mathers III (Eminem), who is ranked 15th by the number of followers, follows the fewest others.
Fig. 3. Number of followers and number following for the top twitterers in Twitter.
Fig. 4. Number of followers and number of tweets for the top twitterers.
Fig. 5. Number following and number of tweets for the top twitterers.
We plot the number of tweets against the number of followers in Fig. 4. Fig. 5 shows the number of tweets against the number following; the magnified inset displays the region around 500 following. As Figs. 3-5 show, there is little correlation among the number of followers, the number following, and the number of tweets, as pointed out previously.
- B. Basic Analysis of NYT Data
For each of the top 80 twitterers, we queried the NYT by name to find the number of news articles covering them. Table 2 shows the number of news articles for each section. As we can see from the table, the largest number of news articles covering the 80 twitterers is found in the Arts section. This suggests that many popular Twitter users are engaged in the arts. As shown in Fig. 6, the total number of news articles in the Arts section differs greatly from that of the other sections.
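The per-section tally behind a table of this kind can be sketched as follows. The input format (one section label per matching article) and the sample labels are assumptions made for illustration; the paper does not describe how the search results were stored.

```python
from collections import Counter


def section_proportions(article_sections):
    """Return each section's share of all matching articles,
    ordered from the most to the least frequent section."""
    counts = Counter(article_sections)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {section: count / total for section, count in counts.most_common()}


# Made-up labels, one per article returned by a name query.
hits = ["Arts", "Arts", "Sports", "Arts", "Business"]
props = section_proportions(hits)
for section, share in props.items():
    print(f"{section}: {share:.0%}")
```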
Table 2. Proportion of news articles in the New York Times queried by the top 80 twitterers
Fig. 6. Distribution of the number of news articles in each section in the New York Times (NYT) covering the top twitterers.
- C. Methodology for Estimating User Influence
- 1) The Spearman’s Rank Correlation Coefficient
It is not easy to determine whether influential twitterers are also influential in the news media. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. In order to quantify correlation, we computed the correlation coefficient between the two different datasets. We used the relative order of users’ ranks as the basis for comparison. Each value was sorted, so that a rank of 1 denotes the most influential user and a larger rank denotes a less influential user. Every user is assigned a rank for each influence measure. Spearman’s rank correlation coefficient quantifies how a user’s rank varies across different measures. We use it as a measure of the strength of the association between two rank sets:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n(n^2 - 1)}$$
The variables x_i and y_i are the ranks of user i based on two different influence measures in a dataset of n users. The closer ρ is to +1 or -1, the stronger the correlation. A perfect positive correlation is +1 and a perfect negative correlation is -1 [14, 15].
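As a minimal sketch, Spearman’s coefficient can be computed directly from two rank lists using the standard formula for untied ranks. The function names and the sample values below are hypothetical, and ties are assumed not to occur; a production analysis with ties would need average ranks instead.

```python
def ranks_descending(values):
    """Assign rank 1 to the largest value, 2 to the next, and so on.
    Assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rho(x_ranks, y_ranks):
    """Spearman's rank correlation for two equally long, untied rank lists."""
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rank lists of equal length >= 2")
    squared_diffs = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Made-up counts for five users: followers and news articles.
followers = [50, 40, 30, 20, 10]
articles = [5, 9, 3, 1, 2]
rho = spearman_rho(ranks_descending(followers), ranks_descending(articles))
print(round(rho, 3))  # 0.8
```

In this toy example the two rankings differ only by two swapped neighbors, so the sum of squared rank differences is 4 and ρ = 1 - 24/120 = 0.8.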
- 2) Influence Score
We can compute the influence scores of the users based on the Spearman’s rank correlation coefficient. To do so, we define the influence score ( S ) as follows:
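[Equation: influence score S, expressed in terms of the ranks ri, rj, rk, and rl, the correlation coefficients ρ(i,j) and ρ(k,l), and the sample size N.]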
The value S represents the amount of influence computed from two pairs of ranked datasets. We call this value the influence score for the same domain. The value ri is a sorted rank within a dataset, and rj is likewise a rank within the same dataset. The Spearman correlation coefficient ρ(i,j) is a measure of the strength of the association between these two ranked datasets. The variable N is the sample size. Likewise, rk and rl are ranks within another dataset, and the correlation coefficient ρ(k,l) measures the statistical dependence between those two ranked datasets. We apply this equation in section IV.
IV. QUANTIFYING INFLUENCE SCORES
We describe the correlation between the top users’ rankings in Twitter and their rankings in the NYT. Normalized influence scores are then computed for each of the top twitterers across Twitter and the NYT.
- A. Distribution of the Top Twitter Users and Exposure in NYT Articles
The top 80 twitterers ranked by the number of followers are shown in Fig. 7a. We found the number of news articles that mentioned the names of the top 80 users and sorted the users by the number of news articles. Many of the people who are exposed in the NYT relatively frequently can be referred to as celebrities. The distribution of the celebrities is shown in Fig. 7b.
Fig. 7. Number of top users in Twitter and exposure of news articles in the New York Times (NYT).
- B. Comparing Twitter and NYT based on the Top Users in Twitter
The top 5 users by the number of followers and their corresponding rankings by the number of news articles are listed in Table 3. Table 3 shows that users ranked high in Twitter are not necessarily ranked high in the NYT. Fig. 8 shows the number of news articles against the ranking by the number of followers. This figure indicates that the popularity of users in Twitter is not proportionate to the amount of exposure in the NYT.
Table 3. Comparison of the top users between Twitter and New York Times
Fig. 8. Followers’ rankings for the top users in Twitter and number of news articles searched by each of them in the New York Times (NYT).
- C. Estimating Correlation Coefficient
To investigate how the pairs of measures correlate, we compare the relative influence ranks of all 80 users. Table 4 lists the resulting Spearman’s rank correlation coefficients. The value 0.52 in Table 4 is the correlation coefficient between the number of followers in Twitter and the number of news articles in the NYT. We see a moderately high correlation (above 0.5) across all pairs.
Table 4. Spearman’s rank correlation coefficient
- D. Measuring Influence Score on Twitter and NYT
The popularity of a Twitter user can be easily measured by the number of followers. However, the number of followers in Twitter alone cannot serve as a measure of influence. We computed the influence score using our proposed expression as defined in section III. The top 20 users by the influence score are listed in Table 5. Lady Gaga ranks first by influence score, and Barack Obama rises to second place due to his large amount of exposure in the NYT.
Table 5. Top 20 users ranked by number of followers, top 20 users ranked by number of articles, and top 20 celebrities ranked by influence score. NYT: New York Times.
V. CONCLUSIONS
Social media has been growing explosively and gives users the opportunity to share various types of information and knowledge in real time. Twitter is one of the most popular social networks on the internet. The popularity of these social networks can be measured by the number of followers. Influence, that is, an individual’s potential to lead others to engage in a certain act, can also be determined by many factors. For studying influence, we prepared a set of global top twitterers’ data and collected news articles mentioning each of their names from the NYT. The number of followers is a significant factor for measuring influence. The number of times a user is exposed in the news media is also an important factor in estimating influence. We chose Twitter and the New York Times as representative media to analyze the influence. To understand the features of these two datasets, we analyzed the datasets and presented the results. We show that the correlation between the number of followers and the exposure of top twitterers in the news media is moderately high. The correlation between the exposure of top twitterers and the number of sections including them is very high. Our proposed expression uses these two correlation coefficients to compute the influence score for each of the top twitterers ranked by the number of followers. This normalized influence score opens the possibility of discovering influential individuals across social networks and news media.
References
[1] Kwak H., Lee C., Park H., Moon S. 2010. "What is Twitter, a social network or a news media?" Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, 591-600.
[2] Zhao W. X., Jiang J., Weng J., He J., Lim E. P., Yan H., Li X. 2011. "Comparing twitter and traditional media using topic models," Proceedings of the 33rd European Conference on Advances in Information Retrieval, Dublin, Ireland, 338-349.
[3] Suh B., Hong L., Pirolli P., Chi E. H. 2010. "Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network," Proceedings of the 2nd IEEE International Conference on Social Computing, Minneapolis, MN, 177-184.
[4] Wu S., Hofman J. M., Mason W. A., Watts D. J. 2011. "Who says what to whom on twitter," Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India, 705-714.
[5] Bakshy E., Hofman J. M., Mason W. A., Watts D. J. 2011. "Everyone's an influencer: quantifying influence on twitter," Proceedings of the 4th ACM International Conference on Web Search and Data Mining, Hong Kong, China, 65-74.
[6] Weng J., Lim E. P., Jiang J., He Q. 2010. "TwitterRank: finding topic-sensitive influential twitterers," Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, New York, NY, 261-270.
[7] Kwak H., Chun H., Moon S. 2011. "Fragile online relationship: a first look at unfollow dynamics in twitter," Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, Vancouver, Canada, 1091-1100.
[8] de Choudhury M., Diakopoulos N., Naaman M. 2012. "Unfolding the event landscape on twitter: classification and exploration of user categories," Proceedings of the ACM Conference on Computer Supported Cooperative Work, Seattle, WA, 241-244.
[9] Romero D. M., Galuba W., Asur S., Huberman B. A. 2011. "Influence and passivity in social media," Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India, 113-114.
[10] Cha M., Haddadi H., Benevenuto F., Gummadi K. P. 2010. "Measuring user influence in twitter: the million follower fallacy," Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, Washington, DC, 10-17.
[11] Bakshy E., Karrer B., Adamic L. A. 2009. "Social influence and the diffusion of user-created content," Proceedings of the 10th ACM Conference on Electronic Commerce, Stanford, CA, 325-334.
[12] Fischer E., Reuber A. R. 2011. "Social interaction via new social media: (How) can interactions on Twitter affect effectual thinking and behavior?" Journal of Business Venturing, vol. 26, no. 1, 1-18.
[13] Goyal A., Bonchi F., Lakshmanan L. V. S. 2010. "Learning influence probabilities in social networks," Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, New York, NY, 241-250.
[14] Myers J. L., Well A. 2003. Research Design and Statistical Analysis, 2nd ed. Lawrence Erlbaum Associates, Mahwah, NJ.
[15] Maritz J. S. 1981. Distribution-Free Statistical Methods. Chapman and Hall, London.