Analyzing Tweet Cluster Using Standard Fuzzy C Means Clustering

Banerjee, Soumya; Badr, Youakim; Al-Shammari, Eiman Tamah

doi:10.1007/978-3-319-02993-1_17

Part of the book series: Studies in Computational Intelligence (SCI, volume 526)

Abstract

Since the inception of Web 2.0, opportunities for socializing, interacting and referencing have expanded substantially. This expansion has been aided by social media such as blogs, public chat rooms and social networking sites such as Facebook and Twitter. Behavior on these websites leaves an electronic trail of social activity that can be analyzed to discern valuable information. Such analysis has grown rapidly and now supports psychological analysis, behavioral modeling and even the commercialization of business activities under those paradigms. In particular, the micro-blogging service Twitter has recently attracted much interest from the social network community through its Follower/Following relationships, mentions, trends, retweets, Twitter Lists and similar features, and the impact of these features becomes apparent when investigating diverse tweet clusters within the same community and the same topic of discussion. This chapter introduces a novel idea for analyzing random tweet clusters and their trends through computational intelligence, specifically Standard Fuzzy C Means clustering. The idea aims at a better method of clustering, in the sense of finding a larger number of actual dynamic clusters. The results are evaluated with a view to their broader implications for analysis and research on the future Twitter network.

Author information

Authors and Affiliations

Department of Computer Science, Birla Institute of Technology, Mesra, India
Soumya Banerjee
National Institute of Applied Sciences (INSA-Lyon), Villeurbanne, France
Youakim Badr
College of Computing Science and Engineering, Kuwait University, Kuwait City, Kuwait
Eiman Tamah Al-Shammari

Corresponding author

Correspondence to Soumya Banerjee.

Editor information

Editors and Affiliations

Electrical & Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
Witold Pedrycz
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Shyi-Ming Chen

Appendices

Appendix I: OAuth

Twitter uses the open standard OAuth for authentication. OAuth is a protocol that allows users to approve an application to act on their behalf without sharing their password. Before OAuth, basic authentication was used, which required each request to be accompanied by a user's user name and password. This was a security issue because the application had to store the user name and password of each user, which is unsafe from the user's viewpoint. OAuth eliminates this need by introducing access tokens. Each application that wants to use OAuth must first register itself with Twitter, whereupon it is provided with a unique consumer token, which is a token and secret pair. Users are authenticated in the following way:

1. The consumer (application) requests a request token from Twitter using the consumer token.
2. When the service provider (Twitter) grants the request token, the consumer directs the user to the service provider.
3. The user is authorized by the service provider (in the case of Twitter, by logging in and then granting rights to the consumer/application) and is directed back to the consumer.
4. The consumer then requests the access token for that user by exchanging the request token for it.
5. The service provider supplies the consumer with the access token, which consists of a token and secret pair.
6. The consumer can then access protected resources by including the access token with each request made on behalf of the user.
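The six steps above can be sketched as a small in-memory simulation. This is only an illustration of the token hand-offs: `FakeProvider` and all of its method names are hypothetical stand-ins, not Twitter's API, and real OAuth additionally signs every message and redirects the user through a browser.

```python
import secrets


class FakeProvider:
    """In-memory stand-in for the service provider (hypothetical, not Twitter's API)."""

    def __init__(self):
        self.consumers = {}       # consumer key -> consumer secret
        self.request_tokens = {}  # request token -> {"consumer": ..., "user": ...}
        self.access_tokens = {}   # access token -> {"secret": ..., "user": ...}

    def register(self):
        # Registration: the application receives its consumer token/secret pair.
        key, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.consumers[key] = secret
        return key, secret

    def issue_request_token(self, consumer_key):
        # Steps 1-2: a registered consumer is granted a request token.
        if consumer_key not in self.consumers:
            raise PermissionError("unknown consumer")
        token = secrets.token_hex(8)
        self.request_tokens[token] = {"consumer": consumer_key, "user": None}
        return token

    def authorize(self, request_token, user, login_ok):
        # Step 3: the user logs in at the provider and grants rights.
        if not login_ok:
            raise PermissionError("login failed")
        self.request_tokens[request_token]["user"] = user

    def exchange(self, consumer_key, request_token):
        # Steps 4-5: an authorized request token is swapped for an access token.
        entry = self.request_tokens.get(request_token)
        if entry is None or entry["consumer"] != consumer_key or entry["user"] is None:
            raise PermissionError("request token not authorized")
        del self.request_tokens[request_token]
        token, secret = secrets.token_hex(8), secrets.token_hex(16)
        self.access_tokens[token] = {"secret": secret, "user": entry["user"]}
        return token, secret

    def get_resource(self, access_token):
        # Step 6: protected resources require a valid access token.
        entry = self.access_tokens.get(access_token)
        if entry is None:
            raise PermissionError("invalid or revoked access token")
        return f"timeline of {entry['user']}"

    def revoke(self, access_token):
        # The user (or the provider) can invalidate the token at any time.
        self.access_tokens.pop(access_token, None)


provider = FakeProvider()
consumer_key, consumer_secret = provider.register()
request_token = provider.issue_request_token(consumer_key)
provider.authorize(request_token, user="alice", login_ok=True)
access_token, access_secret = provider.exchange(consumer_key, request_token)
resource = provider.get_resource(access_token)
```

Note that the user's password only ever reaches the provider (`authorize`); the consumer holds nothing but tokens.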

This process is secure because the user's password is never provided to the application; instead, the consumer stores access tokens for the various authenticated users. Every request the application makes to the API is signed using the access token key/secret pair. Access tokens currently do not expire. The user can revoke the application's access, in which case the access token for that particular user is no longer valid. The access token also becomes invalid when a Twitter administrator suspends the application. Fig. 7 illustrates the process.
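The request signing mentioned in this appendix can be made concrete with a short sketch. The appendix does not name the signature method, so the code assumes OAuth 1.0a with HMAC-SHA1 as specified in RFC 5849; the function names, the URL and all credential values are placeholders chosen for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # RFC 3986 encoding: only unreserved characters are left unescaped.
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA1 signature for one request.

    `url` is assumed to be the base URL without a query string, and `params`
    must contain every request parameter, including the oauth_* ones.
    """
    # Encode names and values, then sort by name and value.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # Signature base string: METHOD & encoded URL & encoded parameter string.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # The signing key joins the consumer secret and the access token secret.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("ascii"), base_string.encode("ascii"), hashlib.sha1)
    return base64.b64encode(digest.digest()).decode("ascii")


signature = sign_request(
    "GET",
    "https://api.example.com/statuses/home_timeline.json",  # placeholder URL
    {
        "oauth_consumer_key": "consumer-key",   # placeholder values throughout
        "oauth_token": "access-token",
        "oauth_nonce": "abc123",
        "oauth_timestamp": "1370000000",
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_version": "1.0",
        "count": "20",
    },
    consumer_secret="consumer-secret",
    token_secret="access-token-secret",
)
```

Because the token secret is part of the signing key, a request signed with a revoked or incorrect secret cannot be verified by the provider.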

Appendix II: Twitter4j

Twitter4j is the Java library used for the Twitter API in this project. It is open source and was created by Yusuke Yamamoto. It provides easy integration of Java applications with the Twitter service. Its features include built-in OAuth support, zero dependencies (no additional JARs required) and support for the Android platform. Twitter4j is thread-safe, so method calls can be made concurrently. It has methods for all three of Twitter's APIs, namely the REST API, the Search API and the Streaming API. Twitter4j has methods to access the resources listed below:

1. Timeline Resources: getting the timeline of a user (the stream of tweets posted by the user and his/her friends, including retweets), getting mentions etc.
2. Tweet Resources: showing the status (latest tweet) of a user, updating the status of a user, retweeting a status etc.
3. User Resources: getting information about a particular user, including user id, screen name, date of creation of the user, follower count, following count and statuses count; checking whether the user is a verified or protected user; searching users etc.
4. Friends and Followers Resources: getting the user ids of the followers and friends of a particular user.
5. Friendship Resources: creating/destroying a friendship between two users, checking whether a friendship exists between a pair of users etc.
6. Account Resources: getting account settings, verifying the credentials of a user, getting the rate limit status for a particular user etc.
7. Trends Resources: getting current trends, daily trends etc.
8. Search API Resources: the resources above are accessed through the REST API, whereas the Search API allows searching for tweets based on various parameters, such as the presence of a keyword or a particular hashtag, tweets by a particular user, tweets mentioning a user, tweets since or until some date, tweets around a location, retweets, and tweets containing URLs.

Twitter4j provides a few other resources not mentioned here.

Almost all Twitter4j methods are rate limited, i.e. there is a limit on how many calls to a given method can be made during a time period. The rate limit is 350 requests/hour; a different limit applies to the Search API.
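A client has to pace its calls to stay within such a limit. The sketch below is one simple way to do that, assuming a sliding one-hour window; the class and its parameters are hypothetical, and Twitter's own accounting (for example, when its hourly window resets) may differ, so this is a conservative client-side guard rather than a model of the server.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=350, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of calls still inside the window

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        # The oldest call must age out before another one fits.
        return self.window - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())


limiter = RateLimiter()             # 350 calls per hour, as stated above
average_spacing = 3600.0 / 350      # about 10.3 seconds between calls
```

Before each API call, a crawler would sleep for `limiter.wait_time()` seconds and then call `limiter.record()`.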

Almost all the methods require authentication, i.e. either the user making the request must be authenticated by Twitter, or the application making requests on behalf of a user must do so for an authenticated user. The Search API requires no authentication.

The Streaming API methods provide access to live streaming tweets.

Appendix III: Tracing a Tweet's Path

The path of a tweet that has been retweeted at least once can be traced from a given point (node/user). It can be found out from where the tweet reached the given user and how far it travelled onward. Associated with each tweet (status) in the Twitter API are its unique tweet id, the user id/screen name of the tweeting user, the date of creation and the retweet count; if it is a retweeted status (i.e. a retweet), information about the original tweet id and the original Twitter user who tweeted that status (i.e. introduced the content for the first time) can also be retrieved. Storing these attributes for every downloaded tweet therefore helps in determining the path the tweet has travelled.
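The stored attributes can be pictured as a small record type. The sketch below is an assumption about how such a store might look: the field names are hypothetical and do not mirror the Twitter API or Twitter4j, and timestamps are kept as ISO 8601 strings in a single time zone so that plain string sorting is chronological.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user: str                                 # screen name of the tweeting user
    created_at: str                           # ISO 8601 timestamp, one time zone
    retweet_count: int
    original_tweet_id: Optional[int] = None   # set only when this is a retweet
    original_user: Optional[str] = None


def is_retweet(tweet):
    return tweet.original_tweet_id is not None


def copies_of(tweets, original_id):
    """Return the original tweet and all stored retweets of it, oldest first."""
    related = [
        t for t in tweets
        if t.tweet_id == original_id or t.original_tweet_id == original_id
    ]
    return sorted(related, key=lambda t: t.created_at)


store = [
    Tweet(3, "carol", "2013-07-02T12:30:00", 0, original_tweet_id=1, original_user="alice"),
    Tweet(1, "alice", "2013-07-02T10:00:00", 2),
    Tweet(2, "bob", "2013-07-02T11:15:00", 0, original_tweet_id=1, original_user="alice"),
    Tweet(4, "dave", "2013-07-02T09:00:00", 0),
]
chain = [t.user for t in copies_of(store, 1)]   # ["alice", "bob", "carol"]
```

The chronological list only shows who carried the tweet and when; deciding which user each retweeter received it from additionally requires the follower relationships discussed later in this appendix.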

Only tweets with a non-zero retweet count are considered in tweet tracing, because a tweet with a retweet count of zero has travelled only to the followers of the tweeting user, and tracing such a tweet's path (of length 1 only) is of no significance. The retweets of a tweet are what matter, as they help in analysing the network activity surrounding a user. An analysis of the flow of all the tweets of a particular user (excluding statuses with zero retweet count) can be useful in identifying the popular (frequent) routes.
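One way to surface frequent routes is to count how often each user-to-user hop occurs across the traced paths. The sketch below assumes each traced path is available as a `(retweet_count, users)` pair, which is a hypothetical shape chosen for illustration; the sample data is invented.

```python
from collections import Counter


def frequent_hops(paths, min_retweets=1):
    """Count user-to-user hops over the paths of retweeted tweets."""
    hops = Counter()
    for retweet_count, users in paths:
        if retweet_count < min_retweets:
            continue  # zero-retweet statuses never left the author's followers
        # Consecutive users along a path form one hop each.
        hops.update(zip(users, users[1:]))
    return hops


paths = [
    (2, ["alice", "bob", "carol"]),
    (1, ["alice", "bob"]),
    (0, ["alice"]),                  # skipped: retweet count is zero
    (1, ["dave", "bob"]),
]
top = frequent_hops(paths).most_common(1)   # [(("alice", "bob"), 2)]
```

Counting hops rather than whole paths keeps the result meaningful even when few tweets follow exactly the same route end to end.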

In order to trace a tweet's path, we need to download and store the tweets of the given user as well as information about the connected users and their tweets. The complexity of this method can be reduced by noting that a retweet only goes to a follower; if that follower retweets the tweet (which may already be a retweet), it reaches his/her followers, and so on. So only information (the user and his/her tweets) about the followers, the followers' followers and so on needs to be stored, ignoring the friends of such followers. Conversely, the retweets (or simply tweets) that the given user receives come only from users who are either friends (users the given user follows) or both friends and followers of the given user; these users, in turn, may have received the tweet (or a retweet) from similar users (i.e. their own friends, or users who are both their friends and followers), and so on.
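The reduction described here amounts to a breadth-first walk that follows edges in one direction only. The sketch below is a minimal illustration, assuming the follower lists have already been downloaded into a dictionary; the function name, the `max_depth` cut-off and the sample graph are assumptions made for the example.

```python
from collections import deque


def downstream_users(followers, start, max_depth=None):
    """Breadth-first walk along follower edges only.

    `followers` maps a user to the users who follow him/her. Returns a dict
    of every reachable user to the number of retweet hops from `start`.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if max_depth is not None and depth[user] >= max_depth:
            continue  # do not expand beyond the requested number of hops
        for follower in followers.get(user, ()):
            if follower not in depth:   # visit each user once, even in cycles
                depth[follower] = depth[user] + 1
                queue.append(follower)
    return depth


followers = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "dave": ["alice"],      # a cycle: dave is followed by alice
}
reach = downstream_users(followers, "alice")
# {"alice": 0, "bob": 1, "carol": 1, "dave": 2}
```

Passing a map from each user to his/her friends instead of followers walks the same structure upstream, which covers the case of tracing where a received tweet came from.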

This reduction makes the directed Twitter network resemble a river with many tributaries (the network can also be visualized as unidirectional pipelines through which tweets flow). The extent of the ‘flow’ of a tweet in such tributaries or pipelines depends on whether users along the way retweet it. Retweeting in turn depends on the content of the tweet, i.e. the better the content of a tweet, the better its chances of being retweeted. The popularity of the original user who tweeted the status and the popularity of the subsequent followers also affect the extent to which a tweet is retweeted.

About this chapter

Cite this chapter

Banerjee, S., Badr, Y., Al-Shammari, E.T. (2014). Analyzing Tweet Cluster Using Standard Fuzzy C Means Clustering. In: Pedrycz, W., Chen, SM. (eds) Social Networks: A Framework of Computational Intelligence. Studies in Computational Intelligence, vol 526. Springer, Cham. https://doi.org/10.1007/978-3-319-02993-1_17

DOI: https://doi.org/10.1007/978-3-319-02993-1_17
Published: 10 December 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02992-4
Online ISBN: 978-3-319-02993-1
eBook Packages: Engineering, Engineering (R0)