Abstract
Since the inception of Web 2.0, socializing, interacting and referencing online have become substantially easier. This has been aided by the expansion of social media such as blogs, public chat rooms and social networking sites like Facebook and Twitter. Behavior on these websites leaves an electronic trail of social activity that can be analyzed to extract valuable information. Such analysis has become important for psychological analysis, behavioral modeling and even the commercialization of business activities built on these paradigms. The micro-blogging service Twitter has recently attracted much interest from the social network community because of features such as the Follower/Following relationship, Mentions, trends, retweets and Twitter Lists, and their impact can be observed by investigating diverse tweet clusters within the same community and the same topic of discussion. This chapter proposes a novel approach to analyzing a random tweet cluster and its relevant trend through computational intelligence, specifically Standard Fuzzy C Means clustering. The approach aims at better clustering that recovers a larger number of the dynamic clusters actually present. The results are evaluated with a view to their broader implications for analysis and research on future Twitter networks.
Appendices
Appendix I: OAuth
Twitter uses the open standard OAuth for authentication. OAuth is an authentication protocol that allows users to approve an application to act on their behalf without sharing their password. Before OAuth, basic authentication was used, which required each request to be accompanied by the user name and password of a user. This was a security issue because the application had to store the user name and password of its users, which is not safe from a user's viewpoint. OAuth eliminates this need by introducing the concept of access tokens. Each application that wants to use OAuth must first register itself with Twitter, whereupon it is provided with a unique consumer token, which is a token and secret pair. Users are authenticated in the following way:
1. The consumer (application) requests a request token from Twitter using its consumer token.
2. When the service provider (Twitter) grants the request token, the consumer directs the user to the service provider.
3. The user is authorized by the service provider (in the case of Twitter, by logging in and then granting rights to the consumer/application) and is directed back to the consumer.
4. The consumer then requests the 'access token' for that user by exchanging the request token for an access token.
5. The service provider supplies the consumer with the access token, which consists of a token and secret pair.
6. The consumer can then access protected resources by including the access token with each request made on behalf of the user.
This process is secure because the user's password is never provided to the application; instead, the consumer stores access tokens for the various authenticated users. Every request the application makes to the API is signed using the access token key/secret pair. Access tokens currently do not expire. The user can revoke the application's access, in which case the access token for that particular user will no longer be valid. The access token also becomes invalid if a Twitter administrator suspends the application.
Fig. 7 illustrates the process further.
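The appendix states that every request is signed but does not say how. As an illustration only, the following minimal Python sketch assumes the HMAC-SHA1 signature method of OAuth 1.0 (RFC 5849), in which the signing key combines the consumer secret with the access token secret. The function names, URL and parameter values are hypothetical and are not taken from the chapter or from Twitter4j.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value: str) -> str:
    # Only RFC 3986 unreserved characters are left unescaped.
    return quote(value, safe="-._~")


def signature_base_string(method: str, url: str, params: dict) -> str:
    # Encode every name and value, sort the pairs, then join them as name=value.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), percent_encode(url), percent_encode(normalized)])


def sign_request(method: str, url: str, params: dict,
                 consumer_secret: str, token_secret: str) -> str:
    # Assumption: HMAC-SHA1, keyed with "consumer secret & token secret".
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode("utf-8"), base.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Made-up values, for illustration only.
    params = {"oauth_token": "user-token", "status": "hello world"}
    print(sign_request("POST", "https://api.example.com/update", params,
                       "consumer-secret", "token-secret"))
```

Because the token secret is part of the key, a revoked or mismatched access token yields a signature that the service provider will not accept, and the user's password never enters the computation.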
Appendix II: Twitter4j
Twitter4j is the Java library used for the Twitter API in this project. It is open source and was created by Yusuke Yamamoto. It provides easy integration of Java applications with the Twitter service. Its features include built-in OAuth support, zero dependencies (no additional jars required) and support for the Android platform. Twitter4j is thread-safe, so method calls can be made concurrently. It has methods that allow us to use all three parts of the Twitter API, namely the REST API, the Search API and the Streaming API. Twitter4j has methods to access the resources listed below:
1. Timeline Resources: getting the timeline of a user (the stream of tweets posted by the user and his/her friends, including retweets), getting mentions etc.
2. Tweet Resources: showing the status (latest tweet) of a user, updating the status of a user, retweeting a status etc.
3. User Resources: getting information about a particular user, including user id, screen name, date of creation of the account, follower count, following count, statuses count and whether the user is verified or protected; searching for users etc.
4. Friends and Followers Resources: getting the user ids of the followers and friends of a particular user.
5. Friendship Resources: creating/destroying a friendship between two users, checking whether a friendship exists between a pair of users etc.
6. Account Resources: getting account settings, verifying the credentials of a user, getting the rate limit status for a particular user etc.
7. Trends Resources: getting current trends, daily trends etc.
8. Search API Resources: whereas the resources above are accessed through the REST API, the Search API allows us to search for tweets based on various parameters, such as the presence of a keyword or a particular hashtag, tweets by a particular user, tweets mentioning a user, tweets since or until some date, tweets around a location, retweets, tweets containing URLs etc.
Twitter4j provides a few other resources not mentioned here.
Almost all the methods of Twitter4j are rate limited, i.e. there is a limit on how many calls to a given method can be made during a time period. The rate limit is 350 requests/hour, and is different for the Search API.
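A client that downloads many tweets has to stay within this limit. The following minimal Python sketch shows one way a client could count its own calls against the 350 requests/hour figure quoted above. The sliding-window accounting and the class and method names are assumptions made for illustration; the chapter does not describe how Twitter measures the window, and this is not part of Twitter4j.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard that allows at most max_calls per period."""

    def __init__(self, max_calls=350, period_seconds=3600.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._calls = deque()       # timestamps of calls still inside the window

    def try_acquire(self) -> bool:
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False

    def seconds_until_free(self) -> float:
        # How long to wait before the oldest recorded call leaves the window.
        if len(self._calls) < self.max_calls:
            return 0.0
        return max(0.0, self.period - (self.clock() - self._calls[0]))


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    if limiter.try_acquire():
        print("request may be sent")
    else:
        print("wait", limiter.seconds_until_free(), "seconds")
```

A download loop would call `try_acquire()` before each API request and sleep for `seconds_until_free()` when it returns `False`.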
Almost all the methods require authentication, i.e. the user making the request must be authenticated by Twitter, or the application making requests on behalf of a user must do so for an authenticated user. No authentication is required for the Search API.
The Streaming API methods provide access to live streaming tweets.
Appendix III: Tracing a Tweet's Path
The path of a tweet that has been retweeted at least once can be traced from a given point/node/user. It can be found out from where the tweet reached the given user and how far it travelled onwards. Associated with each tweet (status) in the Twitter API are its unique tweet id, the user id/screen name of the tweeting user, the date of creation and the retweet count; if it is a retweeted status (i.e. a retweet), information about the original tweet id and the original Twitter user who tweeted that status (i.e. introduced the content for the first time) can also be retrieved. Storing these attributes for every downloaded tweet can therefore help us determine the path the tweet has travelled.
Only tweets with a non-zero retweet count are considered in tweet tracing, as a tweet with a retweet count of zero has only travelled to the followers of the tweeting user, and tracing such a tweet's path (of length 1 only) is of no significance. The retweets of a tweet are the ones that matter, as they can help us analyse the network activity surrounding a user. An analysis of the flow of all the tweets of a particular user (excluding statuses with zero retweet count) can be useful in identifying the various popular (frequent) routes.
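The stored attributes and the non-zero retweet filter can be pictured with a small record type. This is a minimal Python sketch under stated assumptions: the field names, the `TweetRecord` class and the sample values are hypothetical and do not reproduce the Twitter API or Twitter4j data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TweetRecord:
    tweet_id: int
    user: str                                 # user id or screen name of the tweeting user
    created_at: str                           # date of creation, kept as a plain string here
    retweet_count: int
    original_tweet_id: Optional[int] = None   # set only when this status is a retweet
    original_user: Optional[str] = None       # user who introduced the content first

    @property
    def is_retweet(self) -> bool:
        return self.original_tweet_id is not None


def source_of(tweet: TweetRecord) -> Tuple[int, str]:
    """Return (tweet id, user) of the status that introduced the content."""
    if tweet.is_retweet:
        return (tweet.original_tweet_id, tweet.original_user)
    return (tweet.tweet_id, tweet.user)


def traceable(tweets: List[TweetRecord]) -> List[TweetRecord]:
    """Keep only tweets with a non-zero retweet count, as the text prescribes."""
    return [t for t in tweets if t.retweet_count > 0]


if __name__ == "__main__":
    # Made-up sample data.
    original = TweetRecord(1, "alice", "2013-07-02", 2)
    retweet = TweetRecord(2, "bob", "2013-07-03", 2,
                          original_tweet_id=1, original_user="alice")
    unshared = TweetRecord(3, "carol", "2013-07-03", 0)
    for t in traceable([original, retweet, unshared]):
        print(t.tweet_id, "originates from", source_of(t))
```

Grouping the stored records by the value of `source_of` gathers every retweet of the same content, which is the starting point for reconstructing the route that content took.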
In order to trace a tweet's path, we need to download and store the tweets of the given user as well as information about the connected users and their tweets. The complexity of this method can be reduced by noting that a retweet only goes to a follower, and if that follower retweets the tweet (which may already be a retweet) then it reaches his/her followers, and so on. So, in the outgoing direction, there is only a need to store information (the user and his/her tweets) about the given user's followers, their followers and so on, ignoring the friends of such followers. In the incoming direction, the retweets (or simply tweets) that the given user receives come only from users who are either friends (followed by the given user) or both friends and followers of the given user; these users, in turn, may have received the tweet (or a retweet) from similar users (i.e. their own friends, or users who are both their friends and followers), and so on.
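The outgoing half of this reduction can be sketched as a breadth-first traversal that only ever follows follower edges and only continues through users who retweeted. This is a minimal Python illustration, not the chapter's implementation: the function name, the dictionary-based follower graph and the sample users are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_paths(followers: Dict[str, Set[str]],
                     retweeters: Set[str],
                     origin: str) -> Dict[str, List[str]]:
    """Trace where a tweet travelled, starting from the user who posted it.

    followers  maps a user to the set of users who follow that user.
    retweeters is the set of users who retweeted the tweet.
    Returns a mapping from every user the tweet reached to the path it took.
    """
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        user = queue.popleft()
        # Only the origin and users who retweeted pass the tweet on.
        if user != origin and user not in retweeters:
            continue
        # Follower edges are the only ones explored; friends are ignored.
        for follower in sorted(followers.get(user, ())):
            if follower not in paths:      # keep the first path found
                paths[follower] = paths[user] + [follower]
                queue.append(follower)
    return paths


if __name__ == "__main__":
    # Made-up follower graph.
    followers = {"alice": {"bob", "carol"}, "bob": {"dave"}, "carol": {"erin"}}
    for user, path in downstream_paths(followers, {"bob"}, "alice").items():
        print(user, "<-", " -> ".join(path))
```

In the sample graph the tweet reaches dave through bob's retweet but never reaches erin, because carol did not retweet it. Tracing the incoming direction would use the same traversal over the reversed (friend) edges.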
Such a reduction makes the directed Twitter network look like a river-like structure with many tributaries (the network can also be visualized as consisting of unidirectional pipelines in which tweets flow). The extent of the 'flow' of a tweet in such tributaries or pipelines depends upon the retweeting of the tweet by the users along the way. Retweeting in turn depends upon the content of the tweet, i.e. the better the content of a tweet, the better its chances of being retweeted. The popularity of the original user who tweeted the status and the popularity of the subsequent followers are also factors that affect the extent to which a tweet is retweeted.
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Banerjee, S., Badr, Y., Al-Shammari, E.T. (2014). Analyzing Tweet Cluster Using Standard Fuzzy C Means Clustering. In: Pedrycz, W., Chen, SM. (eds) Social Networks: A Framework of Computational Intelligence. Studies in Computational Intelligence, vol 526. Springer, Cham. https://doi.org/10.1007/978-3-319-02993-1_17
DOI: https://doi.org/10.1007/978-3-319-02993-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02992-4
Online ISBN: 978-3-319-02993-1
eBook Packages: Engineering, Engineering (R0)