Abstract
On most online social media sites today, user-generated data remains accessible to allowed viewers unless and until the data owner changes her privacy preferences. In this paper, we present a large-scale measurement study focused on understanding how users control the longitudinal exposure of their publicly shared data on social media sites. Our study, using data from Twitter, finds that a significant fraction of users withdraw a surprisingly large percentage of their old publicly shared data: more than 28% of 6-year-old public posts (tweets) on Twitter are no longer accessible today. The inaccessible tweets were either selectively deleted by users or withdrawn when users deleted their accounts or made them private. We also found a significant problem with the current exposure control mechanisms: even when a user deletes her tweets or her account, the current mechanisms leave traces of residual activity, i.e., tweets that other users sent as replies to the deleted tweets or accounts remain accessible. We show that this residual information can be used to recover significant information about the deleted tweets, or even characteristics of the deleted accounts. To the best of our knowledge, we are the first to study the information leakage resulting from residual activities of deleted tweets and accounts. Finally, we propose two exposure control mechanisms that eliminate information leakage via residual activities. One of our mechanisms optimizes for allowing meaningful social interactions with user posts, while the other aims to control longitudinal exposure via anonymization. We discuss the merits and drawbacks of our proposed mechanisms compared to existing mechanisms.
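The headline measurement is a proportion: re-check a sample of old public tweets and count how many are no longer accessible, broken down by the reason for withdrawal named above. The Python sketch below is a minimal illustration under stated assumptions. It supposes each re-checked tweet has already been labeled with one of the hypothetical `Status` values, and the eight-tweet sample is toy data, not study data.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """Hypothetical accessibility labels for a re-checked tweet."""
    ACCESSIBLE = "accessible"
    TWEET_DELETED = "tweet deleted"          # user selectively deleted the tweet
    ACCOUNT_DELETED = "account deleted"      # withdrawn along with the whole account
    ACCOUNT_PROTECTED = "account protected"  # account switched to private


def withdrawal_breakdown(statuses):
    """Return (per-category fraction of inaccessible tweets, overall inaccessible fraction)."""
    total = len(statuses)
    if total == 0:
        raise ValueError("empty sample")
    counts = Counter(statuses)
    breakdown = {
        s: counts[s] / total for s in Status if s is not Status.ACCESSIBLE
    }
    inaccessible = 1 - counts[Status.ACCESSIBLE] / total
    return breakdown, inaccessible


# Toy sample of 8 re-checked tweets (illustrative only, not study data).
sample = [Status.ACCESSIBLE] * 5 + [
    Status.TWEET_DELETED,
    Status.ACCOUNT_DELETED,
    Status.ACCOUNT_PROTECTED,
]
breakdown, inaccessible = withdrawal_breakdown(sample)
print(f"inaccessible: {inaccessible:.1%}")
for status, frac in breakdown.items():
    print(f"  {status.value}: {frac:.1%}")
```

Keeping the per-category fractions separate from the overall figure reflects the distinction the abstract draws between selective tweet deletion and account-level withdrawal.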
Notes
This study was conducted respecting the guidelines set by our institute’s ethics board and with their explicit knowledge and permission.
We observed that Twitter delivers a tweet to its random sample nearly instantaneously (within seconds) after a user posts it. Consequently, there is only a minimal chance that a user deleted a tweet before it could appear in our random sample.
We only considered original tweets (and not retweets) during sampling since our goal is to understand how much of the tweets originally posted by users are withdrawn today.
We obtained our users' countries from the Twitter location data gathered by Kulshrestha et al. [23], who inferred users' locations from the location and time zone fields of their Twitter profiles.
We use a list of English stopwords and a list of Twitter-specific stopwords from [27].
Interested readers can compare the meaning of the guessed and original tweets in our complete Amazon Mechanical Turk (AMT) evaluation results at http://twitter-app.mpi-sws.org/soups2016/amt_guess.html.
For example, Twitter today automatically deletes retweets of a deleted tweet, but not the replies or mentions generated by other users.
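The notes above describe the residual-activity gap: retweets of a deleted tweet are removed, but other users' replies survive, and the abstract reports that such replies can reveal the deleted content. Stopword-filtered term counts over those replies are one simple way to see why. The Python sketch below is a toy model of that behavior, not Twitter's implementation and not the authors' recovery method. The `Post` record, `delete_post`, its `cascade_replies` switch, `guess_keywords`, and the miniature stopword set are all hypothetical. Mentions and replies-to-replies are not modeled.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

# Tiny illustrative stopword set; a stand-in for the full English and
# Twitter-specific lists mentioned in the notes, which are not reproduced here.
STOPWORDS = {
    "a", "an", "the", "is", "are", "to", "of", "and", "on", "at", "for",
    "i", "you", "it", "so", "that", "this",   # English
    "rt", "via", "amp", "lol",                # Twitter-specific
}
TOKEN = re.compile(r"[#@]?\w+")


@dataclass
class Post:
    post_id: int
    author: str
    text: str
    kind: str = "original"        # "original", "retweet" or "reply"
    parent: Optional[int] = None  # id of the retweeted or replied-to post


def delete_post(store: Dict[int, Post], post_id: int,
                cascade_replies: bool = False) -> List[Post]:
    """Delete a post and its direct retweets. Direct replies are deleted only
    when cascade_replies is True. Returns the residual posts that still point
    at the deleted post."""
    del store[post_id]
    children = [p.post_id for p in store.values() if p.parent == post_id]
    for pid in children:
        if store[pid].kind == "retweet" or cascade_replies:
            del store[pid]
    return [p for p in store.values() if p.parent == post_id]


def guess_keywords(residual: List[Post], k: int = 3) -> List[str]:
    """Most frequent non-stopword terms in residual replies (mentions skipped)."""
    counts = Counter()
    for post in residual:
        for tok in TOKEN.findall(post.text.lower()):
            if tok.startswith("@") or tok in STOPWORDS or tok.isdigit():
                continue
            counts[tok] += 1
    return [term for term, _ in counts.most_common(k)]


store = {
    1: Post(1, "alice", "Got the new job at the lab!"),
    2: Post(2, "bob", "RT Got the new job at the lab!", "retweet", 1),
    3: Post(3, "carol", "@alice congrats on the new job!", "reply", 1),
    4: Post(4, "dave", "@alice wow, new job at the lab? congrats", "reply", 1),
    5: Post(5, "erin", "@alice so happy for you, congrats!", "reply", 1),
}
residual = delete_post(store, 1)
print(sorted(store))             # [3, 4, 5]: retweet gone, replies remain
print(guess_keywords(residual))  # ['congrats', 'new', 'job']
```

In this toy model, passing `cascade_replies=True` leaves no residual replies to mine, at the cost of also deleting posts authored by other users.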
References
Jenkins Jr., H.W.: Google and the search for the future. http://www.wsj.com/articles/SB10001424052748704901104575423294099527212 (2010)
Ayalon, O., Toch, E.: Retrospective privacy: managing longitudinal privacy in online social networks. In: Proceedings of the 9th Symposium on Usable Privacy and Security (SOUPS’13) (2013)
Bauer, L., Cranor, L.F., Komanduri, S., Mazurek, M.L., Reiter, M.K., Sleeper, M., Ur, B.: The post anachronism: the temporal dimension of Facebook privacy. In: Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society (WPES’13) (2013)
Dey, R., Jelveh, Z., Ross, K.W.: Facebook users have become much more private: a large-scale study. In: Proceedings of the 10th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom’12) (2012)
Johnson, M., Egelman, S., Bellovin, S.M.: Facebook and privacy: it’s complicated. In: Proceedings of the 8th Symposium on Usable Privacy and Security (SOUPS’12) (2012)
Liu, Y., Gummadi, K.P., Krishnamurthy, B., Mislove, A.: Analyzing Facebook privacy settings: user expectations vs. reality. In: Proceedings of the 11th ACM/USENIX Internet Measurement Conference (IMC’11) (2011)
Stutzman, F., Gross, R., Acquisti, A.: Silent listeners: the evolution of privacy and disclosure on Facebook. J. Priv. Confid. 4(2), 7–41 (2012)
Bernstein, M.S., Bakshy, E., Burke, M., Karrer, B.: Quantifying the invisible audience in social networks. In: Proceedings of the 31st SIGCHI Conference on Human Factors in Computing Systems (CHI’13) (2013)
Besmer, A., Lipford, H.R.: Moving beyond untagging: photo privacy in a tagged world. In: Proceedings of the 28th SIGCHI Conference on Human Factors in Computing Systems (CHI’10) (2010)
Hoadley, C.M., Xu, H., Lee, J.J., Rosson, M.B.: Privacy as information access and illusory control: the case of the Facebook news feed privacy outcry. Electron. Commer. Res. Appl. 9(1), 50–60 (2010)
Madejski, M., Johnson, M., Bellovin, S.M.: The failure of online social network privacy settings. Technical Report CUCS-010-11, Department of Computer Science, Columbia University (2011)
Mazzia, A., LeFevre, K., Adar, E.: The PViz comprehension tool for social network privacy settings. In: Proceedings of the 8th Symposium on Usable Privacy and Security (SOUPS’12) (2012)
Petrovic, S., Osborne, M., Lavrenko, V.: I wish I didn’t say that! Analyzing and predicting deleted messages in Twitter. CoRR abs/1305.3107 (2013)
Zhou, L., Wang, W., Chen, K.: Tweet properly: analyzing deleted tweets to understand and identify regrettable ones. In: Proceedings of the 25th International Conference on World Wide Web (WWW’16) (2016)
Madden, M., Lenhart, A., Cortesi, S., Gasser, U., Duggan, M., Smith, A., Beaton, M.: Teens, social media, and privacy. http://www.pewinternet.org/2013/05/21/teens-social-media-and-privacy/ (2013)
Almuhimedi, H., Wilson, S., Liu, B., Sadeh, N., Acquisti, A.: Tweets are forever: a large-scale quantitative analysis of deleted tweets. In: Proceedings of the 16th Conference on Computer Supported Cooperative Work (CSCW’13) (2013)
Jain, P., Kumaraguru, P.: On the dynamics of username changing behavior on Twitter. In: Proceedings of the 3rd IKDD Conference on Data Science (CODS’16) (2016)
Liu, Y., Kliman-Silver, C., Mislove, A.: The tweets they are a-changin’: evolution of Twitter users and behavior. In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM’14) (2014)
Snapchat: https://www.snapchat.com/ (2016)
Mondal, M., Messias, J., Ghosh, S., Gummadi, K.P., Kate, A.: Forgetting in social media: understanding and controlling longitudinal exposure of socially shared data. In: Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS’16) (2016)
Mondal, M., Liu, Y., Viswanath, B., Gummadi, K.P., Mislove, A.: Understanding and specifying social access control lists. In: Proceedings of the 10th Symposium on Usable Privacy and Security (SOUPS’14) (2014)
Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.P.: Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the 4th AAAI Conference on Weblogs and Social Media (ICWSM’10) (2010)
Kulshrestha, J., Kooti, F., Nikravesh, A., Gummadi, K.P.: Geographic dissection of the Twitter network. In: Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM’12) (2012)
Mullen, L.: Predicting gender using historical data. https://cran.r-project.org/web/packages/gender/vignettes/predicting-gender.html (2015)
Sloan, L., Morgan, J., Burnap, P., Williams, M.: Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PLoS ONE 10(3), e0115545 (2015)
Tufekci, Z.: Facebook, youth and privacy in networked publics. In: Proceedings of the 6th International Conference on Weblogs and Social Media (ICWSM’12) (2012)
Zafar, M.B., Bhattacharya, P., Ganguly, N., Gummadi, K.P., Ghosh, S.: Sampling content from online social networks: comparing random vs. expert sampling of the Twitter stream. ACM Trans. Web 9(3), 12:1–12:33 (2015)
Aiello, L.M., Barrat, A., Schifanella, R., Cattuto, C., Benjamin, M., Menczer, F.: Friendship prediction and homophily in social media. ACM Trans. Web 6(2), 1131–1559 (2012)
Thelwall, M.: Homophily in MySpace. J. Am. Soc. Inf. Sci. Technol. 60(2), 219–231 (2009)
Cyber Dust: https://www.cyberdust.com/ (2016)
Twitter Team: The streaming APIs. https://dev.twitter.com/streaming/overview
Acknowledgements
This work is an extended version of the paper: Mondal et al., Forgetting in Social Media: Understanding and Controlling Longitudinal Exposure of Socially Shared Data. In: Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS’16), Denver, CO, USA, June 2016.
Ethics declarations
Funding
Funding was provided by Max-Planck-Gesellschaft.
About this article
Cite this article
Mondal, M., Messias, J., Ghosh, S. et al. Managing longitudinal exposure of socially shared data on the Twitter social media. Int J Adv Eng Sci Appl Math 9, 238–257 (2017). https://doi.org/10.1007/s12572-017-0196-3
DOI: https://doi.org/10.1007/s12572-017-0196-3