Abstract
In the modern Internet era, the use of social media such as Twitter and Facebook is constantly increasing. These platforms accumulate large amounts of textual data, because individuals often use them to share their experiences and personal facts through text messages. These data conceal psychological aspects of individuals that may represent a valuable alternative source to classical clinical texts. Many studies use text messages to extract psychometric profiles of individuals, which help in analysing the psychological behaviour of users. Unfortunately, both text messages and psychometric profiles may reveal personal and sensitive information about users, leading to privacy violations. Therefore, in this paper we propose a study of privacy risk for psychometric profiles: we empirically analyse the privacy risk of different aspects of psychometric profiles, identifying which psychological facts expose users to identity disclosure.
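The abstract does not spell out how privacy risk is measured. As a hedged illustration only, the sketch below assumes a PRUDEnce-style definition (Pratesi et al., listed in the references): an adversary knows h attribute values of a target's discretised psychometric profile, the probability of re-identification is 1 divided by the number of profiles that match that knowledge, and a user's risk is the worst case over all combinations of h attributes. The attribute names (`anx`, `anger`, `posemo`) and the toy profiles are hypothetical, and this is not presented as the authors' exact attack.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict

# Hypothetical discretised profile: attribute name -> bin index.
Profile = Dict[str, int]


def reidentification_risk(profiles: Dict[str, Profile], user: str, h: int) -> Fraction:
    """Worst-case re-identification risk of `user` for background knowledge of size h.

    Assumed PRUDEnce-style definition: for each combination of h attributes,
    risk = 1 / (number of users matching the target on those attributes);
    the user's risk is the maximum over all such combinations.
    """
    target = profiles[user]
    worst = Fraction(0)
    for attrs in combinations(sorted(target), h):
        # The target always matches itself, so `matches` is at least 1.
        matches = sum(
            1 for p in profiles.values() if all(p.get(a) == target[a] for a in attrs)
        )
        worst = max(worst, Fraction(1, matches))
    return worst


# Toy, made-up profiles used only to exercise the function.
profiles: Dict[str, Profile] = {
    "u1": {"anx": 2, "anger": 0, "posemo": 3},
    "u2": {"anx": 2, "anger": 1, "posemo": 3},
    "u3": {"anx": 1, "anger": 1, "posemo": 3},
}

for u in profiles:
    print(u, reidentification_risk(profiles, u, h=1))
```

Here `u1` is uniquely identified by knowing a single value (its `anger` bin), while `u2` shares every single value with at least one other user; with two known values, `u2` also becomes unique. Exhaustive enumeration of combinations grows quickly with the number of attributes, so it is only practical for small h.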
Notes
1. The implementation of these attacks, written in Python 3.7, is available on GitHub: https://github.com/karjudev/text-privacy. For the experiments we used a server with 16x Intel(R) Xeon(R) Gold 5120 CPU @ 2.20 GHz (64 bits) and 63 GB of RAM.
2. Composed of 517,401 messages from 158 different authors.
3. 482,117 messages from 20,192 authors.
4. 230,571 messages from 20,192 authors.
5. 176,243 messages from 6,410 authors.
6. 176,207 messages from 6,410 authors.
7. The number is obtained using the Sturges formula [22].
8. Colloquial terms are: “mom” or “dad” and “mate” or “buddy”.
9. Words like “think”, “know”, “always”, “never” and “should”.
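Note 7 refers to the Sturges formula, which suggests k = ⌈log2 n⌉ + 1 histogram bins for n observations. The notes do not state which quantity is binned. The sketch below assumes, for illustration only, that the bin count is used to discretise continuous psychometric scores with equal-width bins. The helper names `sturges_bins` and `discretise` are hypothetical.

```python
import math
from typing import List


def sturges_bins(n: int) -> int:
    """Number of bins suggested by Sturges' rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return math.ceil(math.log2(n)) + 1


def discretise(values: List[float], bins: int) -> List[int]:
    """Assumed equal-width binning: map each value to a bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    # The maximum value would land at index `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


scores = [0.0, 1.5, 4.2, 7.9, 10.0]
k = sturges_bins(len(scores))
print(k, discretise(scores, k))
```

Sturges' rule implicitly assumes roughly normal data and tends to under-bin large samples, so other rules (for example, Freedman-Diaconis) may be preferable when n is large.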
References
[1] Abul, O., Bonchi, F., Nanni, M.: Anonymization of moving objects databases by clustering and perturbation. Inf. Syst. 35, 884–910 (2010)
[2] Anandan, B., Clifton, C.: Significance of term relationships on anonymization. In: Web Intelligence/IAT Workshops (2011)
[3] Chakaravarthy, V.T., Gupta, H., Roy, P., Mohania, M.K.: Efficient techniques for document sanitization. In: CIKM (2008)
[4] Choudhury, M., Counts, S., Horvitz, E.: Predicting postpartum changes in emotion and behavior via social media. In: Conference on Human Factors in Computing Systems - Proceedings (2013)
[5] Crossley, S., Kyle, K., McNamara, D.: Sentiment analysis and social cognition engine (SEANCE): an automatic tool for sentiment, social cognition, and social-order analysis. Behav. Res. Methods 49, 803–821 (2017)
[6] Cumby, C.M., Ghani, R.: A machine learning based system for semi-automatically redacting documents. In: IAAI (2011)
[7] Deng, M., et al.: A privacy threat analysis framework: supporting the elicitation and fulfillment of privacy requirements. Requir. Eng. 16, 3–32 (2011)
[8] Klimt, B., Yang, Y.: Introducing the Enron corpus. In: CEAS (2004)
[9] Li, Y., Baldwin, T., Cohn, T.: Towards robust and privacy-preserving text representations. In: ACL, no. 2 (2018)
[10] Pellungrini, R., Monreale, A., Guidotti, R.: Privacy risk for individual basket patterns. In: MIDAS/PAP@PKDD/ECML (2018)
[11] Pellungrini, R., Pappalardo, L., Pratesi, F., Monreale, A.: Analyzing privacy risk in human mobility data. In: STAF Workshops (2018)
[12] Pellungrini, R., Pratesi, F., Pappalardo, L.: Assessing privacy risk in retail data. In: PAP@PKDD/ECML (2017)
[13] Pennebaker, J.W., Boyd, R.L., Jordan, K., Blackburn, K.: The development and psychometric properties of LIWC2015. Technical report (2015)
[14] Pensa, R.G., di Blasi, G.: A semi-supervised approach to measuring user privacy in online social networks. In: DS (2016)
[15] del Pilar Salas-Zárate, M., et al.: A study on LIWC categories for opinion mining in Spanish reviews. J. Inf. Sci. 40, 749–760 (2014)
[16] Pratesi, F., Gabrielli, L., Cintia, P., Monreale, A., Giannotti, F.: PRIMULE: privacy risk mitigation for user profiles. Data Knowl. Eng. 125, 101786 (2020)
[17] Pratesi, F., Monreale, A., Giannotti, F., Pedreschi, D.: Privacy preserving multidimensional profiling. In: GOODTECHS (2017)
[18] Pratesi, F., et al.: PRUDEnce: a system for assessing privacy risk vs utility in data sharing ecosystems. Trans. Data Priv. (2018)
[19] Sánchez, D., Batet, M.: Toward sensitive document release with privacy guarantees. Eng. Appl. Artif. Intell. 59, 23–34 (2017)
[20] Shen, J.H., Rudzicz, F.: Detecting anxiety through Reddit. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (2017)
[21] Shrestha, A., Spezzano, F., Joy, A.: Detecting fake news spreaders in social networks via linguistic and personality features. In: CLEF (Working Notes) (2020)
[22] Sturges, H.A.: The choice of a class interval. J. Am. Stat. Assoc. 21, 65–66 (1926)
[23] Tadesse, M.M., Lin, H., Xu, B., Yang, L.: Detection of depression-related posts in Reddit social media forum. IEEE Access 7, 44883–44893 (2019)
[24] Xiao, Y., Xiong, L.: Protecting locations with differential privacy under temporal correlations. In: CCS (2015)
Acknowledgment
This work is partially supported by the European Community H2020 programme under the funding schemes: H2020-INFRAIA-2019-1: Research Infrastructure G.A. 871042 SoBigData++ (sobigdata.eu), G.A. 952215 TAILOR, G.A. 952026 Humane AI NET (humane-ai.eu).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mariani, G., Monreale, A., Naretto, F. (2021). Privacy Risk Assessment of Individual Psychometric Profiles. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_32
DOI: https://doi.org/10.1007/978-3-030-88942-5_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88941-8
Online ISBN: 978-3-030-88942-5
eBook Packages: Computer Science, Computer Science (R0)