
Privacy Risk Assessment of Individual Psychometric Profiles

  • Conference paper
Discovery Science (DS 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12986)

Abstract

In the modern Internet era, the use of social media such as Twitter and Facebook is constantly increasing. These platforms accumulate large amounts of textual data, because individuals often use them to share their experiences and personal facts in text messages. These data hide individual psychological aspects that might represent a valuable alternative source to classical clinical texts. In many studies, text messages are used to extract individuals' psychometric profiles, which help in analysing the psychological behaviour of users. Unfortunately, both text messages and psychometric profiles may reveal personal and sensitive information about users, leading to privacy violations. Therefore, in this paper, we propose a study of privacy risk for psychometric profiles: we empirically analyse the privacy risk of different aspects of the psychometric profiles, identifying which psychological facts expose users to identity disclosure.

Notes

  1. The implementation of these attacks, written in Python 3.7, is available on GitHub: https://github.com/karjudev/text-privacy. The experiments were run on a server with 16x Intel(R) Xeon(R) Gold 5120 CPU @ 2.20 GHz (64 bits) and 63 GB RAM.

  2. Composed of 517,401 messages from 158 different authors.

  3. 482,117 messages from 20,192 authors.

  4. 230,571 messages from 20,192 authors.

  5. 176,243 messages from 6,410 authors.

  6. 176,207 messages from 6,410 authors.

  7. The number is obtained using the Sturges formula [22].

  8. Colloquial terms are: “mom” or “dad” and “mate” or “buddy”.

  9. Words like “think”, “know”, “always”, “never” and “should”.
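
As an illustration of the Sturges formula cited in note 7, the rule suggests k = ceil(log2(n)) + 1 classes for n observations. The sketch below is not taken from the authors' repository: the function name `sturges_bins` is hypothetical, and since the note does not say which quantity n the rule is applied to, the author counts from notes 2 to 6 are used only as example inputs.

```python
import math


def sturges_bins(n_samples: int) -> int:
    """Number of classes suggested by Sturges' rule: ceil(log2(n)) + 1."""
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    return math.ceil(math.log2(n_samples)) + 1


# Illustrative inputs only (assumption): author counts from notes 2 to 6.
for n in (158, 6410, 20192):
    print(f"n = {n}: {sturges_bins(n)} classes")
```

Because the rule grows with log2(n), the suggested number of classes changes slowly even when the number of observations differs by orders of magnitude.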

References

  1. Abul, O., Bonchi, F., Nanni, M.: Anonymization of moving objects databases by clustering and perturbation. Inf. Syst. 35, 884–910 (2010)

  2. Anandan, B., Clifton, C.: Significance of term relationships on anonymization. In: Web Intelligence/IAT Workshops (2011)

  3. Chakaravarthy, V.T., Gupta, H., Roy, P., Mohania, M.K.: Efficient techniques for document sanitization. In: CIKM (2008)

  4. Choudhury, M., Counts, S., Horvitz, E.: Predicting postpartum changes in emotion and behavior via social media. In: Conference on Human Factors in Computing Systems - Proceedings (2013)

  5. Crossley, S., Kyle, K., McNamara, D.: Sentiment analysis and social cognition engine (SEANCE): an automatic tool for sentiment, social cognition, and social-order analysis. Behav. Res. Methods 49, 803–821 (2017)

  6. Cumby, C.M., Ghani, R.: A machine learning based system for semi-automatically redacting documents. In: IAAI (2011)

  7. Deng, M., et al.: A privacy threat analysis framework: supporting the elicitation and fulfillment of privacy requirements. Requir. Eng. 16, 3–32 (2011)

  8. Klimt, B., Yang, Y.: Introducing the Enron corpus. In: CEAS (2004)

  9. Li, Y., Baldwin, T., Cohn, T.: Towards robust and privacy-preserving text representations. In: ACL, no. 2 (2018)

  10. Pellungrini, R., Monreale, A., Guidotti, R.: Privacy risk for individual basket patterns. In: MIDAS/PAP@PKDD/ECML (2018)

  11. Pellungrini, R., Pappalardo, L., Pratesi, F., Monreale, A.: Analyzing privacy risk in human mobility data. In: STAF Workshops (2018)

  12. Pellungrini, R., Pratesi, F., Pappalardo, L.: Assessing privacy risk in retail data. In: PAP@PKDD/ECML (2017)

  13. Pennebaker, J.W., Boyd, R.L., Jordan, K., Blackburn, K.: The development and psychometric properties of LIWC2015. Technical report (2015)

  14. Pensa, R.G., di Blasi, G.: A semi-supervised approach to measuring user privacy in online social networks. In: DS (2016)

  15. del Pilar Salas-Zárate, M., et al.: A study on LIWC categories for opinion mining in Spanish reviews. J. Inf. Sci. 40, 749–760 (2014)

  16. Pratesi, F., Gabrielli, L., Cintia, P., Monreale, A., Giannotti, F.: PRIMULE: privacy risk mitigation for user profiles. Data Knowl. Eng. 125, 101786 (2020)

  17. Pratesi, F., Monreale, A., Giannotti, F., Pedreschi, D.: Privacy preserving multidimensional profiling. In: GOODTECHS (2017)

  18. Pratesi, F., et al.: PRUDEnce: a system for assessing privacy risk vs utility in data sharing ecosystems. Trans. Data Priv. (2018)

  19. Sánchez, D., Batet, M.: Toward sensitive document release with privacy guarantees. Eng. Appl. Artif. Intell. 59, 23–34 (2017)

  20. Shen, J.H., Rudzicz, F.: Detecting anxiety through Reddit. In: Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology - From Linguistic Signal to Clinical Reality (2017)

  21. Shrestha, A., Spezzano, F., Joy, A.: Detecting fake news spreaders in social networks via linguistic and personality features. In: CLEF (Working Notes) (2020)

  22. Sturges, H.A.: The choice of a class interval. J. Am. Stat. Assoc. 21, 65–66 (1926)

  23. Tadesse, M.M., Lin, H., Xu, B., Yang, L.: Detection of depression-related posts in Reddit social media forum. IEEE Access 7, 44883–44893 (2019)

  24. Xiao, Y., Xiong, L.: Protecting locations with differential privacy under temporal correlations. In: CCS (2015)


Acknowledgment

This work is partially supported by the European Community H2020 programme under the funding schemes: H2020-INFRAIA-2019-1: Research Infrastructure G.A. 871042 SoBigData++ (sobigdata.eu), G.A. 952215 TAILOR, G.A. 952026 Humane AI NET (humane-ai.eu).

Author information

Corresponding author

Correspondence to Anna Monreale.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mariani, G., Monreale, A., Naretto, F. (2021). Privacy Risk Assessment of Individual Psychometric Profiles. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_32

  • DOI: https://doi.org/10.1007/978-3-030-88942-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88941-8

  • Online ISBN: 978-3-030-88942-5

  • eBook Packages: Computer Science, Computer Science (R0)
