A Test Collection for Research on Depression and Language Use

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9822)

Abstract

Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression was found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for doing research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection not only encourages research on differences in language between depressed and non-depressed individuals, but also on the evolution of the language use of depressed individuals. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.
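The abstract describes the new measure only in words: it combines the correctness of each decision with the delay before a positive case is detected. The following is a minimal sketch of one way such a delay-aware error could be computed, not the paper's definition. The sigmoid latency penalty, the function names, the parameter `o` (the number of texts after which the penalty grows quickly) and the cost values `c_fp`, `c_fn` and `c_tp` are all assumptions made for illustration.

```python
import math


def latency_cost(k, o):
    """Penalty in (0, 1) that grows with k, the number of texts seen
    before deciding; it equals 0.5 when k == o (assumed sigmoid form)."""
    # Clamp the exponent so math.exp never overflows for extreme inputs.
    x = max(-700.0, min(700.0, float(k - o)))
    return 1.0 / (1.0 + math.exp(-x))


def delay_aware_error(decision, truth, k, o, c_fp=0.1, c_fn=1.0, c_tp=1.0):
    """Error of a single binary decision taken after seeing k texts.

    decision: True if the system flags the subject as a positive case.
    truth:    True if the subject really is a positive case.
    The cost values are illustrative assumptions, not taken from the paper.
    """
    if decision and truth:
        # Correct alert: the cost depends only on how late it was raised.
        return latency_cost(k, o) * c_tp
    if decision and not truth:
        return c_fp  # false alarm
    if truth:
        return c_fn  # missed positive case
    return 0.0  # correct negative decision


# A correct alert raised early costs less than the same alert raised late.
early = delay_aware_error(True, True, k=2, o=5)
late = delay_aware_error(True, True, k=50, o=5)
print(early, late)
```

Averaging this per-subject error over all subjects gives a single score in which a late true positive is penalised almost as heavily as a missed case.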

Notes

  1. See http://www.who.int/mediacentre/factsheets/fs369/en/.

  2. Reddit’s privacy policy states explicitly that the posts and comments redditors make are not private and will still be accessible after the redditor’s account is deleted. Reddit does not permit unauthorized commercial use of its contents or redistribution, except as permitted by the doctrine of fair use. This research is an example of fair use.

  3. https://praw.readthedocs.org/en/v3.1.0/.

  4. The number of terms per submission is counted after pre-processing the texts with the scikit-learn Python toolkit (scikit-learn.org), configured with no stopword processing and no vocabulary pruning based on document frequency.

  5. http://tec.citius.usc.es/ir/code/dc.html.

  6. We employed the sklearn library, version 0.16.1, for Python. Vectorisation was done with the TfidfVectorizer (with a standard stoplist and removing terms that appear in fewer than 20 documents), and classification was done with the LogisticRegression class.

  7. This strategy does not perform any text analysis and, therefore, it does not make sense to wait any longer to make the decision.

  8. Again, this strategy does not perform any text analysis and, therefore, it does not make sense to wait any longer to make the decision.
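The vectorisation and classification setup described in note 6 can be sketched as follows. This is a hedged illustration rather than the authors' code: `stop_words="english"` is an assumed stand-in for the unspecified "standard stoplist", the helper name `build_baseline` is hypothetical, the classifier uses library defaults, and the toy texts are invented solely to exercise the pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline(min_df=20):
    """TF-IDF features followed by logistic regression.

    min_df=20 drops terms that occur in fewer than 20 documents, as in
    the note; the English stoplist is an assumption.
    """
    vectorizer = TfidfVectorizer(stop_words="english", min_df=min_df)
    classifier = LogisticRegression()
    return make_pipeline(vectorizer, classifier)


# Invented toy texts (not from the collection): 1 = positive, 0 = control.
texts = [
    "feeling hopeless and tired again today",
    "tired and hopeless, cannot sleep",
    "great football match with friends today",
    "friends came over to watch the match",
]
labels = [1, 1, 0, 0]

# min_df is lowered only because the toy corpus has four documents.
model = build_baseline(min_df=2)
model.fit(texts, labels)
predictions = model.predict(texts)
vocabulary = sorted(model.named_steps["tfidfvectorizer"].vocabulary_)
print(vocabulary, list(predictions))
```

With the threshold from the note (`min_df=20`) the same pipeline needs a corpus large enough for at least some terms to reach that document frequency; otherwise the vectoriser is left with an empty vocabulary and raises an error.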

Acknowledgements

This research was funded by the Swiss National Science Foundation (project “Early risk prediction on the Internet: an evaluation corpus”, 2015). The first author also acknowledges the financial support obtained from the “Ministerio de Economía y Competitividad” of the Government of Spain and FEDER Funds under the research project TIN2015-64282-R.

Author information

Corresponding author

Correspondence to David E. Losada.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Losada, D.E., Crestani, F. (2016). A Test Collection for Research on Depression and Language Use. In: Fuhr, N., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2016. Lecture Notes in Computer Science, vol 9822. Springer, Cham. https://doi.org/10.1007/978-3-319-44564-9_3

  • DOI: https://doi.org/10.1007/978-3-319-44564-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44563-2

  • Online ISBN: 978-3-319-44564-9

  • eBook Packages: Computer Science, Computer Science (R0)
