Assessment of Word Embedding Techniques for Identification of Personal Experience Tweets Pertaining to Medication Uses

Precision Health and Medicine (W3PHAI 2019)

Part of the book series: Studies in Computational Intelligence (SCI, volume 843)

Abstract

Twitter, a general-purpose social media service, has attracted growing interest as an active data source for post-market surveillance of medicinal products. For surveillance purposes, identifying Twitter posts that describe personal experience with medication use is as important as identifying expressions of adverse medical events or reactions. Identifying personal experience tweets is a challenging task, especially with respect to engineering features for classification. Word embedding has become a superior alternative to engineered features in many text classification applications. To investigate whether word embedding-based methods consistently perform better than conventional classification methods with engineered features, we assessed the classification performance of 4 word embedding techniques: GloVe, word2vec, fastText, and WordRank. Using a corpus of 22 million unlabeled tweets for learning word embeddings and a corpus of 12,331 annotated tweets for classification, we found that word embedding-based classification methods consistently outperform the engineered feature-based classification methods with statistical significance (p < 0.01), but that there are no statistically significant differences among the 4 word embedding methods studied (p < 0.05).
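
As a rough illustration of what classifying tweets from word embeddings rather than engineered features can look like, the following minimal sketch averages the word vectors of a tweet and assigns the label of the nearest class centroid. It is not the method evaluated in the chapter: the embedding values, the example tweets, the label names, and the helper functions (`tweet_vector`, `train`, `predict`) are all hypothetical, and a nearest-centroid rule stands in for whatever classifier the authors used.

```python
from statistics import fmean

# Toy 3-dimensional embeddings with made-up values (assumption, illustration only).
# In practice the vectors would be learned from a large unlabeled tweet corpus
# with a technique such as GloVe, word2vec, fastText, or WordRank.
TOY_EMBEDDINGS = {
    "i":       [0.9, 0.1, 0.0],
    "took":    [0.8, 0.2, 0.1],
    "my":      [0.9, 0.0, 0.1],
    "felt":    [0.7, 0.3, 0.0],
    "study":   [0.1, 0.9, 0.2],
    "shows":   [0.0, 0.8, 0.3],
    "new":     [0.1, 0.7, 0.1],
    "aspirin": [0.4, 0.4, 0.9],
}


def tweet_vector(text, embeddings=TOY_EMBEDDINGS):
    """Represent a tweet as the mean of its known word vectors."""
    vectors = [embeddings[tok] for tok in text.lower().split() if tok in embeddings]
    if not vectors:
        # No known token: fall back to a zero vector of the right dimension.
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [fmean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled_tweets):
    """Compute one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in labeled_tweets:
        by_label.setdefault(label, []).append(tweet_vector(text))
    return {label: [fmean(col) for col in zip(*vecs)] for label, vecs in by_label.items()}


def predict(centroids, text):
    """Return the label whose centroid is closest to the tweet vector."""
    vec = tweet_vector(text)
    return min(centroids, key=lambda label: squared_distance(centroids[label], vec))


# Hypothetical annotated tweets standing in for a labeled corpus.
TRAINING = [
    ("i took my aspirin", "personal"),
    ("i felt my aspirin", "personal"),
    ("new study shows aspirin", "non_personal"),
    ("study shows new aspirin", "non_personal"),
]

centroids = train(TRAINING)
print(predict(centroids, "i took aspirin"))
print(predict(centroids, "new aspirin study"))
```

The point of the sketch is only that no hand-built features are involved: the tweet representation comes entirely from the embedding table, so swapping one embedding technique for another changes the table and nothing else.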

Notes

  1. https://developer.twitter.com/en/docs/tweets/filter-realtime/overview

  2. https://github.com/medeffects/tweet_corpora

  3. http://scikit-learn.org

  4. https://www.tensorflow.org/

  5. https://keras.io/

Acknowledgements

The authors thank the anonymous reviewers for critiquing our work and providing constructive comments that improved the manuscript. The authors also acknowledge the following individuals for their contributions to this project: Dustin Franz and Ravish Gupta for collecting the Twitter data, and Alexandra Vest, Cecelia Lai, Bridget Swindell, Mary Stroud, and Matrika Gupta for annotating the tweets. This work was supported by the National Institutes of Health Grant 1R15LM011999–01.

Author information

Corresponding author

Correspondence to Keyuan Jiang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Jiang, K., Feng, S., Calix, R.A., Bernard, G.R. (2020). Assessment of Word Embedding Techniques for Identification of Personal Experience Tweets Pertaining to Medication Uses. In: Shaban-Nejad, A., Michalowski, M. (eds) Precision Health and Medicine. W3PHAI 2019. Studies in Computational Intelligence, vol 843. Springer, Cham. https://doi.org/10.1007/978-3-030-24409-5_5
