Enhancing convolution-based sentiment extractor via dubbed N-gram embedding-related drug vocabulary

Abstract

Everyday patient narratives on social media can reveal crucial public health issues. Mining these online narratives, which have so far received little attention, may further expose hidden aspects of patients' health status. Deep learning-based sentiment analysis (SA) approaches broadly focus on grammatical cues such as semantic orientation, or concentrate solely on extracting sentiment words. They offer richer representation capabilities and better performance, but they do not take related medication concepts into account. As a result, inaccurate recognition of drug-related entities may cause the expressed sentiment to be missed, leading to lower recall than desired. The frequent use of informal medical language, non-standard formats, misspellings, abbreviations, and typos in social media messages therefore has to be taken into consideration. In other words, efficiently quantifying the sentiment of medication-related texts requires a degree of medical language comprehension. In this paper, we emphasize the importance of considering drug-related entities that keep appearing in new Unicode versions, ranging from drug names, disease symptoms, and drug misuse to potential adverse effects. We propose an N-gram-based convolution vocabulary scheme, dedicated mainly to featurizing text in a medical setting and clarifying the related sentiment at the same level. This vectorization yields improved sentiment extraction and performs medical concept normalization under distributed dependency. The architecture's layers form a shared neural network between the medical featurizing channel and the bidirectional sentiment information detector channel.
Since few approaches have been proposed in this area, we evaluate the effectiveness and transferability of this study across five benchmark datasets and various online medication-related posts (Twitter posts and Parkinson's disease forum discussions); the results were significantly better than those of all other baselines.
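The N-gram convolution idea in the abstract can be illustrated with a generic width-n one-dimensional convolution over token embeddings followed by max-over-time pooling, the standard way a CNN turns a sentence into fixed-size n-gram features. This is a minimal sketch under stated assumptions, not the authors' architecture: the toy embeddings, the hand-set filter weights, and the function names `ngram_conv`, `max_pool`, and `sentence_features` are hypothetical, and the medical featurizing and bidirectional sentiment channels are not modelled.

```python
from typing import List

Vector = List[float]


def ngram_conv(embeddings: List[Vector], filt: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide a filter spanning n consecutive tokens over the sentence.

    embeddings: one d-dimensional vector per token (toy values, hypothetical).
    filt: n rows of d weights, so each output scores one n-token window.
    Returns one ReLU-activated feature per window (len(embeddings) - n + 1 values).
    """
    n = len(filt)
    features = []
    for start in range(len(embeddings) - n + 1):
        window = embeddings[start:start + n]
        # Dot product between the n-token window and the filter, plus bias.
        score = bias + sum(
            w * x for row_w, row_x in zip(filt, window) for w, x in zip(row_w, row_x)
        )
        features.append(max(0.0, score))  # ReLU non-linearity
    return features


def max_pool(features: List[float]) -> float:
    """Max-over-time pooling: keep the strongest n-gram response (0.0 if none)."""
    return max(features) if features else 0.0


def sentence_features(embeddings: List[Vector], filters: List[List[Vector]]) -> List[float]:
    """One pooled feature per filter; filters may have different widths (n-gram sizes)."""
    return [max_pool(ngram_conv(embeddings, f)) for f in filters]


# Made-up 2-d embeddings for a 4-token post such as "metformin made me dizzy".
sentence = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 2.0]]
# Hypothetical bigram (n = 2) and trigram (n = 3) filters with hand-set weights.
bigram_filter = [[0.0, 1.0], [0.0, 1.0]]
trigram_filter = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

print(ngram_conv(sentence, bigram_filter))                          # [1.0, 1.5, 2.5]
print(sentence_features(sentence, [bigram_filter, trigram_filter]))  # [2.5, 1.5]
```

Using several filter widths lets a model respond to short phrases such as multi-word drug names or side-effect expressions; in a trained model the filter weights would typically be learned rather than hand-set, and the pooled features would feed later layers.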


Figs. 1–12

Notes

  1. https://www.pewresearch.org/internet/2013/11/26/part-one-who-lives-with-chronic-conditions/.
  2. http://diego.asu.edu/Publications/Drugchatter.html.
  3. https://sites.google.com/view/pharmacovigilanceinpsychiatry/home.
  4. https://www.drugbank.ca/releases/5-1-5/downloads/all-full-database.
  5. https://doi.org/10.17632/rxwfb3tysd.1.
  6. http://diego.asu.edu/Publications/ADRMine.
  7. https://nlp.stanford.edu/data/glove.twitter.27B.zip.
  8. http://diego.asu.edu/Publications/Drugchatter.html.
  9. http://help.sentiment140.com/.
  10. http://sentistrength.wlv.ac.uk/.


Author information

Corresponding author

Correspondence to Hanane Grissette.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Grissette, H., Nfaoui, E.H. Enhancing convolution-based sentiment extractor via dubbed N-gram embedding-related drug vocabulary. Netw Model Anal Health Inform Bioinforma 9, 42 (2020). https://doi.org/10.1007/s13721-020-00248-5

Keywords

  • Drug-related knowledge
  • Medical sentiment analysis
  • Bidirectional LSTM
  • Convolutional neural network
  • Biomedical natural language processing