Semi-supervised Knowledge Extraction for Detection of Drugs and Their Effects

  • Conference paper
Social Informatics (SocInfo 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10046)

Abstract

New Psychoactive Substances (NPS) are drugs that lie in a grey area of legislation: since they are not officially banned at the international level, their trade may not be prosecutable. The phenomenon is exacerbated by the ease with which NPS can be sold and bought online. Here, we consider large corpora of textual posts published on online forums specialized in drug discussions, plus a small set of known substances and associated effects, which we call seeds. We propose a semi-supervised approach to knowledge extraction, applied to the detection of drugs (including NPS) and effects in the corpora under investigation. Starting from a very small set of initial seeds, the work shows that a contrastive approach and context deduction are effective in detecting substances and effects in the corpora. Our promising results, which feature an F1 score close to 0.9, pave the way for shortening the detection time of new psychoactive substances once they are discussed and advertised on the Internet.
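
To make the idea of deducing new terms from the contexts of a few seeds more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes posts are already tokenized, and the function names, the window width, and the overlap score are all hypothetical choices made for illustration only.

```python
from collections import Counter

def context_windows(tokens, target, width=2):
    """Yield the words within `width` positions of each occurrence of `target`."""
    for i, tok in enumerate(tokens):
        if tok == target:
            yield tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]

def seed_context_profile(posts, seeds, width=2):
    """Count the words that surround any seed term (hypothetical helper)."""
    profile = Counter()
    for tokens in posts:
        for seed in seeds:
            for window in context_windows(tokens, seed, width):
                profile.update(window)
    return profile

def score_candidate(posts, candidate, profile, width=2):
    """Fraction of the candidate's context words also seen around seeds."""
    total = hits = 0
    for tokens in posts:
        for window in context_windows(tokens, candidate, width):
            for word in window:
                total += 1
                hits += word in profile
    return hits / total if total else 0.0

# Toy corpus, invented for illustration; it is not data from the paper.
posts = [
    ["i", "took", "mdma", "last", "night"],
    ["i", "took", "mephedrone", "last", "week"],
    ["my", "dog", "ate", "pizza", "for", "lunch"],
]
profile = seed_context_profile(posts, seeds={"mdma"})
drug_score = score_candidate(posts, "mephedrone", profile)
other_score = score_candidate(posts, "pizza", profile)
```

In this toy setting, a term that shares most of its surrounding words with a known seed scores higher than an unrelated term; a real system would need far larger corpora and a more careful scoring scheme.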

Notes

  1. http://www.emcdda.europa.eu/start/2016/drug-markets#pane2/4. All URLs in the paper have been accessed on July 10, 2016.

  2. https://www.drugabuse.gov/publications/research-reports/prescription-drugs/director.

  3. http://www.bluelight.org.

  4. https://drugs-forum.com.

  5. http://www.talktofrank.com.

  6. http://www.drugbank.ca.

  7. A noun phrase is a phrase that plays the role of a noun, such as “the kid that Santa Claus forgot”.

  8. http://lucene.apache.org/.

  9. \(precision=\frac{TP}{TP+FP}\).

  10. \(recall=\frac{TP}{TP+FN}\).

  11. Harmonic mean of precision and recall: \(F1=2\cdot \frac{precision \cdot recall}{precision+recall}\).
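
The precision, recall, and F1 definitions in notes 9 to 11 can be computed directly from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below assumes those counts are already available; the function name and the example counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    Returns 0.0 for any measure whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts chosen only to illustrate the formulas.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Because F1 is a harmonic mean, it equals precision and recall when the two coincide and is pulled toward the lower of the two otherwise.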

Acknowledgements

This publication arises from the project CASSANDRA (Computer Assisted Solutions for Studying the Availability aNd Distribution of novel psychoActive substances), which has received funding from the European Union under the ISEC programme "Prevention of and fight against crime" [JUST2013/ISEC/DRUGS/AG/6414].

Author information

Corresponding author

Correspondence to Marinella Petrocchi.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Del Vigna, F., Petrocchi, M., Tommasi, A., Zavattari, C., Tesconi, M. (2016). Semi-supervised Knowledge Extraction for Detection of Drugs and Their Effects. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_31

  • DOI: https://doi.org/10.1007/978-3-319-47880-7_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47879-1

  • Online ISBN: 978-3-319-47880-7

  • eBook Packages: Computer Science, Computer Science (R0)
