Semi-supervised Knowledge Extraction for Detection of Drugs and Their Effects

Del Vigna, Fabio; Petrocchi, Marinella; Tommasi, Alessandro; Zavattari, Cesare; Tesconi, Maurizio

doi:10.1007/978-3-319-47880-7_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10046)

Included in the following conference series:

International Conference on Social Informatics

Abstract

New Psychoactive Substances (NPS) are drugs that lie in a grey area of legislation: since they are not officially banned at the international level, their trade may not be prosecutable. The phenomenon is exacerbated by the fact that NPS can be easily sold and bought online. Here, we consider large corpora of textual posts, published on online forums specialized in drug discussions, plus a small set of known substances and associated effects, which we call seeds. We propose a semi-supervised approach to knowledge extraction, applied to the detection of drugs (including NPS) and effects from the corpora under investigation. Starting from a very small set of initial seeds, the work highlights how a contrastive approach and context deduction are effective in detecting substances and effects from the corpora. Our promising results, which feature an F1 score close to 0.9, pave the way for shortening the detection time of new psychoactive substances once they are discussed and advertised on the Internet.

Notes

1. http://www.emcdda.europa.eu/start/2016/drug-markets#pane2/4. All URLs in the paper were accessed on July 10, 2016.
2. https://www.drugabuse.gov/publications/research-reports/prescription-drugs/director.
3. http://www.bluelight.org.
4. https://drugs-forum.com.
5. http://www.talktofrank.com.
6. http://www.drugbank.ca.
7. A noun phrase is a phrase that plays the role of a noun, such as “the kid that Santa Claus forgot”.
8. http://lucene.apache.org/.
9. \(precision=\frac{TP}{TP+FP}\).
10. \(recall=\frac{TP}{TP+FN}\).
11. Harmonic mean of precision and recall: \(F1=2\cdot \frac{precision \cdot recall}{precision+recall}\).
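
The three evaluation measures defined in notes 9 to 11 can be illustrated with a minimal Python sketch. The function names and the counts used below are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. Returning 0.0 when a denominator is zero is a convention assumed here, not something the notes specify.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); returns 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); returns 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fp, fn = 8, 2, 2
    p = precision(tp, fp)
    r = recall(tp, fn)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1_score(p, r):.2f}")
```

Because F1 is a harmonic mean, it stays close to the lower of the two values, so a high F1 requires both precision and recall to be high.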

Acknowledgements

This publication arises from the project CASSANDRA (Computer Assisted Solutions for Studying the Availability aNd Distribution of novel psychoActive substances), which has received funding from the European Union under the ISEC programme "Prevention of and fight against crime" [JUST2013/ISEC/DRUGS/AG/6414].

Author information

Authors and Affiliations

Department of Information Engineering, University of Pisa, Pisa, Italy
Fabio Del Vigna
Institute of Informatics and Telematics (IIT-CNR), Pisa, Italy
Fabio Del Vigna, Marinella Petrocchi & Maurizio Tesconi
LUCENSE SCaRL, Lucca, Italy
Alessandro Tommasi & Cesare Zavattari

Corresponding author

Correspondence to Marinella Petrocchi.

Editor information

Editors and Affiliations

University of Washington, Seattle, Washington, USA
Emma Spiro
Indiana University, Bloomington, Indiana, USA
Yong-Yeol Ahn

Cite this paper

Del Vigna, F., Petrocchi, M., Tommasi, A., Zavattari, C., Tesconi, M. (2016). Semi-supervised Knowledge Extraction for Detection of Drugs and Their Effects. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_31

DOI: https://doi.org/10.1007/978-3-319-47880-7_31
Published: 23 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)