Abstract
New Psychoactive Substances (NPS) are drugs that lie in a grey area of legislation: since they are not officially banned at the international level, their trade may not be prosecutable. The phenomenon is exacerbated by the fact that NPS can be easily bought and sold online. Here, we consider large corpora of textual posts, published on online forums specializing in drug discussions, plus a small set of known substances and associated effects, which we call seeds. We propose a semi-supervised approach to knowledge extraction, applied to the detection of drugs (including NPS) and effects from the corpora under investigation. Starting from the very small set of initial seeds, the work highlights how a contrastive approach and context deduction are effective in detecting substances and effects from the corpora. Our promising results, which feature an F1 score close to 0.9, pave the way for shortening the detection time of new psychoactive substances once they are discussed and advertised on the Internet.
Notes
- 1.
http://www.emcdda.europa.eu/start/2016/drug-markets#pane2/4; All URLs in the paper have been accessed on July 10, 2016.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
A noun-phrase is a phrase that plays the role of a noun, such as “the kid that Santa Claus forgot”.
- 8.
- 9.
\(precision=\frac{TP}{TP+FP}\).
- 10.
\(recall=\frac{TP}{TP+FN}\).
- 11.
Harmonic mean of precision and recall: \(F1=2\cdot \frac{precision \cdot recall}{precision+recall}\).
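The three definitions in notes 9 to 11 can be illustrated with a minimal sketch. The function name `precision_recall_f1` and the counts used below are hypothetical, chosen only for illustration; they are not the paper's evaluation code or data. The sketch also assumes that a metric with a zero denominator is reported as 0.0, a convention the notes do not specify.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    # precision = TP / (TP + FP); assumed to be 0.0 when nothing was predicted positive
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # recall = TP / (TP + FN); assumed to be 0.0 when there are no actual positives
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts only (not taken from the paper).
    p, r, f = precision_recall_f1(tp=8, fp=2, fn=8)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values: with the illustrative counts above, precision is 0.8 and recall is 0.5, and F1 falls below their arithmetic mean of 0.65.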
Acknowledgements
This publication arises from the project CASSANDRA (Computer Assisted Solutions for Studying the Availability aNd Distribution of novel psychoActive substances), which has received funding from the European Union under the ISEC programme, Prevention of and Fight against Crime [JUST2013/ISEC/DRUGS/AG/6414].
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Del Vigna, F., Petrocchi, M., Tommasi, A., Zavattari, C., Tesconi, M. (2016). Semi-supervised Knowledge Extraction for Detection of Drugs and Their Effects. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_31
DOI: https://doi.org/10.1007/978-3-319-47880-7_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47879-1
Online ISBN: 978-3-319-47880-7
eBook Packages: Computer Science, Computer Science (R0)