Adaptation of Algorithms for Medical Information Retrieval for Working on Russian-Language Text Content

  • Aleksandra Vatian
  • Natalia Dobrenko
  • Anastasia Makarenko
  • Niyaz Nigmatullin
  • Nikolay Vedernikov
  • Artem Vasilev
  • Andrey Stankevich
  • Natalia Gusarova
  • Anatoly Shalyto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

The paper investigates how various adverse drug reaction (ADR) detection algorithms can be adapted to the Russian-language environment. In general, the ADR detection process consists of four steps: (1) data collection from social media; (2) classification and filtering of ADR-assertive text segments; (3) extraction of ADR mentions from the text segments; (4) analysis of the extracted ADR mentions for signal generation. Implementing each step for Russian is harder than for English, the language in which this work has traditionally been done. The main obstacle is the lack of the necessary databases and specialized language resources. The complex grammatical structure of Russian is a further complication. The authors present several ways of adapting machine learning algorithms to overcome these difficulties. For step 3, an ensemble classifier applied to Russian-language text forums achieved an accuracy of 0.805. For step 4, on Russian-language electronic health records (EHR), an adaptation of pyConTextNLP achieved an F-measure of 0.935, and an adaptation of the ConText algorithm achieved an F-measure of 0.92–0.95. A method for performing step 4 in full was developed using cue-based and rule-based approaches; it achieved an F-measure of 67.5%, which is comparable to the baseline.
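
The cue-based approach mentioned for step 4 can be illustrated with a minimal sketch in the spirit of ConText: a mention is treated as negated when a negation cue precedes it within a fixed token window and no termination cue lies in between. Everything specific here is an assumption made for illustration: the cue lists, the window size `SCOPE`, and the function names are hypothetical and are not the paper's lexicon, not its rules, and not the pyConTextNLP API.

```python
import re

# Hypothetical cue lists for illustration only (not the paper's lexicon).
NEGATION_CUES = ["не было", "отсутствует", "нет", "без"]
TERMINATION_CUES = ["но", "однако"]
SCOPE = 4  # assumed window: how many tokens before the mention are inspected


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def is_negated(sentence, mention):
    """Return True if a negation cue precedes the first occurrence of
    `mention` within SCOPE tokens and no termination cue intervenes."""
    tokens = tokenize(sentence)
    target = tokenize(mention)
    n = len(target)
    starts = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]
    if not starts:
        return False  # mention not found in the sentence
    start = starts[0]
    window_start = max(0, start - SCOPE)
    # Walk leftwards from the mention towards the start of the window.
    for i in range(start - 1, window_start - 1, -1):
        if tokens[i] in TERMINATION_CUES:
            return False  # a termination cue closes the negation scope
        for cue in NEGATION_CUES:
            cue_tokens = cue.split()
            k = len(cue_tokens)
            # Check whether a (possibly multi-word) cue ends at position i.
            if i - k + 1 >= 0 and tokens[i - k + 1:i + 1] == cue_tokens:
                return True
    return False


sentence = "Нет тошноты, но есть головная боль"
print(is_negated(sentence, "тошноты"))        # mention directly after "нет"
print(is_negated(sentence, "головная боль"))  # "но" ends the negation scope
```

This sketch only handles cues that precede the mention. A fuller adaptation would also need cues that follow the mention and some treatment of Russian inflection, which the abstract names as one of the main difficulties.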

Keywords

Adverse drug reaction; Natural language processing; Russian-language text content

Notes

Acknowledgment

This work was financially supported by the Government of the Russian Federation, “Grant 08-08”, and by the Ministry of Education and Science of the Russian Federation, Agreement #14.578.21.0196 (03/10/2016), Unique Identification RFMEFI57816X0196.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Aleksandra Vatian (1)
  • Natalia Dobrenko (1)
  • Anastasia Makarenko (1)
  • Niyaz Nigmatullin (1)
  • Nikolay Vedernikov (1)
  • Artem Vasilev (1)
  • Andrey Stankevich (1)
  • Natalia Gusarova (1)
  • Anatoly Shalyto (1)
  1. ITMO University, Saint Petersburg, Russia
