Detecting Latin-Based Medical Terminology in Croatian Texts

  • Kristina Kocijan
  • Maria Pia di Buono
  • Linda Mijić
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 987)


Whatever the main language of a medical text, there is always evidence of Latin-derived words and formative elements in its terminology. In general, this usage shows language-specific morpho-semantic behavior in the formation of both technical-scientific and common-usage words. In Croatian medical texts, however, the use of Latin is not consistent, because different word-formation mechanisms may be applied to the same term. To map all the different occurrences of the same concept to a single one, we propose a model designed within NooJ and based on dictionaries and morphological grammars. Starting from the manual detection of nouns and their variations, we identify several word-formation mechanisms and develop grammars that recognize Latinisms and Croatinized Latin medical terminology.


Medical terminology, Morphological grammars, Latin terms, Latinisms, Croatian Latin, NooJ



This research has been partly supported by the European Regional Development Fund under the grant KK. (DATACROSS).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, University of Zagreb, Zagreb, Croatia
  2. TakeLab ZEMRIS, Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
  3. Department of Classical Philology, University of Zadar, Zadar, Croatia
