Computational Methods for Text Analysis and Text Classification

  • Hercules Dalianis
Open Access


This chapter presents computational methods for text analysis and text classification. It covers both rule-based and machine learning-based approaches, the latter including unsupervised and supervised methods.



Copyright information

© The Author(s) 2018

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Hercules Dalianis, DSV-Stockholm University, Kista, Sweden