Computational Methods for Text Analysis and Text Classification

Dalianis, Hercules

doi:10.1007/978-3-319-78503-5_8

Hercules Dalianis

Abstract

This chapter presents computational methods for text analysis and text classification. It covers both rule-based and machine learning-based approaches, the latter including unsupervised and supervised methods.

Author information

Authors and Affiliations

DSV-Stockholm University, Kista, Sweden
Hercules Dalianis

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Dalianis, H. (2018). Computational Methods for Text Analysis and Text Classification. In: Clinical Text Mining. Springer, Cham. https://doi.org/10.1007/978-3-319-78503-5_8

DOI: https://doi.org/10.1007/978-3-319-78503-5_8
Published: 15 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78502-8
Online ISBN: 978-3-319-78503-5
eBook Packages: Computer Science, Computer Science (R0)
