Abstract
Electronic medical discharge summaries contain a wealth of information, but extracting useful structured information from such unstructured text is challenging. Supervised machine learning (ML) algorithms can achieve good performance in extracting relations between entities, but they require large annotated datasets. Manual annotation is expensive and time-consuming because it depends on domain experts. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize annotation cost while maximizing the performance of ML-based models: the classifier is trained on a limited number of labeled samples yet is expected to reach high performance, which saves both time and annotation cost. Active learning is well suited to datasets where annotation cost is high and a decent classifier must be trained from the annotated data that is available. The key factor in an active learning model's success is its selection of the samples that need annotation. The more informative the selected samples are, the less time it takes to train the supervised model to high accuracy, so the query strategy used for sample selection plays a vital role in the AL process. In this study, we aim to develop a novel query strategy that selects the most informative samples from the dataset and thereby accelerates the improvement of the supervised model's performance. The query strategy is designed using deep reinforcement learning techniques such as actor-critic. The performance of the sample selection strategy is evaluated by measuring the accuracy of the model after a predefined number of iterations.
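The loop described in the abstract (train on a small labeled set, query the most informative unlabeled sample, obtain its label, retrain, and measure accuracy after a fixed number of iterations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic 2-D data, the tiny logistic-regression classifier, the names `active_learning_loop`, `least_confident`, and `toy_oracle`, and above all the query strategy. Least-confident uncertainty sampling stands in for the actor-critic strategy, which the abstract does not specify in enough detail to reproduce.

```python
import math
import random


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Fit the model with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(weights, bias, samples, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(
        (predict_proba(weights, bias, x) >= 0.5) == bool(y)
        for x, y in zip(samples, labels)
    )
    return correct / len(samples)


def least_confident(weights, bias, pool):
    """Stand-in query strategy: index of the sample closest to p = 0.5."""
    return min(
        range(len(pool)),
        key=lambda i: abs(predict_proba(weights, bias, pool[i]) - 0.5),
    )


def active_learning_loop(pool, oracle, test_x, test_y,
                         seed_size=4, iterations=10, rng=None):
    """Pool-based active learning; returns accuracy history and label count."""
    rng = rng or random.Random(0)
    pool = list(pool)
    rng.shuffle(pool)
    labeled_x = pool[:seed_size]                 # small initial labeled set
    labeled_y = [oracle(x) for x in labeled_x]
    pool = pool[seed_size:]
    history = []
    for _ in range(iterations):
        weights, bias = train_logistic(labeled_x, labeled_y)
        history.append(accuracy(weights, bias, test_x, test_y))
        if not pool:
            break
        x = pool.pop(least_confident(weights, bias, pool))
        labeled_x.append(x)
        labeled_y.append(oracle(x))              # the "annotation" step
    weights, bias = train_logistic(labeled_x, labeled_y)
    history.append(accuracy(weights, bias, test_x, test_y))
    return history, len(labeled_x)


def make_points(n, rng):
    """Hypothetical toy data: uniform points in the square [-1, 1]^2."""
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def toy_oracle(x):
    """Hypothetical annotator: class 1 when x0 + x1 > 0."""
    return 1 if x[0] + x[1] > 0 else 0


data_rng = random.Random(42)
pool = make_points(200, data_rng)
test_x = make_points(100, data_rng)
test_y = [toy_oracle(x) for x in test_x]
history, n_labeled = active_learning_loop(pool, toy_oracle, test_x, test_y)
print("labels used:", n_labeled, "final accuracy:", history[-1])
```

In this sketch, swapping in a different query strategy only requires replacing `least_confident` with another function that takes the current model and the unlabeled pool and returns the index of the sample to annotate next. A learned policy would occupy the same slot.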
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tandra, S., Nautiyal, A., Gupta, D. (2020). An Efficient Text Labeling Framework Using Active Learning Model. In: Thampi, S., et al. Intelligent Systems, Technologies and Applications. Advances in Intelligent Systems and Computing, vol 1148. Springer, Singapore. https://doi.org/10.1007/978-981-15-3914-5_11
DOI: https://doi.org/10.1007/978-981-15-3914-5_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3913-8
Online ISBN: 978-981-15-3914-5
eBook Packages: Intelligent Technologies and Robotics (R0)