Abstract
Named entity recognition (NER) is a fundamental task in building natural language processing (NLP) systems and plays an important role in many NLP applications. Supervised learning methods can achieve high performance, but they require large amounts of training data that are time-consuming and expensive to obtain. Active learning (AL) is well suited to many NLP problems, where unlabeled data may be abundant but labeled data is limited. AL aims to minimize annotation cost while maximizing model performance. This study proposes a method for classifying named entities in Tweet streams on Twitter using AL with different query strategies. Samples were selected for labeling by human annotators using query by committee and diversity-based querying. Experiments on Tweet data yielded promising results that outperformed the baseline.
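As a rough illustration of query by committee, the sketch below ranks samples by vote entropy, a common disagreement measure in the active learning literature. It is not taken from the paper: the function names, the label set, and the committee votes are all hypothetical, and the abstract does not say which disagreement measure the authors used. Diversity-based querying is not shown.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Disagreement among committee members for one sample (0 = unanimous)."""
    total = len(votes)
    counts = Counter(votes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

def select_queries(committee_votes, k):
    """Return indices of the k samples the committee disagrees on most."""
    scores = [vote_entropy(v) for v in committee_votes]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Hypothetical votes from a three-member committee on four tokens.
votes = [
    ["PER", "PER", "PER"],  # unanimous
    ["PER", "LOC", "ORG"],  # full disagreement
    ["LOC", "LOC", "O"],    # partial disagreement
    ["O", "O", "O"],        # unanimous
]
selected = select_queries(votes, 2)  # samples to send to a human annotator
```

In an AL loop, the selected samples would be labeled by annotators, added to the training set, and the committee retrained before the next round.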
Acknowledgments
This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.
Copyright information
© 2017 Springer International Publishing Switzerland
About this paper
Cite this paper
Van Tran, C., Nguyen, T.T., Hoang, D.T., Hwang, D., Nguyen, N.T. (2017). Active Learning-Based Approach for Named Entity Recognition on Short Text Streams. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) Multimedia and Network Information Systems. Advances in Intelligent Systems and Computing, vol 506. Springer, Cham. https://doi.org/10.1007/978-3-319-43982-2_28
DOI: https://doi.org/10.1007/978-3-319-43982-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43981-5
Online ISBN: 978-3-319-43982-2
eBook Packages: Engineering, Engineering (R0)