Active Learning-Based Approach for Named Entity Recognition on Short Text Streams

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 506)

Abstract

Named entity recognition (NER) plays an important role in many natural language processing (NLP) applications and is one of the fundamental tasks in building NLP systems. Supervised learning methods can achieve high performance, but they require large amounts of training data, which are time-consuming and expensive to obtain. Active learning (AL) is well suited to many NLP problems in which unlabeled data are abundant but labeled data are limited. AL aims to minimize annotation cost while maximizing model performance. This study proposes a method for classifying named entities in tweet streams from Twitter using AL with different query strategies. Samples are selected for labeling by human annotators using query by committee and diversity-based querying. Experiments on tweet data show promising results that outperform the baseline.
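The two query strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the committee models, the disagreement measure, or the diversity measure, so vote entropy and Jaccard token overlap are assumed here, and every name below (`vote_entropy`, `jaccard`, `select_queries`, `max_similarity`, the toy committee and pool) is hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement: entropy of the label votes (0 means unanimous)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def jaccard(a, b):
    """Token-set overlap between two texts, in [0, 1] (assumed diversity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_queries(pool, committee, k, max_similarity=0.5):
    """Pick up to k texts for human labeling.

    Query by committee: rank texts by how much the committee disagrees.
    Diversity: skip a text that overlaps too much with one already chosen.
    """
    ranked = sorted(
        pool,
        key=lambda text: vote_entropy([model(text) for model in committee]),
        reverse=True,
    )
    chosen = []
    for text in ranked:
        if len(chosen) >= k:
            break
        if all(jaccard(text, c) <= max_similarity for c in chosen):
            chosen.append(text)
    return chosen


# Toy committee: three hand-written labelers standing in for trained models.
committee = [
    lambda t: "PER" if "obama" in t.lower() else "O",
    lambda t: "LOC" if "paris" in t.lower() else "O",
    lambda t: "PER" if t[:1].isupper() else "O",
]

# Toy unlabeled pool standing in for a tweet stream.
pool = [
    "Obama visits Paris today",
    "Obama visits Paris again today",
    "nothing to see here",
    "Paris is lovely",
]

queries = select_queries(pool, committee, k=3)
```

In this toy run the near-duplicate tweet is passed over even though the committee disagrees on it, which is the point of combining the two strategies: annotation effort goes to samples that are both uncertain and different from one another.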

Notes

  1. http://nlp.stanford.edu/software/CRF-NER.shtml

  2. https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html

  3. http://nlp.stanford.edu/software/

  4. http://twitter4j.org

Acknowledgments

This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.

Author information

Corresponding author

Correspondence to Dosam Hwang.

Copyright information

© 2017 Springer International Publishing Switzerland

About this paper

Cite this paper

Van Tran, C., Nguyen, T.T., Hoang, D.T., Hwang, D., Nguyen, N.T. (2017). Active Learning-Based Approach for Named Entity Recognition on Short Text Streams. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) Multimedia and Network Information Systems. Advances in Intelligent Systems and Computing, vol 506. Springer, Cham. https://doi.org/10.1007/978-3-319-43982-2_28

  • DOI: https://doi.org/10.1007/978-3-319-43982-2_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43981-5

  • Online ISBN: 978-3-319-43982-2

  • eBook Packages: Engineering, Engineering (R0)
