Information Filtering Method for Twitter Streaming Data Using Human-in-the-Loop Machine Learning

  • Conference paper
Database and Expert Systems Applications (DEXA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11030)

Abstract

There is a massive amount of text on social media, but only a small portion of it is informative for any specific purpose. If we can accurately filter the texts in these streams, we can obtain useful information in real time. In a keyword-based approach, filters are constructed from keywords, but selecting the appropriate keywords is often difficult. In this work, we propose a method for filtering texts related to specific topics that combines crowdsourcing with a machine-learning-based text classification method. In our approach, we construct a text classifier using FastText and then use crowdsourcing to annotate whether tweets are related to the topics. In this step, we consider two strategies for selecting which tweets should be assessed: an optimistic approach and a pessimistic approach. Then, we reconstruct the text classifier using the annotated texts and classify the tweets again. We assume that if we keep repeating this loop, the accuracy of the classifier will improve and we will obtain useful information without having to specify keywords. Experimental results demonstrated that our proposed system is effective for filtering social media streams. Moreover, we confirmed that the pessimistic approach is better than the optimistic approach.
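The classify, annotate, and retrain loop described above can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: FastText is swapped for a tiny standard-library Naive Bayes classifier (`NaiveBayesFilter`), crowd workers are simulated by a keyword function (`simulated_crowd`), and because the abstract does not define the two strategies, `optimistic` (send the tweets scored most likely relevant) and `pessimistic` (send the tweets scored closest to 0.5) are assumed readings only. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


class NaiveBayesFilter:
    """Assumed stand-in for FastText: binary multinomial Naive Bayes."""

    def fit(self, texts: Sequence[str], labels: Sequence[bool]) -> "NaiveBayesFilter":
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def prob_relevant(self, text: str) -> float:
        """Posterior probability that `text` is on-topic (add-one smoothing)."""
        n_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            total = sum(self.word_counts[label].values())
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for tok in text.lower().split():
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            log_score[label] = score
        # Normalise the two log scores into a probability without underflow.
        top = max(log_score.values())
        p_true = math.exp(log_score[True] - top)
        p_false = math.exp(log_score[False] - top)
        return p_true / (p_true + p_false)


Selector = Callable[[NaiveBayesFilter, List[str], int], List[str]]


def optimistic(model: NaiveBayesFilter, pool: List[str], k: int) -> List[str]:
    # Hypothetical reading: assess the tweets the model rates most relevant.
    return sorted(pool, key=model.prob_relevant, reverse=True)[:k]


def pessimistic(model: NaiveBayesFilter, pool: List[str], k: int) -> List[str]:
    # Hypothetical reading: assess the tweets the model is least sure about.
    return sorted(pool, key=lambda t: abs(model.prob_relevant(t) - 0.5))[:k]


def human_in_the_loop(seed: Sequence[Tuple[str, bool]], pool: Sequence[str],
                      crowd: Callable[[str], bool], select: Selector,
                      rounds: int = 3, batch: int = 2):
    labeled = list(seed)
    unlabeled = list(pool)
    texts, labels = zip(*labeled)
    model = NaiveBayesFilter().fit(texts, labels)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Select tweets to assess, then have the (simulated) crowd label them.
        for text in select(model, unlabeled, batch):
            unlabeled.remove(text)
            labeled.append((text, bool(crowd(text))))
        # Rebuild the classifier on every annotation collected so far.
        texts, labels = zip(*labeled)
        model = NaiveBayesFilter().fit(texts, labels)
    return model, labeled


def simulated_crowd(text: str) -> bool:
    """Hypothetical stand-in for crowd workers judging topical relevance."""
    return "earthquake" in text or "tsunami" in text


seed = [("earthquake shaking now in tokyo", True), ("great ramen lunch today", False)]
pool = ["strong earthquake felt downtown", "new ramen shop opened",
        "tsunami warning after earthquake", "watching a movie tonight"]
model, labeled = human_in_the_loop(seed, pool, simulated_crowd, pessimistic,
                                   rounds=2, batch=2)
print(len(labeled))  # 6
```

Passing the selection strategy as a function keeps the loop itself fixed, so the optimistic and pessimistic variants (in whatever form the paper actually defines them) can be compared by swapping a single argument.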

Acknowledgments

These research results were obtained through “Research and Development on Fundamental and Utilization Technologies for Social Big Data,” a commissioned research project of the National Institute of Information and Communications Technology (NICT), Japan.

Author information

Correspondence to Yu Suzuki.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Suzuki, Y., Nakamura, S. (2018). Information Filtering Method for Twitter Streaming Data Using Human-in-the-Loop Machine Learning. In: Hartmann, S., Ma, H., Hameurlain, A., Pernul, G., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2018. Lecture Notes in Computer Science, vol 11030. Springer, Cham. https://doi.org/10.1007/978-3-319-98812-2_13

  • DOI: https://doi.org/10.1007/978-3-319-98812-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98811-5

  • Online ISBN: 978-3-319-98812-2

  • eBook Packages: Computer Science, Computer Science (R0)