Improved Identification of Tweets that Mention Books: Selection of Effective Features

Yada, Shuntaro; Kageura, Kyo

doi:10.1007/978-3-319-49304-6_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10075)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

In this paper, we assessed the effectiveness of different types of features for identifying tweets that mention books among tweets that contain strings identical to full book titles. In previous work, bag-of-words features were taken from the context of individual tweets. While performance was reasonable, we identified room for improvement in feature extraction. We proposed additional types of features: words appearing in the profiles of tweet authors, POS tags of mentioned book titles, and bibliographic elements within tweets, e.g., authors and publishers. We conducted a grid search over all combinations of the above feature sets, and observed performance improvements suitable for practical applications.
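
The grid search over feature-set combinations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature-set labels, the function names, and the `placeholder_score` scorer are all assumptions introduced for illustration. In a real experiment the scorer would train a classifier on the selected features and return a validation metric such as F1.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

# Hypothetical labels for the feature groups named in the abstract.
FEATURE_SETS = (
    "context_words",           # bag-of-words from the tweet context
    "profile_words",           # words in the tweet author's profile
    "title_pos_tags",          # POS tags of the mentioned book title
    "bibliographic_elements",  # e.g. authors and publishers in the tweet
)


def feature_set_combinations(names: Iterable[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty subset of `names`, smallest subsets first."""
    names = tuple(names)
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


def grid_search(
    names: Iterable[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], float]:
    """Score each combination with `evaluate` and return the best pair."""
    scored = [(combo, evaluate(combo)) for combo in feature_set_combinations(names)]
    return max(scored, key=lambda pair: pair[1])


def placeholder_score(combo: Tuple[str, ...]) -> float:
    """Stand-in scorer (assumption): counts the selected feature sets.

    Replace with code that trains and validates a classifier.
    """
    return float(len(combo))


if __name__ == "__main__":
    best_combo, best_score = grid_search(FEATURE_SETS, placeholder_score)
    print(len(feature_set_combinations(FEATURE_SETS)), best_combo, best_score)
```

With four feature sets there are 2^4 - 1 = 15 non-empty combinations, so exhaustive evaluation is cheap at this scale.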

Notes

1. The number of local bookstores is rapidly decreasing in Japan. In 1999, there were 22,296 bookstores in Japan, and the number had fallen to 13,488 by 2015.
2. This is the only verb among the five keywords; all the others are nouns.
3. We used the Readability algorithm of arc90.
4. Because the number of TMBs (tweets that mention books) is not large, recall is important. However, a lack of precision greatly hampers the mission of the system.

References

Prasetyo, P.K., Lo, D., Achananuparp, P., Tian, Y., Lim, E.P.: Automatic classification of software related microblogs. In: 28th International Conference on Software Maintenance, pp. 596–599. IEEE (2012)
Theodotou, A., Stassopoulou, A.: A system for automatic classification of twitter messages into categories. In: Christiansen, H., Stojanovic, I., Papadopoulos, G.A. (eds.) CONTEXT 2015. LNCS (LNAI), vol. 9405, pp. 532–537. Springer, Heidelberg (2015). doi:10.1007/978-3-319-25591-0_44
Tuarob, S., Tucker, C.S., Salathe, M., Ram, N.: An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages. J. Biomed. Inf. 49, 255–268 (2014)
Yada, S.: Development of a book recommendation system to inspire “Infrequent Readers”. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds.) ICADL 2014. LNCS, vol. 8839, pp. 399–404. Springer, Heidelberg (2014). doi:10.1007/978-3-319-12823-8_43
Yada, S., Kageura, K.: Identification of tweets that mention books: an experimental comparison of machine learning methods. In: Allen, R.B., Hunter, J., Zeng, M.L. (eds.) ICADL 2015. LNCS, vol. 9469, pp. 278–288. Springer, Heidelberg (2015). doi:10.1007/978-3-319-27974-9_30

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number JP 16K12542.

Author information

Authors and Affiliations

Graduate School of Education, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Shuntaro Yada & Kyo Kageura

Corresponding author

Correspondence to Shuntaro Yada.

Editor information

Editors and Affiliations

University of Tsukuba, Tsukuba, Japan
Atsuyuki Morishima
Vienna University of Technology, Vienna, Austria
Andreas Rauber
Victoria University of Wellington, Wellington, New Zealand
Chern Li Liew

About this paper

Cite this paper

Yada, S., Kageura, K. (2016). Improved Identification of Tweets that Mention Books: Selection of Effective Features. In: Morishima, A., Rauber, A., Liew, C. (eds) Digital Libraries: Knowledge, Information, and Data in an Open Access Society. ICADL 2016. Lecture Notes in Computer Science, vol 10075. Springer, Cham. https://doi.org/10.1007/978-3-319-49304-6_19

DOI: https://doi.org/10.1007/978-3-319-49304-6_19
Published: 15 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49303-9
Online ISBN: 978-3-319-49304-6
eBook Packages: Computer Science, Computer Science (R0)
