Abstract
In this paper, we assessed the effectiveness of different types of features for identifying tweets that mention books among tweets that contain strings identical to full book titles. In previous work, bag-of-words features were taken from the context of individual tweets. While performance was reasonable, there was room for improvement in feature extraction. We proposed additional types of features: words appearing in the profiles of tweet authors, POS tags of the mentioned book titles, and bibliographic elements within tweets, e.g. authors and publishers. We conducted a grid search over all combinations of these feature sets and observed performance improvements suitable for practical applications.
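A grid search over all combinations of feature sets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature-set labels and the `evaluate` callback are hypothetical names introduced here, and the abstract does not say whether the empty combination or a fixed baseline set was included.

```python
from itertools import combinations

# Hypothetical labels for the feature sets named in the abstract.
FEATURE_SETS = [
    "tweet_bow",               # bag-of-words from the tweet context
    "profile_words",           # words in the tweet author's profile
    "title_pos_tags",          # POS tags of the mentioned book title
    "bibliographic_elements",  # e.g. authors and publishers in the tweet
]


def feature_set_combinations(feature_sets):
    """Yield every non-empty combination of the given feature sets."""
    for size in range(1, len(feature_sets) + 1):
        for combo in combinations(feature_sets, size):
            yield combo


def grid_search(feature_sets, evaluate):
    """Return the combination with the highest score.

    `evaluate` is a caller-supplied function (an assumption of this sketch)
    that trains and scores a classifier for one combination.
    """
    return max(feature_set_combinations(feature_sets), key=evaluate)


all_combos = list(feature_set_combinations(FEATURE_SETS))
```

With four feature sets, the search covers every non-empty subset, so the cost grows exponentially with the number of sets; this is tractable only because the number of feature types is small.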
Notes
1. The number of local bookstores is rapidly decreasing in Japan. In 1999, there were 22,296 bookstores in Japan, and the number had fallen to 13,488 by 2015.
2. This is the only verb among the five keywords; all the others are nouns.
3. We utilised the readability algorithm of arc90.
4. Taking into account the fact that the number of TMBs is not so great, recall is important. However, the lack of precision greatly hampers the mission of the system.
References
Prasetyo, P.K., Lo, D., Achananuparp, P., Tian, Y., Lim, E.P.: Automatic classification of software related microblogs. In: 28th International Conference on Software Maintenance, pp. 596–599. IEEE (2012)
Theodotou, A., Stassopoulou, A.: A system for automatic classification of twitter messages into categories. In: Christiansen, H., Stojanovic, I., Papadopoulos, G.A. (eds.) CONTEXT 2015. LNCS (LNAI), vol. 9405, pp. 532–537. Springer, Heidelberg (2015). doi:10.1007/978-3-319-25591-0_44
Tuarob, S., Tucker, C.S., Salathe, M., Ram, N.: An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages. J. Biomed. Inf. 49, 255–268 (2014)
Yada, S.: Development of a book recommendation system to inspire “Infrequent Readers”. In: Tuamsuk, K., Jatowt, A., Rasmussen, E. (eds.) ICADL 2014. LNCS, vol. 8839, pp. 399–404. Springer, Heidelberg (2014). doi:10.1007/978-3-319-12823-8_43
Yada, S., Kageura, K.: Identification of tweets that mention books: an experimental comparison of machine learning methods. In: Allen, R.B., Hunter, J., Zeng, M.L. (eds.) ICADL 2015. LNCS, vol. 9469, pp. 278–288. Springer, Heidelberg (2015). doi:10.1007/978-3-319-27974-9_30
Acknowledgement
This work was supported by JSPS KAKENHI Grant Number JP 16K12542.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Yada, S., Kageura, K. (2016). Improved Identification of Tweets that Mention Books: Selection of Effective Features. In: Morishima, A., Rauber, A., Liew, C. (eds) Digital Libraries: Knowledge, Information, and Data in an Open Access Society. ICADL 2016. Lecture Notes in Computer Science(), vol 10075. Springer, Cham. https://doi.org/10.1007/978-3-319-49304-6_19
DOI: https://doi.org/10.1007/978-3-319-49304-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49303-9
Online ISBN: 978-3-319-49304-6
eBook Packages: Computer Science (R0)