Abstract
Classification of short text (SMS messages, reviews, feedback, etc.) presents a unique set of challenges compared to classic text classification. Short texts are characterized by cryptic constructions, poor spelling, and improper grammar, which make traditional methods difficult to apply. Proper classification allows this information to be used for further action. We study this problem in the context of online auctions, focusing on questions initiated by buyers. The paper presents a score assigning approach that outperforms traditional methods (e.g. Naïve Bayes) in terms of accuracy.
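For readers unfamiliar with the Naïve Bayes baseline that the abstract compares against, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the authors' score assigning method, whose details are not given here. The category names ("shipping", "condition") and the training questions are hypothetical examples, and the whitespace tokenizer is an assumption; it does none of the spelling normalization that noisy short texts usually need.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            # log P(c) + sum of log P(w | c) over the known tokens
            score = math.log(n_c / self.n_docs)
            for w in tokens:
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical buyer questions and labels, for illustration only.
texts = [
    "what is the shipping cost",
    "does it ship to canada",
    "is the item new or used",
    "any scratches on the screen",
]
labels = ["shipping", "shipping", "condition", "condition"]

model = MultinomialNB().fit(texts, labels)
print(model.predict("how much to ship"))  # shipping
print(model.predict("any scratches"))     # condition
```

Because every class is scored by summing per-word log-probabilities, a question made of one or two rare or misspelled words gives the model very little evidence, which illustrates why short, noisy texts are hard for this kind of baseline.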
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Li, Y., Srinivasan, A., Tripathi, A. (2017). Short Text Classification of Buyer-Initiated Questions in Online Auctions: A Score Assigning Method. In: Johansson, B., Møller, C., Chaudhuri, A., Sudzina, F. (eds) Perspectives in Business Informatics Research. BIR 2017. Lecture Notes in Business Information Processing, vol 295. Springer, Cham. https://doi.org/10.1007/978-3-319-64930-6_13
DOI: https://doi.org/10.1007/978-3-319-64930-6_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64929-0
Online ISBN: 978-3-319-64930-6
eBook Packages: Business and Management, Business and Management (R0)