Extracting Novel Features for E-Commerce Page Quality Classification

Wang, Jing; Lin, Lanfen; Wang, Feng; Yu, Penghua; Liu, Jiaolong; Zhu, Xiaowei

doi:10.1007/978-3-642-53914-5_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

A huge number of web pages describe the same product on e-commerce websites, yet their quality varies greatly. There is therefore a growing need for automated, accurate and efficient quality classification methods. Several link-based, click-based and content-based approaches have been proposed to evaluate page quality for general search engines. However, these methods consider only the surface features of HTML documents. Moreover, features such as link relations have drawbacks when applied to e-commerce pages, because the hypothesis that links mean endorsements does not always hold in an e-commerce environment. In this paper, we propose two kinds of features that directly indicate the quality of content. We analyze the content structure of pages with a corpus of labeled texts, and evaluate property completeness with the help of an ontology. We then combine these features with other features commonly used in the literature. We apply several learning methods to train classifiers that label pages as good or bad. Experiments on real e-commerce pages show that the proposed novel features can greatly improve classification accuracy.
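
The abstract does not define how property completeness is computed, so the following is only a minimal sketch of one plausible reading: the share of ontology-expected properties that a page mentions. The ontology contents, the function names, the substring matching and the 0.6 threshold are all illustrative assumptions, and the threshold rule merely stands in for the trained classifiers the paper describes.

```python
# Hypothetical ontology: product category -> properties a complete page should describe.
# Categories and property names are illustrative assumptions, not taken from the paper.
ONTOLOGY = {
    "laptop": {"brand", "cpu", "memory", "screen size", "price"},
    "shoes": {"brand", "size", "material", "price"},
}


def property_completeness(page_text: str, category: str) -> float:
    """Return the fraction of expected properties mentioned in the page text."""
    expected = ONTOLOGY[category]
    text = page_text.lower()
    # Naive substring matching; a real system would need proper property extraction.
    found = {prop for prop in expected if prop in text}
    return len(found) / len(expected)


def classify(page_text: str, category: str, threshold: float = 0.6) -> str:
    """Label a page 'good' or 'bad' from the completeness score alone.

    This threshold rule is a stand-in for a trained classifier that would
    combine this score with the other features.
    """
    score = property_completeness(page_text, category)
    return "good" if score >= threshold else "bad"


if __name__ == "__main__":
    detailed = "Brand: Acme. CPU: 4 cores. Memory: 8 GB. Price: 500 USD"
    sparse = "Nice laptop, cheap price"
    print(property_completeness(detailed, "laptop"), classify(detailed, "laptop"))
    print(property_completeness(sparse, "laptop"), classify(sparse, "laptop"))
```

In practice such a score would be one column in a feature vector alongside the content-structure features and the commonly used features, rather than a decision rule on its own.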

Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, Hangzhou, China
Jing Wang, Lanfen Lin, Feng Wang, Penghua Yu, Jiaolong Liu & Xiaowei Zhu

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

Wang, J., Lin, L., Wang, F., Yu, P., Liu, J., Zhu, X. (2013). Extracting Novel Features for E-Commerce Page Quality Classification. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_41

DOI: https://doi.org/10.1007/978-3-642-53914-5_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)