Abstract
We develop an unsupervised learning framework for extracting popular product attributes from different Web product description pages. Unlike existing systems, which do not differentiate attributes by popularity, our framework not only detects the popular features of a product that customers are concerned with from a collection of customer reviews, but also maps these features to the related product attributes and, at the same time, extracts these attributes from description pages. To tackle the technical challenges, we develop a discriminative graphical model based on hidden Conditional Random Fields. We have conducted experiments on several product domains. The empirical results show that our framework is effective.
The work described in this paper is substantially supported by grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project Code: CUHK413510) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050476 and 2050522). This work is also affiliated with the CUHK MoE-Microsoft Key Laboratory of Human-centric Computing and Interface Technologies.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bing, L., Wong, TL., Lam, W. (2012). Unsupervised Extraction of Popular Product Attributes from Web Sites. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_39
DOI: https://doi.org/10.1007/978-3-642-35341-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)