Abstract
Huge collections of collaboratively constructed content (e.g. blogs and consumer reviews) are now available online. This content has become a valuable knowledge repository in which users can seek quality information. However, such content is often unorganized, which makes information navigation and knowledge acquisition difficult. This chapter focuses on discovering the structure of the content and organizing it accordingly, so as to help users understand the knowledge inherent within it. In particular, we take one example of collaboratively constructed content, consumer reviews on products, as a case study, and propose a domain-assisted approach that generates a hierarchical structure to organize the reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. Such a hierarchy provides a well-visualized way to browse consumer reviews at different granularities to meet various users’ needs, which can help to improve information dissemination and accessibility. We further apply the generated hierarchy to support opinion Question Answering (opinion-QA) for products, which aims to generate appropriate answers to opinion questions about products. Experimental results on 11 popular products in 4 domains demonstrate the effectiveness of our approach.
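The aspect hierarchy described above can be pictured as a tree in which each node holds one product aspect together with the reviews and opinions attached to it. The following is only a minimal sketch of such a structure, not the chapter's implementation: the class name `AspectNode`, its methods, the aspect names, and the opinion labels are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AspectNode:
    """One product aspect in the hierarchy (hypothetical structure)."""
    name: str
    # (review text, opinion label) pairs attached directly to this aspect.
    reviews: List[Tuple[str, str]] = field(default_factory=list)
    # Child aspects keyed by name, following parent-child relations.
    children: Dict[str, "AspectNode"] = field(default_factory=dict)

    def add_child(self, name: str) -> "AspectNode":
        """Return the child aspect with this name, creating it if needed."""
        return self.children.setdefault(name, AspectNode(name))

    def find(self, name: str) -> Optional["AspectNode"]:
        """Depth-first search for an aspect by name."""
        if self.name == name:
            return self
        for child in self.children.values():
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def collect_reviews(self) -> List[Tuple[str, str]]:
        """Reviews on this aspect and on all of its descendants."""
        collected = list(self.reviews)
        for child in self.children.values():
            collected.extend(child.collect_reviews())
        return collected


# Illustrative data only; these reviews are made up for the example.
root = AspectNode("camera")
battery = root.add_child("battery")
battery.reviews.append(("The battery charges slowly.", "negative"))
battery.add_child("battery life").reviews.append(
    ("Battery life is great.", "positive")
)
root.add_child("lens").reviews.append(("The lens is sharp.", "positive"))

coarse = root.find("battery").collect_reviews()
fine = root.find("battery life").collect_reviews()
```

Browsing at a coarser granularity then amounts to calling `collect_reviews` on a node closer to the root, while a leaf such as `battery life` returns only the reviews attached to that specific aspect.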
Notes
- 11. Using the Stanford POS tagger, http://nlp.stanford.edu/software/tagger.shtml
- 14. Empirically set to 10 in the experiment.
- 16. It represents any pair of words in their sentence order, allowing at most two gaps in between.
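The word-pair feature described in the last note can be illustrated with a short sketch. It assumes that a "gap" means one skipped token, so a pair is kept when at most two tokens lie between its two words. The function name `skip_bigrams` and the parameter `max_gap` are hypothetical, and tokenization is reduced to a plain list of strings.

```python
from typing import List, Tuple


def skip_bigrams(tokens: List[str], max_gap: int = 2) -> List[Tuple[str, str]]:
    """Return ordered word pairs with at most `max_gap` tokens between them."""
    pairs: List[Tuple[str, str]] = []
    for i, left in enumerate(tokens):
        # The right word may sit up to max_gap + 1 positions after the left one.
        last = min(i + max_gap + 2, len(tokens))
        for j in range(i + 1, last):
            pairs.append((left, tokens[j]))
    return pairs


pairs = skip_bigrams(["the", "battery", "life", "is", "great"])
```

With this reading, the pair ("the", "is") is kept because two tokens lie between the words, while ("the", "great") is dropped because three do.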
Acknowledgements
This work is supported by the NUS-Tsinghua Extreme Search (NExT) project under grant number R-252-300-001-490. We warmly thank the project and the anonymous reviewers for their valuable comments.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yu, J., Zha, ZJ., Chua, TS. (2013). Hierarchical Organization of Collaboratively Constructed Content. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_8
DOI: https://doi.org/10.1007/978-3-642-35085-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6
eBook Packages: Computer Science, Computer Science (R0)