Hierarchical Organization of Collaboratively Constructed Content

The People’s Web Meets NLP

Abstract

Huge collections of collaboratively constructed content (e.g. blogs and consumer reviews) are now available online. This content has become a valuable knowledge repository from which users can seek quality information. However, it is often unorganized, which makes information navigation and knowledge acquisition difficult. This chapter focuses on discovering the structure of such content and organizing it accordingly, so as to help users understand the knowledge it contains. In particular, we take one example of collaboratively constructed content, consumer reviews of products, as a case study, and propose a domain-assisted approach that generates a hierarchical structure to organize the reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and the corresponding opinions on that aspect are stored. Such a hierarchy provides a well-visualized way to browse consumer reviews at different levels of granularity to meet various users’ needs, which can help improve information dissemination and accessibility. We further apply the generated hierarchy to support opinion Question Answering (opinion-QA) for products, which aims to generate appropriate answers to opinion questions about products. Experimental results on 11 popular products in 4 domains demonstrate the effectiveness of our approach.
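To make the data structure described in the abstract concrete, the following is a minimal sketch of an aspect hierarchy in Python. It is not the chapter's implementation: the class name `AspectNode`, its methods, the opinion labels, and the example product are all hypothetical and chosen only for illustration. It assumes each node is a product aspect that stores (review text, opinion label) pairs and links to its child aspects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class AspectNode:
    """One product aspect (hypothetical class, for illustration only)."""
    name: str
    # Each entry pairs a review snippet with an opinion label.
    reviews: List[Tuple[str, str]] = field(default_factory=list)
    # Child aspects keyed by name, encoding the parent-child relations.
    children: Dict[str, "AspectNode"] = field(default_factory=dict)

    def add_child(self, name: str) -> "AspectNode":
        """Return the child aspect called `name`, creating it if needed."""
        if name not in self.children:
            self.children[name] = AspectNode(name)
        return self.children[name]

    def add_review(self, text: str, opinion: str) -> None:
        """Store a review snippet and its opinion label on this aspect."""
        self.reviews.append((text, opinion))

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "AspectNode"]]:
        """Yield (depth, node) pairs in pre-order: parents before children."""
        yield depth, self
        for child in self.children.values():
            yield from child.walk(depth + 1)

    def opinion_counts(self) -> Dict[str, int]:
        """Count opinion labels on this aspect and all of its descendants."""
        counts: Dict[str, int] = {}
        for _, node in self.walk():
            for _, opinion in node.reviews:
                counts[opinion] = counts.get(opinion, 0) + 1
        return counts


def build_example() -> AspectNode:
    """Build a tiny made-up hierarchy for a hypothetical phone."""
    root = AspectNode("phone")
    battery = root.add_child("battery")
    battery.add_review("Battery lasts two days.", "positive")
    battery.add_child("charger").add_review("The charger is slow.", "negative")
    root.add_child("screen").add_review("The screen scratches easily.", "negative")
    return root


if __name__ == "__main__":
    # Print the hierarchy with indentation reflecting the aspect depth.
    for depth, node in build_example().walk():
        print("  " * depth + node.name, node.opinion_counts())
```

Calling `opinion_counts()` on an inner node aggregates over its subtree, which is one simple way to browse opinions at a coarser or finer granularity depending on which aspect the user selects.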

Notes

  1. http://www.comp.nus.edu.sg/~Jianxing/Products_Reviews.rar

  2. http://nlp.stanford.edu/software/lex-parser.shtml

  3. http://thesaurus.com

  4. http://cemantix.org/assert.html

  5. Available at http://www.aclweb.org/supplementals/D/D11/D11-1013.Attachment.zip

  6. Available at http://www.comp.nus.edu.sg/~Jianxing/Product_Reviews.rar

  7. Available at http://www.aclweb.org/supplementals/D/D11/D11-1013.Attachment.zip

  8. http://nlp.stanford.edu/software/CRF-NER.shtml

  9. http://answers.yahoo.com

  10. http://www.comp.nus.edu.sg/~Jianxing/auxiliary_material.zip

  11. Using the Stanford POS tagger, http://nlp.stanford.edu/software/tagger.shtml

  12. http://svmlight.joachims.org/svm_multiclass.html

  13. http://www.comp.nus.edu.sg/~Jianxing/auxiliary_material.zip

  14. Empirically set to 10 in the experiment.

  15. http://www.comp.nus.edu.sg/~Jianxing/auxiliary_material.zip

  16. It represents any pair of words in their sentence order, allowing at most two gaps in between.

Acknowledgements

This work is supported by the NUS-Tsinghua Extreme Search (NExT) project under grant number R-252-300-001-490. We warmly thank the project and the anonymous reviewers for their valuable comments.

Author information

Correspondence to Jianxing Yu.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yu, J., Zha, ZJ., Chua, TS. (2013). Hierarchical Organization of Collaboratively Constructed Content. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_8

  • DOI: https://doi.org/10.1007/978-3-642-35085-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science, Computer Science (R0)
