
Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2009 (OTM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5871)

Abstract

The growth of social networks in recent years has created opportunities for interesting data mining problems, such as interest or friendship recommendation. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate, and compare three approaches to engineering a hierarchical ontology over user interests. The proposed approaches use two popular knowledge bases, Wikipedia and Directory Mozilla (DMoz), to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, latent semantic analysis to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher-level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests.
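As a rough illustration of the idea behind the third approach, the sketch below assumes that each interest has already been matched to a slash-separated, DMoz-style category path and merges those paths into a single hierarchy. The interest names, the paths, and the helper functions (`build_hierarchy`, `ancestors`, `lowest_common_ancestor`) are hypothetical and chosen only for illustration; the abstract does not describe how interests are looked up in the directory, so this is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical DMoz-style category paths for a few interests.
# Illustrative data only; not taken from the paper or the real directory.
INTEREST_PATHS: Dict[str, str] = {
    "jazz": "Top/Arts/Music/Styles/Jazz",
    "blues": "Top/Arts/Music/Styles/Blues",
    "painting": "Top/Arts/Visual_Arts/Painting",
    "chess": "Top/Games/Board_Games/Chess",
}


def build_hierarchy(paths: Dict[str, str]) -> dict:
    """Merge category paths into one nested dict.

    Paths that share a prefix end up under the same ancestor nodes.
    """
    root: dict = {}
    for path in paths.values():
        node = root
        for category in path.split("/"):
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(category, {})
    return root


def ancestors(paths: Dict[str, str], interest: str) -> List[str]:
    """Return the categories above an interest, from the root downwards."""
    return paths[interest].split("/")[:-1]


def lowest_common_ancestor(paths: Dict[str, str], a: str, b: str) -> str:
    """Return the deepest category shared by both interests ('' if none)."""
    common = ""
    for x, y in zip(ancestors(paths, a), ancestors(paths, b)):
        if x != y:
            break
        common = x
    return common


hierarchy = build_hierarchy(INTEREST_PATHS)
```

Under these assumed paths, `lowest_common_ancestor(INTEREST_PATHS, "jazz", "blues")` yields `"Styles"`, while "jazz" and "chess" meet only at `"Top"`. The depth of the shared ancestor gives a simple signal of how closely two interests are related in the hierarchy.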




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Haridas, M., Caragea, D. (2009). Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_35


  • DOI: https://doi.org/10.1007/978-3-642-05151-7_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05150-0

  • Online ISBN: 978-3-642-05151-7

  • eBook Packages: Computer Science, Computer Science (R0)
