Abstract
The outgrowth of social networks in the recent years has resulted in opportunities for interesting data mining problems, such as interest or friendship recommendations. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate and compare three approaches to engineering a hierarchical ontology over user interests. The proposed approaches make use of two popular knowledge bases, Wikipedia and Directory Mozilla, to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, the latent semantic analysis technique to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Bausch, S., Han, L.: Social networking sites grow 47 percent, year over year, reaching 45 percent of web users, according to nielsen/netratings (2006), http://www.nielsen-online.com/pr/pr_060511.pdf
Gruber, T.: A translation approach to portable ontology specifications. Technical report 5(2), 199–220, Knowledge Systems AI Laboratory, Stanford University (1993)
Bahirwani, V., Caragea, D., Aljandal, W., Hsu, W.: Ontology engineering and feature construction for predicting friendship links in the LiveJournal social network. In: The 2nd SNA-KDD Workshop 2008, Las Vegas, Nevada, USA (2008)
Jardine, N., van Rijsbergen, C.J.: The use of hierarchical clustering in information retrieval. Information Storage and Retrieval 7, 217–240 (1971)
Haridas, M.: Exploring Wikipedia and DMoz as knowledge bases for engineering a user interest hierarchy for social network applications. M.S. Thesis, Department of Computing and Information Sciences. KSU, Manhattan, KS, USA (2009)
Gabrilovich, E., Markovitch, S.: Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In: The 21st National Conference on Artificial Intelligence, Boston, MA (2006)
Janik, M., Kochut, K.: Wikipedia in action: Ontological knowledge in text categorization. Technical report no. uga-cs-tr-07-001, University of Georgia (2007)
Syed, Z.S., Finin, T., Joshi, A.: Wikipedia as an ontology for describing documents. In: The 2nd International Conference on Weblogs and Social Media (2008)
Strube, M., Ponzetto, S.P.: WikiRelate! computing semantic relatedness using Wikipedia. In: The 21st National Conf. on AI, Boston, MA (2006)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In: The 20th International Joint Conference on Artificial Intelligence, Hyderabad, India (2007)
Milne, D.: Computing semantic relatedness using Wikipedia link structure. In: The New Zealand Computer Science Research Student Conference (2007)
Lee, C.H., Yang, H.C., Ma, S.M.: A novel multi-language text categorization system using latent semantic indexing. In: The First International Conference on Innovative Computing, Information and Control, Beijing, China (2006)
Grobelnik, M., Mladeni, D.: Simple classification into large topic ontology of web documents. In: The 27th International Conference on Information Technology Interfaces, Cavtat, Croatia (2005)
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 391–407 (1990)
Manning, C., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Rosario, B.: Latent semantic indexing: An overview. Final paper INFOSYS 240. University of Berkeley (2000)
Zesch, T., Gurevynch, I.: Analysis of the Wikipedia category graph for NLP applications. In: The TextGraphs-2 Workshop (2007)
Maarek, Y.S., Shaul, I.Z.B.: Automatically organizing bookmarks per contents. Comput. Netw. ISDN Syst. 28(7-11), 1321–1333 (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Haridas, M., Caragea, D. (2009). Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications. In: Meersman, R., Dillon, T., Herrero, P. (eds) On the Move to Meaningful Internet Systems: OTM 2009. OTM 2009. Lecture Notes in Computer Science, vol 5871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05151-7_35
Download citation
DOI: https://doi.org/10.1007/978-3-642-05151-7_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05150-0
Online ISBN: 978-3-642-05151-7
eBook Packages: Computer ScienceComputer Science (R0)