Part of the book series: Studies in Computational Intelligence (SCI, volume 541)

Abstract

In this chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on a Bag-of-Words (BOW) representation of Wikipedia articles: the similarity of articles belonging to different categories is used to compute the similarity between the categories themselves. The second method is based on the average scores assigned to categories when documents are categorized by our dedicated score-based classifier. Applying these methods yields weighted category graphs that extend the original relations between Wikipedia categories. We also propose a method for selecting the weight threshold used to cut off less important relations. A preliminary examination of the quality of the new relations supports our procedure.
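
The first method and the weight cut-off can be illustrated with a minimal sketch. The abstract does not state how article similarities are aggregated into a category similarity or how the threshold is selected, so the code below makes its own assumptions: whitespace tokenization, raw term counts, cosine similarity, the mean over all cross-category article pairs as the edge weight, and a hand-picked threshold. All function names and the toy data are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter
from itertools import combinations


def bow(text):
    """Bag-of-words vector as a term -> count mapping (whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_similarity(articles_a, articles_b):
    """Assumed aggregation: mean cosine similarity over all article pairs."""
    pairs = [(x, y) for x in articles_a for y in articles_b]
    if not pairs:
        return 0.0
    return sum(cosine(bow(x), bow(y)) for x, y in pairs) / len(pairs)


def weighted_category_graph(categories, threshold):
    """Return {(cat1, cat2): weight}, keeping edges with weight >= threshold."""
    edges = {}
    for c1, c2 in combinations(sorted(categories), 2):
        weight = category_similarity(categories[c1], categories[c2])
        if weight >= threshold:
            edges[(c1, c2)] = weight
    return edges


# Toy stand-ins for article texts grouped by category (hypothetical data).
categories = {
    "Graph theory": [
        "graph nodes edges paths",
        "trees are connected graph structures",
    ],
    "Network science": [
        "network nodes edges degree",
        "graph models of social network",
    ],
    "Cooking": ["boil pasta with salt", "bake bread in oven"],
}

# The threshold value is an arbitrary choice for this illustration.
graph = weighted_category_graph(categories, threshold=0.1)
for (c1, c2), weight in graph.items():
    print(f"{c1} -- {c2}: {weight:.3f}")
```

On this toy data only the pair "Graph theory" and "Network science" survives the cut-off, since the cooking texts share no terms with the other two categories. In practice the quadratic number of article pairs would call for a cheaper aggregation, for example comparing per-category centroid vectors.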


Notes

  1. http://dumps.wikimedia.org/ [dump file from 01.04.2010].

Acknowledgments

This work has been supported by the National Centre for Research and Development (NCBiR) under research Grant No. SP/I/1/77065/1 SYNAT: “Establishment of the universal, open, hosting and communication, repository platform for network resources of knowledge to be used by science, education and open knowledge society”.

Author information

Corresponding author

Correspondence to Julian Szymański.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Draszawka, K., Szymański, J., Krawczyk, H. (2014). Towards Increasing Density of Relations in Category Graphs. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Studies in Computational Intelligence, vol 541. Springer, Cham. https://doi.org/10.1007/978-3-319-04714-0_4

  • DOI: https://doi.org/10.1007/978-3-319-04714-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04713-3

  • Online ISBN: 978-3-319-04714-0

  • eBook Packages: Engineering, Engineering (R0)
