Towards Increasing Density of Relations in Category Graphs

Draszawka, Karol; Szymański, Julian; Krawczyk, Henryk

doi:10.1007/978-3-319-04714-0_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 541)

Abstract

In this chapter we propose methods for identifying new associations between Wikipedia categories. The first method is based on a Bag-of-Words (BOW) representation of Wikipedia articles: the similarity of articles belonging to different categories is used to calculate the similarity of the categories themselves. The second method is based on the average scores given to categories when documents are categorized by our dedicated score-based classifier. Applying these methods yields weighted category graphs that allow the original relations between Wikipedia categories to be extended. We also propose a method for selecting the weight value used to cut off less important relations. A preliminary examination of the quality of the newly obtained relations supports our procedure.
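
The first method and the weight cut-off can be illustrated with a minimal sketch. The abstract does not say how article similarity is aggregated into category similarity or how the threshold is chosen, so everything specific here is an assumption made for illustration: raw term counts as BOW weights, summed article vectors as a category centroid, cosine similarity as the edge weight, and a hand-picked threshold. The function names and the toy data are hypothetical and do not come from the chapter.

```python
import math
from collections import Counter
from itertools import combinations


def bow(text):
    """Bag-of-words vector: lowercase whitespace token -> raw count (assumed weighting)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Sum the BOW vectors of a category's articles (assumed aggregation)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def category_graph(categories, threshold):
    """Build weighted edges between categories, keeping weights >= threshold.

    categories: dict mapping category name -> list of article texts.
    Returns a dict mapping (name_a, name_b) -> similarity weight.
    """
    centroids = {
        name: centroid(bow(text) for text in texts)
        for name, texts in categories.items()
    }
    edges = {}
    for a, b in combinations(sorted(centroids), 2):
        weight = cosine(centroids[a], centroids[b])
        if weight >= threshold:  # cut off less important relations
            edges[(a, b)] = weight
    return edges


# Hypothetical toy data, not taken from the Wikipedia dump used in the chapter.
TOY = {
    "Algebra": ["group ring field", "ring ideal module"],
    "Number theory": ["prime field ring", "prime integer"],
    "Botany": ["leaf root flower"],
}
edges = category_graph(TOY, threshold=0.2)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

With this toy input only the pair of categories that share vocabulary survives the cut-off; raising or lowering `threshold` trades the density of the resulting graph against the strength of the relations it keeps.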

Notes

1. http://dumps.wikimedia.org/ [dumpfile from 01.04.2010].

Acknowledgments

This work has been supported by the National Centre for Research and Development (NCBiR) under research Grant No. SP/I/1/77065/1 SYNAT: “Establishment of the universal, open, hosting and communication, repository platform for network resources of knowledge to be used by science, education and open knowledge society”.

Author information

Authors and Affiliations

Department of Computer Systems Architecture, Gdańsk University of Technology, Gdańsk, Poland
Karol Draszawka, Julian Szymański & Henryk Krawczyk

Corresponding author

Correspondence to Julian Szymański.

Editor information

Editors and Affiliations

Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Robert Bembenik
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Łukasz Skonieczny
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Henryk Rybiński
Faculty of Electronics and Information Technology, Warsaw University of Technology, Institute of Computer Science, Warsaw, Poland
Marzena Kryszkiewicz
Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw, Warsaw, Poland
Marek Niezgódka

About this chapter

Cite this chapter

Draszawka, K., Szymański, J., Krawczyk, H. (2014). Towards Increasing Density of Relations in Category Graphs. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds) Intelligent Tools for Building a Scientific Information Platform: From Research to Implementation. Studies in Computational Intelligence, vol 541. Springer, Cham. https://doi.org/10.1007/978-3-319-04714-0_4

DOI: https://doi.org/10.1007/978-3-319-04714-0_4
Published: 27 February 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04713-3
Online ISBN: 978-3-319-04714-0
eBook Packages: Engineering (R0)
