Exploration of Document Classification with Linked Data and PageRank

Dostal, Martin; Nykl, Michal; Ježek, Karel

doi:10.1007/978-3-319-01571-2_6


Part of the book series: Studies in Computational Intelligence (SCI, volume 511)


Abstract

In this article, we present a new approach to document classification that uses Linked Data and PageRank. Our research focuses on classification methods enhanced by semantic information, which can be obtained from an ontology or from Linked Data; in our case, DBpedia served as the source of Linked Data. Feature selection is semantically based, so the resulting features are human-readable and can be understood by non-expert users. PageRank is used during the feature selection and generation phase to expand basic features into more general representatives. Both feature selection and PageRank processing are therefore based on network relations obtained from Linked Data. The discovered features can be used by standard classification algorithms. We report promising results showing that the approach is easily applied to two different datasets.
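
As a rough illustration of the idea of expanding basic features into more general representatives, the following minimal sketch runs a plain power-iteration PageRank over a tiny concept graph and keeps the highest-ranked concepts that are not already basic features. It is not the authors' implementation: the graph, the concept names, the function names (`pagerank`, `expand_features`), the damping factor of 0.85, and the `top_k` cutoff are all assumptions made for the example, and a real system would build the graph from DBpedia relations instead of a hand-written dictionary.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [linked nodes]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def expand_features(basic_features, graph, top_k=2):
    """Return the top_k highest-ranked concepts that are not basic features."""
    rank = pagerank(graph)
    candidates = [node for node in rank if node not in basic_features]
    candidates.sort(key=lambda node: rank[node], reverse=True)
    return candidates[:top_k]


# Hypothetical concept graph: an edge points from a concept to a broader one.
# The names only imitate DBpedia-style resources; they are not queried data.
links = {
    "Football": ["Team_sport", "Ball_game"],
    "Basketball": ["Team_sport", "Ball_game"],
    "Ice_hockey": ["Team_sport"],
    "Team_sport": ["Sport"],
    "Ball_game": ["Sport"],
    "Sport": [],
}

basic = {"Football", "Basketball", "Ice_hockey"}
ranks = pagerank(links)
general = expand_features(basic, links)
```

In this toy graph, rank accumulates on the broader concepts that many basic features point to, so `general` contains `Sport` and `Team_sport`. Such concepts could then be added to a document's feature vector before it is passed to a standard classifier.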



Author information

Authors and Affiliations

NTIS - New Technologies for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Martin Dostal
Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia, Pilsen, Czech Republic
Michal Nykl & Karel Ježek


Corresponding author

Correspondence to Martin Dostal.

Editor information

Editors and Affiliations

Faculty of Mathematics and Physics, Charles University in Prague, Prague, Czech Republic
Filip Zavoral
Department of Computer Engineering, Yeungnam University, Gyeongsan, Republic of Korea (South Korea)
Jason J. Jung
Faculty of Automatics, Computers and Electronics, University of Craiova Software Engineering Department, Craiova, Romania
Costin Badica


About this paper

Cite this paper

Dostal, M., Nykl, M., Ježek, K. (2014). Exploration of Document Classification with Linked Data and PageRank. In: Zavoral, F., Jung, J., Badica, C. (eds) Intelligent Distributed Computing VII. Studies in Computational Intelligence, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-319-01571-2_6


DOI: https://doi.org/10.1007/978-3-319-01571-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01570-5
Online ISBN: 978-3-319-01571-2
eBook Packages: Engineering (R0)
