Text Classification by Aggregation of SVD Eigenvectors

Symeonidis, Panagiotis; Kehayov, Ivaylo; Manolopoulos, Yannis

doi:10.1007/978-3-642-33074-2_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7503)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

Text classification is the process of assigning documents to categories, usually by topic, place, readability level, etc. For classification by topic, a well-known method is Singular Value Decomposition (SVD). For classification by readability, the Flesch Reading Ease index calculates the readability level of a document (e.g. easy, medium, advanced). In this paper, we propose combining SVD either with Cosine Similarity or with Aggregated Similarity Matrices to categorize documents by both readability and topic. We experimentally compare both methods with the Flesch Reading Ease index and the vector-based cosine similarity method on a synthetic data set and a real data set (Reuters-21578). Both proposed methods clearly outperform the comparison methods.
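
The combination of SVD with cosine similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `svd_cosine_classify`, the use of raw term counts, the rank `k=2`, the nearest-neighbour decision rule, and the toy documents are all assumptions made for illustration. The Aggregated Similarity Matrices variant is not covered here.

```python
import numpy as np

def svd_cosine_classify(train_docs, train_labels, query_doc, k=2):
    """Label a query document by its most similar training document,
    with similarity measured in a rank-k SVD space (illustrative only)."""
    # Build a vocabulary from the training documents.
    vocab = sorted({w for d in train_docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    def to_vector(doc):
        # Raw term-count vector; words outside the vocabulary are ignored.
        v = np.zeros(len(vocab))
        for w in doc.lower().split():
            if w in index:
                v[index[w]] += 1.0
        return v

    # Term-document matrix A (terms x documents).
    A = np.column_stack([to_vector(d) for d in train_docs])

    # Truncated SVD: keep the k leading left singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]

    # Project training documents and the query into the k-dimensional space.
    train_proj = U_k.T @ A                    # shape (k, n_docs)
    query_proj = U_k.T @ to_vector(query_doc)  # shape (k,)

    # Cosine similarity between the query and every training document.
    norms = np.linalg.norm(train_proj, axis=0) * np.linalg.norm(query_proj)
    sims = (train_proj.T @ query_proj) / np.where(norms == 0, 1.0, norms)

    best = int(np.argmax(sims))
    return train_labels[best], sims

# Toy usage with hypothetical documents and labels.
docs = [
    "stocks market shares trading",
    "market trading profit stocks",
    "football match goal team",
    "team goal league match",
]
labels = ["finance", "finance", "sport", "sport"]
label, sims = svd_cosine_classify(docs, labels, "shares profit market", k=2)
print(label, np.round(sims, 3))
```

In this toy setting the two topics share no vocabulary, so the two leading singular directions separate them cleanly. On real data such as Reuters-21578, the choice of `k` and of term weighting would need tuning.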

This work has been partially funded by the Greek GSRT (project number 10TUR/4-3-3) and the Turkish TUBITAK (project number 109E282) national agencies as part of the Greek-Turkish 2011-2012 bilateral scientific cooperation.

Author information

Authors and Affiliations

Department of Informatics, Aristotle University, Thessaloniki, 54124, Greece
Panagiotis Symeonidis, Ivaylo Kehayov & Yannis Manolopoulos

Editor information

Editors and Affiliations

Institute of Computing Science, Poznań University of Technology, Piotrowo 2, 60-965, Poznań, Poland
Tadeusz Morzy
Department of Computer Science, AG DBIS, University of Kaiserslautern, Germany, P.O. Box 3049, 67653, Kaiserslautern, Germany
Theo Härder
Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland
Robert Wrembel

About this paper

Cite this paper

Symeonidis, P., Kehayov, I., Manolopoulos, Y. (2012). Text Classification by Aggregation of SVD Eigenvectors. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2012. Lecture Notes in Computer Science, vol 7503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33074-2_29

DOI: https://doi.org/10.1007/978-3-642-33074-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33073-5
Online ISBN: 978-3-642-33074-2
eBook Packages: Computer Science, Computer Science (R0)
