Text Classifiers for Automatic Articles Categorization

Westa, Mateusz; Szymański, Julian; Krawczyk, Henryk

doi:10.1007/978-3-642-29350-4_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7268)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing

Abstract

This article addresses the problem of automatically classifying textual content. We present selected methods for generating document representations and evaluate them in classification tasks. The experiments were performed on Wikipedia articles, which were classified automatically into the categories assigned to them by Wikipedia editors.
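
As a rough illustration of the general idea of building a document representation and using it for categorization, the following minimal sketch turns texts into TF-IDF weighted bag-of-words vectors and assigns a new text the category of its most similar training document under cosine similarity. This is not the method evaluated in the paper: the smoothed IDF formula, the nearest-neighbour rule, the function names, and the toy documents and category labels are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    n = len(docs)
    # Smoothed IDF (an assumption of this sketch): ln((1 + n) / (1 + df)) + 1
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def vectorize(text, idf):
    """Represent a text as a sparse TF-IDF vector (dict: term -> weight)."""
    counts = Counter(tokenize(text))
    # Terms never seen in the training documents are ignored.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, train_vectors, labels, idf):
    """Return the label of the training document most similar to the text."""
    query = vectorize(text, idf)
    best = max(range(len(train_vectors)),
               key=lambda i: cosine(query, train_vectors[i]))
    return labels[best]


# Hypothetical toy corpus standing in for articles and their categories.
train_docs = [
    "the cat chased the mouse across the garden",
    "dogs and cats are popular household pets",
    "the processor executes instructions from memory",
    "the compiler translates source code into machine instructions",
]
train_labels = ["Animals", "Animals", "Computing", "Computing"]

idf = build_idf(train_docs)
train_vectors = [vectorize(d, idf) for d in train_docs]

print(classify("a compiler optimizes code for the processor",
               train_vectors, train_labels, idf))
```

Sparse dictionaries keep the sketch dependency-free; a realistic experiment on a large article collection would more likely use a dedicated vectorizer and a trained classifier.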

Author information

Authors and Affiliations

Department of Computer Systems Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland
Mateusz Westa, Julian Szymański & Henryk Krawczyk

Editor information

Editors and Affiliations

Częstochowa University of Technology, Armii Krajowej 36, 42-200, Częstochowa, Poland
Leszek Rutkowski, Marcin Korytkowski & Rafał Scherer
AGH University of Science and Technology, Mickiewicza 30, 30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, Computer Science Division, University of California Berkeley, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Computational Intelligence Laboratory, Electrical and Computer Engineering, University of Louisville, 405 Lutz Hall, 40292, Louisville, KY, USA
Jacek M. Zurada

About this paper

Cite this paper

Westa, M., Szymański, J., Krawczyk, H. (2012). Text Classifiers for Automatic Articles Categorization. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2012. Lecture Notes in Computer Science, vol 7268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29350-4_24

DOI: https://doi.org/10.1007/978-3-642-29350-4_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29349-8
Online ISBN: 978-3-642-29350-4
eBook Packages: Computer Science, Computer Science (R0)
