Abstract
In the article we evaluate different text representation methods used for a task of Wikipedia articles categorization. We present the Matrix’u application used for creating computational datasets of Wikipedia articles. The representations have been evaluated with SVM classifiers used for reconstruction human made categories.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bennett, C., Li, M., Ma, B.: Chain letters and evolutionary histories. Scientific American 288(6), 76–81 (2003)
Biesiada, J., Duch, W.: Feature selection for high-dimensional data: A kolmogorov-smirnov correlation-based filter. Computer Recognition Systems, 95–103 (2005)
Büttcher, S., Clarke, C., Lushman, B.: Term proximity scoring for ad-hoc retrieval on very large text collections. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 621–622. ACM (2006)
Chevet, S.: Kernel associated with a cylindrical measure. Probability in Banach Spaces III, 51–84 (1981)
Czarnul, P.: Modeling, run-time optimization and execution of distributed workflow applications in the jee-based beesycluster environment. The Journal of Supercomputing, 1–26 (2010)
Davis, R., Shrobe, H., Szolovits, P.: What is a knowledge representation? AI magazine 14(1), 17 (1993)
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using wikipedia-based explicit semantic analysis. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, vol. 6, p. 12. Morgan Kaufmann Publishers Inc. (2007)
Islam, A., Inkpen, D.: Real-word spelling correction using google web it 3-grams. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, vol. 3, pp. 1241–1249. Association for Computational Linguistics (2009)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and its Applications, 3rd edn. Springer (2008)
Martins, B., Silva, M.: Language identification in web pages. In: Proceedings of the 2005 ACM Symposium on Applied Computing, pp. 764–768. ACM (2005)
Milne, D., Witten, I.: Learning to link with wikipedia. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 509–518. ACM (2008)
Papadimitriou, C., Sideri, M.: On the Floyd-Warshall algorithm for logic programs. Journal of Logic Programming 41(1), 129–137 (1999)
Sowa, J., et al.: Knowledge representation: logical, philosophical, and computational foundations, vol. 511. MIT Press (2000)
Szymański, J.: Categorization of Wikipedia Articles with Spectral Clustering. In: Yin, H., Wang, W., Rayward-Smith, V. (eds.) IDEAL 2011. LNCS, vol. 6936, pp. 108–115. Springer, Heidelberg (2011)
Szymański, J.: Self–Organizing Map Representation for Clustering Wikipedia Search Results. In: Nguyen, N.T., Kim, C.-G., Janiak, A. (eds.) ACIIDS 2011, Part II. LNCS, vol. 6592, pp. 140–149. Springer, Heidelberg (2011)
Wallach, H.: Topic modeling: beyond bag-of-words. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 977–984. ACM (2006)
Westa, M., Szymański, J., Krawczyk, H.: Text Classifiers for Automatic Articles Categorization. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2012, Part II. LNCS, vol. 7268, pp. 196–204. Springer, Heidelberg (2012)
Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2(1-3), 37–52 (1987)
Yang, Y., Pedersen, J.: A comparative study on feature selection in text categorization. In: International Conference on Machine Learning, pp. 412–420. Morgan Kaufmann Publishers, Inc. (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szymański, J. (2013). Wikipedia Articles Representation with Matrix’u. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_40
Download citation
DOI: https://doi.org/10.1007/978-3-642-36071-8_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36070-1
Online ISBN: 978-3-642-36071-8
eBook Packages: Computer ScienceComputer Science (R0)