Wikipedia Articles Representation with Matrix’u

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7753)

Abstract

In this article we evaluate different text representation methods for the task of categorizing Wikipedia articles. We present the Matrix’u application, which is used to create computational datasets from Wikipedia articles. The representations are evaluated with SVM classifiers used to reconstruct human-made categories.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szymański, J. (2013). Wikipedia Articles Representation with Matrix’u. In: Hota, C., Srimani, P.K. (eds) Distributed Computing and Internet Technology. ICDCIT 2013. Lecture Notes in Computer Science, vol 7753. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36071-8_40

  • DOI: https://doi.org/10.1007/978-3-642-36071-8_40

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36070-1

  • Online ISBN: 978-3-642-36071-8

  • eBook Packages: Computer Science, Computer Science (R0)
