Computational Topology in Text Mining

  • Hubert Wagner
  • Paweł Dłotko
  • Marian Mrozek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7309)

Abstract

In this paper we present our ongoing research on applying computational topology to the analysis of the structure of similarities within a collection of text documents. Our work lies at the boundary between text mining and computational topology, and we describe techniques from both disciplines. We transform text documents into the so-called vector space model, which is often used in text mining. This representation is suitable for topological computations. We compute homology, using discrete Morse theory, and persistent homology of the Flag complex built from the point cloud representing the input data. Because the space is high-dimensional, many difficulties arise. We describe how we tackle these problems and point out which challenges remain open.
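The first two steps of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain term-frequency vectors as the vector space model, cosine distance as the dissimilarity, and a fixed threshold for the neighborhood graph, none of which are specified in the abstract. The names tf_vector, cosine_distance and flag_complex, as well as the sample documents, are hypothetical. The sketch only enumerates the simplices of the Flag complex (the cliques of the neighborhood graph) and does not compute homology.

```python
from collections import Counter
from itertools import combinations
import math


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumed stand-in for the vector space model)."""
    return Counter(text.lower().split())


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / (norm_u * norm_v)


def flag_complex(points, threshold, max_dim=2):
    """Simplices (as sorted index tuples) of the Flag complex up to max_dim.

    Two points are joined by an edge when their distance is at most
    `threshold`; a set of points spans a simplex when all of its pairs
    are edges. Brute-force enumeration, suitable only for tiny inputs.
    """
    n = len(points)
    edges = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if cosine_distance(points[i], points[j]) <= threshold
    }
    simplices = [(i,) for i in range(n)]
    for size in range(2, max_dim + 2):
        for cand in combinations(range(n), size):
            if all(pair in edges for pair in combinations(cand, 2)):
                simplices.append(cand)
    return simplices


# Hypothetical toy corpus: three related documents and one unrelated one.
docs = [
    "homology of a simplicial complex",
    "persistent homology of a complex",
    "simplicial complex and homology",
    "cooking pasta with tomato sauce",
]
points = [tf_vector(d) for d in docs]
simplices = flag_complex(points, threshold=0.6)
```

In this toy corpus the three related documents are pairwise within the threshold, so they span a triangle, while the unrelated document remains an isolated vertex. The brute-force clique enumeration grows exponentially with the number of points and the dimension, which hints at the difficulties the abstract mentions for high-dimensional data.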

Keywords

Computational topology; Computational homology; Flag complex; Discrete Morse theory; Text mining; Vector space model

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hubert Wagner (1)
  • Paweł Dłotko (1)
  • Marian Mrozek (1)
  1. Institute of Computer Science, Jagiellonian University, Poland
