Document Analysis Based on Multi-view Intact Space Learning with Manifold Regularization

  • Conference paper
Intelligence Science and Big Data Engineering (IScIDE 2017)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10559)

Abstract

Document analysis plays an important role in everyday life, but traditional models such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) cannot handle data from multiple sources. Multi-view learning methods such as Multi-view Intact Space Learning (MISL), which integrates the complementary information in multiple views to discover a latent intact representation of the data, are effective for image and video applications. However, MISL has not been applied to multi-lingual documents, and it does not consider the intrinsic geometrical and discriminative structure of document data. To address these issues, we assume that documents that are close in the original representation should also be close in the intact space representation, and we introduce a manifold regularization term into MISL so that the data vary more smoothly in the latent space. We conduct classification experiments on 10505 Wiki documents that we crawled, and the results show that the proposed method outperforms TFIDF, LSI, LDA, and MISL.
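
As a rough illustration of the smoothness assumption in the abstract, the sketch below computes a graph-Laplacian penalty of the kind commonly used for manifold regularization. It is not the authors' formulation: the k-nearest-neighbor graph with binary weights, the unnormalized Laplacian, and the names `knn_laplacian` and `manifold_penalty` are all assumptions made for this example. The penalty equals half the weighted sum of squared distances between the latent vectors of neighboring documents, so it is small when documents that are close in the original representation stay close in the latent space.

```python
import numpy as np


def knn_laplacian(features, k=2):
    """Build an unnormalized graph Laplacian L = D - W from a symmetric
    k-nearest-neighbor graph over the rows of `features`.

    The binary edge weights and the choice of k are illustrative
    assumptions, not values taken from the paper.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances between documents.
    sq_norms = np.sum(features ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # a document is not its own neighbor

    weights = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(dist[i])[:k]
        weights[i, neighbors] = 1.0
    # Keep an edge if either endpoint selected the other.
    weights = np.maximum(weights, weights.T)

    degree = np.diag(weights.sum(axis=1))
    return degree - weights, weights


def manifold_penalty(latent, laplacian):
    """Return tr(X^T L X), where each row of X is one document's latent vector.

    For a symmetric weight matrix W this equals
    0.5 * sum_ij W_ij * ||x_i - x_j||^2.
    """
    return float(np.trace(latent.T @ laplacian @ latent))
```

In a full model, a term like this would be added to the reconstruction objective with a trade-off weight; how the paper weights and optimizes it is not stated in the abstract.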

Author information

Corresponding author

Correspondence to Zengrong Zhan.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhan, Z., Ma, Z. (2017). Document Analysis Based on Multi-view Intact Space Learning with Manifold Regularization. In: Sun, Y., Lu, H., Zhang, L., Yang, J., Huang, H. (eds) Intelligence Science and Big Data Engineering. IScIDE 2017. Lecture Notes in Computer Science, vol 10559. Springer, Cham. https://doi.org/10.1007/978-3-319-67777-4_4

  • DOI: https://doi.org/10.1007/978-3-319-67777-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67776-7

  • Online ISBN: 978-3-319-67777-4

  • eBook Packages: Computer Science, Computer Science (R0)