
An Automatic Extraction of Academia-Industry Collaborative Research and Development Documents on the Web

  • Conference paper

Abstract

This research focuses on a method for automatically extracting, from the Web, Japanese documents that describe University-Industry (U-I) relations. The proposed method consists of a preprocessing step for Japanese text and a classification step using a support vector machine (SVM). The feature selection process is tuned specifically for documents on U-I relations. A U-I document extraction experiment was conducted, and the features found to be relevant for this task are discussed.
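The pipeline described above (preprocess Japanese text, select features, classify with an SVM) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes documents have already been segmented into words by a Japanese morphological analyzer, scores terms with the chi-square statistic (one common text-categorization criterion; the abstract does not state which criterion the paper tunes), weights the selected terms by tf-idf, and trains a linear SVM with a Pegasos-style subgradient solver in place of a dedicated SVM package. The toy corpus, labels, and every function name (`chi_square_scores`, `select_features`, `tfidf_vectors`, `train_linear_svm`, `predict`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens, assumed to come
# from a Japanese morphological analyzer (word segmentation is not shown here).
# Labels: +1 = describes a university-industry (U-I) relation, -1 = other.
DOCS = [
    ["大学", "企業", "共同研究", "発表"],
    ["産学連携", "大学", "共同研究"],
    ["新製品", "発売", "企業"],
    ["決算", "発表", "企業"],
]
LABELS = [1, 1, -1, -1]


def chi_square_scores(docs, labels):
    """Chi-square statistic between each term's presence and the label."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    # Document frequency of each term among positive and negative documents.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (df_pos if y == 1 else df_neg).update(set(tokens))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms (ties broken by term order)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def tfidf_vectors(docs, vocab):
    """L2-normalized tf-idf vectors restricted to the selected vocabulary."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocab}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v] if norm else v)
    return vectors


def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Examples are visited in a fixed order (not sampled) so results are
    reproducible; no bias term is learned, to keep the sketch short.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Sign of the linear decision function (+1 = U-I document)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


scores = chi_square_scores(DOCS, LABELS)
vocab = select_features(scores, 3)
X = tfidf_vectors(DOCS, vocab)
w = train_linear_svm(X, LABELS)
print(vocab, [predict(w, x) for x in X])
```

Each stage is a separate function so that a task-specific scoring rule, a different term weighting, or an off-the-shelf SVM package can replace one step without touching the others.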




Author information


Correspondence to Kei Kurakawa.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kurakawa, K., Sun, Y., Yamashita, N., Baba, Y. (2014). An Automatic Extraction of Academia-Industry Collaborative Research and Development Documents on the Web. In: Gaul, W., Geyer-Schulz, A., Baba, Y., Okada, A. (eds) German-Japanese Interchange of Data Analysis Results. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01264-3_18
