Extraction of Bi-graph Structures Among Multilingual Financial Words Using Text-Mining Methods


Part of the book series: Evolutionary Economics and Social Complexity Science (EESCS, volume 9)

Abstract

Vector representations of words, such as word2vec, are an efficient tool in text mining. However, few papers have focused on multilingual studies. In this chapter, we present a comparative study of English and Japanese text data, investigating possible relationships between the vector models of the two languages. First, we train one word2vec model per language on news resources spanning ten years and cluster the words of each model on the basis of their cosine similarities. Second, we extract the words related to finance and create a bilingual dictionary based on the models obtained. Finally, we compare cross-lingual clusters with the help of the dictionary and attempt to establish relationships between English clusters and Japanese clusters.
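The final comparison step can be pictured as a bipartite graph whose two node sets are the English and the Japanese clusters, with an edge weighted by the number of dictionary pairs that connect them. The following is only a minimal sketch of that idea, not the chapter's implementation: the toy 2-D vectors, the centroids, the three dictionary pairs, and the helper names `cosine`, `assign_clusters`, and `bigraph_edges` are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_clusters(vectors, centroids):
    """Map each word to the index of its most cosine-similar centroid."""
    return {
        word: max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
        for word, vec in vectors.items()
    }


def bigraph_edges(en_clusters, ja_clusters, dictionary):
    """Count dictionary pairs linking each (English cluster, Japanese cluster)."""
    edges = Counter()
    for en_word, ja_word in dictionary:
        if en_word in en_clusters and ja_word in ja_clusters:
            edges[(en_clusters[en_word], ja_clusters[ja_word])] += 1
    return edges


# Hypothetical toy data: real word2vec vectors have hundreds of dimensions,
# and the centroids would come from a clustering run, not be fixed by hand.
en_vectors = {"stock": [0.9, 0.1], "share": [0.8, 0.2], "yen": [0.1, 0.9]}
ja_vectors = {"株": [0.2, 0.9], "株式": [0.1, 0.8], "円": [0.9, 0.2]}
en_centroids = [[1.0, 0.0], [0.0, 1.0]]
ja_centroids = [[1.0, 0.0], [0.0, 1.0]]

en_clusters = assign_clusters(en_vectors, en_centroids)
ja_clusters = assign_clusters(ja_vectors, ja_centroids)

# Hypothetical bilingual dictionary of (English word, Japanese word) pairs.
dictionary = [("stock", "株"), ("share", "株式"), ("yen", "円")]
edges = bigraph_edges(en_clusters, ja_clusters, dictionary)

for (en_id, ja_id), weight in sorted(edges.items()):
    print(f"EN cluster {en_id} <-> JA cluster {ja_id}: {weight} dictionary pair(s)")
```

Because the two models are trained separately, cluster indices carry no shared meaning across languages; in this sketch only the dictionary pairs tie an English cluster to a Japanese one, and a heavy edge suggests that the two clusters cover similar vocabulary.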

Notes

  1. Google Translate Service: https://translate.google.com/

  2. http://stocktwits.com/

  3. http://finance.yahoo.co.jp/

  4. http://www.reuters.com/

  5. Neologism dictionary implementation on Mecab-ipadic: https://github.com/neologd/mecab-ipadic-neologd

  6. https://radimrehurek.com/gensim/

  7. http://scikit-learn.org/stable/index.html

Acknowledgements

We are grateful to StockTwits Inc. and Yahoo Japan Corporation for providing the textual data. This study is partially supported by JSPS KAKENHI Grant Numbers JP26282089 and JP15H02745.

Author information

Corresponding author

Correspondence to Kiyoshi Izumi.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Liu, E., Ito, T., Izumi, K., Tsubouchi, K., Yamashita, T. (2017). Extraction of Bi-graph Structures Among Multilingual Financial Words Using Text-Mining Methods. In: Aruka, Y., Kirman, A. (eds) Economic Foundations for Social Complexity Science. Evolutionary Economics and Social Complexity Science, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-10-5705-2_9

  • DOI: https://doi.org/10.1007/978-981-10-5705-2_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-5704-5

  • Online ISBN: 978-981-10-5705-2

  • eBook Packages: Economics and Finance; Economics and Finance (R0)
