Extraction of Bi-graph Structures Among Multilingual Financial Words Using Text-Mining Methods

Liu, Enda; Ito, Tomoki; Izumi, Kiyoshi; Tsubouchi, Kota; Yamashita, Tatsuo

doi:10.1007/978-981-10-5705-2_9


Chapter
First Online: 28 September 2017


Part of the book series: Evolutionary Economics and Social Complexity Science (EESCS, volume 9)

Abstract

Vector representations of words, such as word2vec, are an efficient tool in text mining. However, few studies have applied them in a multilingual setting. In this chapter, we present a comparative study of English and Japanese text data, investigating possible relationships between the vector models of the two languages. First, we build a word2vec model for each language from news sources spanning ten years, and cluster the word vectors of each model on the basis of their cosine similarities. Second, we extract finance-related words and create a bilingual dictionary based on the resulting models. Finally, we use this dictionary to compare clusters across the two languages and attempt to establish relationships between the English clusters and the Japanese clusters.
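The clustering and cross-lingual linking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a spherical k-means (k-means on unit-normalized vectors, assigning each word to the centroid with the highest cosine similarity) as the cosine-based clustering, and it assumes that an English cluster and a Japanese cluster are joined by an edge whose weight is the number of dictionary translation pairs connecting them, which gives a weighted bipartite graph between the two sets of clusters. The two-dimensional vectors and the four-entry dictionary are hypothetical toy data standing in for trained word2vec vectors; a real pipeline would more likely rely on the gensim and scikit-learn libraries listed in the Notes.

```python
import math
from collections import Counter


def normalize(v):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iters=50):
    """Cluster words by cosine similarity (spherical k-means, an assumed stand-in).

    vectors: dict mapping word -> list[float]. Returns dict word -> cluster id.
    """
    words = list(vectors)
    unit = {w: normalize(vectors[w]) for w in words}
    # Deterministic farthest-point seeding: start from the first word, then
    # repeatedly add the word least similar to every centroid chosen so far.
    centroids = [unit[words[0]]]
    while len(centroids) < k:
        far = min(words, key=lambda w: max(dot(unit[w], c) for c in centroids))
        centroids.append(unit[far])
    labels = {}
    for _ in range(iters):
        # Assignment step: each word joins the centroid with the highest cosine.
        new_labels = {
            w: max(range(k), key=lambda c: dot(unit[w], centroids[c])) for w in words
        }
        if new_labels == labels:
            break  # converged: no word changed cluster
        labels = new_labels
        # Update step: a centroid is the re-normalized sum of its members.
        for c in range(k):
            members = [unit[w] for w in words if labels[w] == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels


def cluster_bigraph(en_labels, ja_labels, dictionary):
    """Weighted bipartite edges: (English cluster, Japanese cluster) -> pair count.

    The edge rule (count translation pairs) is an assumption for illustration.
    """
    edges = Counter()
    for en_word, ja_word in dictionary.items():
        if en_word in en_labels and ja_word in ja_labels:
            edges[(en_labels[en_word], ja_labels[ja_word])] += 1
    return edges


# Hypothetical toy vectors standing in for trained word2vec embeddings.
en_vectors = {"stock": [1.0, 0.1], "share": [0.9, 0.2], "yen": [0.1, 1.0], "dollar": [0.2, 0.9]}
ja_vectors = {"株": [0.95, 0.05], "株式": [1.0, 0.15], "円": [0.05, 1.0], "ドル": [0.1, 0.95]}
# Hypothetical English -> Japanese finance dictionary.
dictionary = {"stock": "株", "share": "株式", "yen": "円", "dollar": "ドル"}

en_labels = spherical_kmeans(en_vectors, k=2)
ja_labels = spherical_kmeans(ja_vectors, k=2)
edges = cluster_bigraph(en_labels, ja_labels, dictionary)
print(dict(edges))  # {(0, 0): 2, (1, 1): 2}
```

Farthest-point seeding is used here only to keep the toy run deterministic; with real embeddings, a randomized seeding such as k-means++ with several restarts would be the more usual choice.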


Notes

1. Google Translate Service: https://translate.google.com/
2. http://stocktwits.com/
3. http://finance.yahoo.co.jp/
4. http://www.reuters.com/
5. Neologism dictionary implementation on Mecab-ipadic: https://github.com/neologd/mecab-ipadic-neologd
6. https://radimrehurek.com/gensim/
7. http://scikit-learn.org/stable/index.html


Acknowledgements

We are grateful to StockTwits Inc. and Yahoo Japan Corporation for providing the textual data. This study is partially supported by JSPS KAKENHI Grant Numbers JP26282089 and JP15H02745.

Author information

Authors and Affiliations

Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo, 113-8656, Tokyo, Japan
Enda Liu, Tomoki Ito & Kiyoshi Izumi
Yahoo Japan Corporation, Tokyo, Japan
Kota Tsubouchi & Tatsuo Yamashita


Corresponding author

Correspondence to Kiyoshi Izumi.

Editor information

Editors and Affiliations

Faculty of Commerce, Chuo University, Hachioji, Tokyo, Japan
Yuji Aruka
Director of Studies at EHESS, Paris; Professor Emeritus, Aix-Marseille Université, Aix-en-Provence, France
Alan Kirman


About this chapter

Cite this chapter

Liu, E., Ito, T., Izumi, K., Tsubouchi, K., Yamashita, T. (2017). Extraction of Bi-graph Structures Among Multilingual Financial Words Using Text-Mining Methods. In: Aruka, Y., Kirman, A. (eds) Economic Foundations for Social Complexity Science. Evolutionary Economics and Social Complexity Science, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-10-5705-2_9


DOI: https://doi.org/10.1007/978-981-10-5705-2_9
Published: 28 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5704-5
Online ISBN: 978-981-10-5705-2
eBook Packages: Economics and Finance, Economics and Finance (R0)
