Abstract
Vector representations of words, such as those produced by word2vec, are an efficient tool in text mining. However, few studies have examined them in a multilingual setting. In this chapter, we present a comparative study of English and Japanese text data and investigate possible relationships between the word-vector models of the two languages. First, we build one word2vec model per language from news resources spanning ten years and cluster the words of each model by cosine similarity. Second, we extract finance-related words and create a bilingual dictionary based on the models obtained. Finally, we compare clusters across the two languages with the help of the dictionary and attempt to establish relationships between the English clusters and the Japanese clusters.
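The clustering and cross-lingual comparison steps can be illustrated with a small sketch. This is not the authors' implementation: the toy two-dimensional vectors, the toy dictionary, and the function names `spherical_kmeans` and `cluster_links` are hypothetical, and spherical k-means is assumed here as one possible way to cluster by cosine similarity. In practice the vectors would come from the two trained word2vec models.

```python
import numpy as np
from collections import Counter

def spherical_kmeans(vectors, k, n_iter=20, seed=0):
    """Cluster the rows of `vectors` by cosine similarity."""
    rng = np.random.default_rng(seed)
    # Unit-normalise so that a dot product equals cosine similarity.
    x = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each word to the centroid with the highest cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            total = x[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = total / norm
    return np.argmax(x @ centroids.T, axis=1)

def cluster_links(en_labels, ja_labels, dictionary):
    """Count dictionary pairs joining each (English cluster, Japanese cluster)."""
    links = Counter()
    for en_word, ja_word in dictionary:
        if en_word in en_labels and ja_word in ja_labels:
            links[(en_labels[en_word], ja_labels[ja_word])] += 1
    return links

# Toy stand-ins for word2vec vectors (assumption: real vectors have
# hundreds of dimensions and come from the trained models).
en_words = ["stock", "share", "bank", "loan"]
en_vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
ja_words = ["株", "株式", "銀行", "融資"]
ja_vecs = np.array([[0.2, 1.0], [0.1, 0.9], [1.0, 0.2], [0.9, 0.1]])

en_labels = dict(zip(en_words, spherical_kmeans(en_vecs, k=2).tolist()))
ja_labels = dict(zip(ja_words, spherical_kmeans(ja_vecs, k=2).tolist()))

# Hypothetical bilingual dictionary of finance-related word pairs.
dictionary = [("stock", "株"), ("share", "株式"), ("bank", "銀行"), ("loan", "融資")]

links = cluster_links(en_labels, ja_labels, dictionary)
for (en_cluster, ja_cluster), weight in sorted(links.items()):
    print(f"English cluster {en_cluster} <-> Japanese cluster {ja_cluster}: {weight} pairs")
```

The resulting counts can be read as edge weights of a bipartite graph whose two node sets are the English clusters and the Japanese clusters; a heavy edge suggests that the two clusters cover related vocabulary.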
Notes
- 1. Google Translate Service: https://translate.google.com/
- 5. Neologism dictionary implementation on Mecab-ipadic: https://github.com/neologd/mecab-ipadic-neologd
Acknowledgements
We are grateful to StockTwits Inc. and Yahoo Japan Corporation for providing the textual data. This study is partially supported by JSPS KAKENHI Grant Numbers JP26282089 and JP15H02745.
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Liu, E., Ito, T., Izumi, K., Tsubouchi, K., Yamashita, T. (2017). Extraction of Bi-graph Structures Among Multilingual Financial Words Using Text-Mining Methods. In: Aruka, Y., Kirman, A. (eds) Economic Foundations for Social Complexity Science. Evolutionary Economics and Social Complexity Science, vol 9. Springer, Singapore. https://doi.org/10.1007/978-981-10-5705-2_9
DOI: https://doi.org/10.1007/978-981-10-5705-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5704-5
Online ISBN: 978-981-10-5705-2
eBook Packages: Economics and Finance, Economics and Finance (R0)