Improvement Method for Topic-Based Path Model by Using Word2vec
Studying purchasing factors in the marketplace is important for product developers. Text data, such as consumer comments, is a valid input for factor analysis. However, previous research shows that generating a stable factor-analysis model from text data is difficult. We assume that the analysis can proceed smoothly if the target text data are handled well. This study therefore proposes pre-processing the text data with word2vec before factor analysis. Word2vec represents the words in a text as vectors. The proposed process is effective because the variables in the analysis model are expressed as word frequencies. Experimental results also show that the proposed method helps in generating an analytical model.
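The pre-processing idea can be illustrated with a minimal sketch. The abstract does not state how the word vectors are used, so the approach here is an assumption: words whose vectors are close in cosine similarity are merged into one variable, and their frequencies are summed. The toy vectors, the 0.95 threshold, and the function names `group_similar_words` and `merged_frequencies` are all hypothetical; in practice the vectors would come from a trained word2vec model.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for vectors learned by word2vec.
VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "inexpensive": [0.85, 0.15, 0.05],
    "sturdy": [0.1, 0.9, 0.1],
    "durable": [0.05, 0.95, 0.1],
    "battery": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_similar_words(vectors, threshold=0.95):
    """Greedy grouping (an assumed strategy): each word joins the first
    group whose representative word is at least `threshold` similar."""
    groups = {}  # representative word -> list of member words
    for word in vectors:  # dicts preserve insertion order
        for rep in groups:
            if cosine(vectors[word], vectors[rep]) >= threshold:
                groups[rep].append(word)
                break
        else:
            groups[word] = [word]
    return groups


def merged_frequencies(tokens, groups):
    """Count tokens per group, so similar words share one frequency variable."""
    word_to_rep = {w: rep for rep, members in groups.items() for w in members}
    return Counter(word_to_rep[t] for t in tokens if t in word_to_rep)


groups = group_similar_words(VECTORS)
tokens = ["cheap", "inexpensive", "durable", "battery", "sturdy", "cheap"]
frequencies = merged_frequencies(tokens, groups)
```

With these toy vectors, "cheap" and "inexpensive" collapse into one variable, as do "sturdy" and "durable". This leaves fewer and denser frequency variables for a downstream factor or path model.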
Keywords: Causal analysis, Data mining, Text mining, Topic model, Structural equation modeling, Word2vec
This work was supported by KAKENHI 25240049.