Abstract
This paper presents a specialized Random Forest data mining technique designed to alleviate the problem of high dimensionality in volatile and complex domains such as stock market prediction. Since it is widely accepted that the media affect investor behavior, the approach considers both technical-analysis indicators and textual data from various online financial news sources. Several experiments are carried out to evaluate different aspects of the problem. The results show that trading strategies guided by the proposed data mining approach generate higher profits than the buy-and-hold strategy, as well as those guided by the level-estimation-based forecasts of standard linear regression models and by other machine learning classifiers such as Support Vector Machines, ordinary Random Forests, and Neural Networks.
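To make the profit comparison concrete, the following is a minimal sketch of how a classifier-guided strategy might be scored against buy-and-hold. It is not the authors' evaluation procedure: the function names, the price series, and the up/down signals are all hypothetical, and transaction costs are ignored.

```python
from typing import Sequence


def buy_and_hold_return(prices: Sequence[float]) -> float:
    """Return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0


def signal_strategy_return(prices: Sequence[float], signals: Sequence[int]) -> float:
    """Compound return when the asset is held only on flagged intervals.

    signals[t] == 1 means "hold from day t to day t+1" (a predicted rise);
    signals[t] == 0 means "stay in cash" for that interval.
    Assumption: no transaction costs and no short selling.
    """
    if len(signals) != len(prices) - 1:
        raise ValueError("need exactly one signal per price interval")
    wealth = 1.0
    for t, s in enumerate(signals):
        if s == 1:
            wealth *= prices[t + 1] / prices[t]
    return wealth - 1.0


# Hypothetical closing prices and hypothetical classifier output.
prices = [100.0, 110.0, 99.0, 108.9]
signals = [1, 0, 1]  # stay out of the market during the one falling interval

baseline = buy_and_hold_return(prices)
guided = signal_strategy_return(prices, signals)
```

With these toy numbers the guided strategy skips the single losing interval, so `guided` comes out at roughly 0.21 against roughly 0.089 for `baseline`. A realistic comparison would also need out-of-sample signals and trading costs.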
© 2010 IFIP
About this paper
Cite this paper
Maragoudakis, M., Serpanos, D. (2010). Towards Stock Market Data Mining Using Enriched Random Forests from Textual Resources and Technical Indicators. In: Papadopoulos, H., Andreou, A.S., Bramer, M. (eds) Artificial Intelligence Applications and Innovations. AIAI 2010. IFIP Advances in Information and Communication Technology, vol 339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16239-8_37
DOI: https://doi.org/10.1007/978-3-642-16239-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16238-1
Online ISBN: 978-3-642-16239-8
eBook Packages: Computer Science (R0)