Abstract
In this paper, we propose a formalized method to improve the performance of Contextual Anomaly Detection (CAD) for detecting stock market manipulation using Big Data techniques. The method captures the expected behaviour of stocks through sentiment analysis of tweets about those stocks. The extracted sentiment is aggregated per day for each stock and transformed into a time series, which is then used to eliminate false positives among the anomalies detected by CAD. We present a case study and explore the development of sentiment analysis models to improve anomaly detection in the stock market. The experimental results confirm that the proposed method is effective in improving CAD: it removes irrelevant anomalies by correctly identifying 28% of false positives.
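The pipeline summarized above (per-tweet sentiment, daily aggregation per stock, filtering of CAD anomalies) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the score range of [-1, 1], the use of the daily mean, the `threshold` value, and the rule "drop an anomaly when same-day sentiment strongly agrees with the direction of the price move" are all assumptions made for illustration, and the toy data is invented.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment(tweets):
    """Aggregate per-tweet scores into one mean score per (ticker, day).

    `tweets` is an iterable of (ticker, day, score) tuples, where score is
    assumed to lie in [-1, 1] (negative = bearish, positive = bullish).
    """
    buckets = defaultdict(list)
    for ticker, day, score in tweets:
        buckets[(ticker, day)].append(score)
    return {key: mean(scores) for key, scores in buckets.items()}


def filter_anomalies(anomalies, sentiment, threshold=0.5):
    """Drop anomalies that same-day sentiment plausibly explains.

    `anomalies` is a list of (ticker, day, direction) tuples, where direction
    is +1 for an unusual rise and -1 for an unusual drop. An anomaly is
    treated as a likely false positive (hypothetical rule) when the daily
    sentiment points the same way with magnitude >= threshold.
    """
    kept = []
    for ticker, day, direction in anomalies:
        score = sentiment.get((ticker, day))
        explained = score is not None and score * direction >= threshold
        if not explained:
            kept.append((ticker, day, direction))
    return kept


# Invented toy data: tickers, dates, and scores are placeholders.
d1, d2 = date(2017, 1, 3), date(2017, 1, 4)
tweets = [
    ("ABC", d1, 0.8),
    ("ABC", d1, 0.6),
    ("ABC", d2, -0.1),
    ("XYZ", d1, -0.9),
]
anomalies = [("ABC", d1, 1), ("ABC", d2, 1), ("XYZ", d1, 1)]

sentiment = daily_sentiment(tweets)
remaining = filter_anomalies(anomalies, sentiment)
```

In this toy run the rise in ABC on the first day coincides with strongly positive tweets, so it is discarded as expected behaviour, while the two anomalies that sentiment does not explain are retained for review.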
Notes
- 1.
- 2. Bill C-46 (Criminal Code, RSC 1985, c C-46, s 382, 1985).
- 3. Section 9(a)(2) of the Securities Exchange Act (Securities Exchange Act of 1934, 2012).
- 4. The firehose access on the Streaming API provides access to all tweets. It is very expensive and available only on case-by-case request from Twitter.
- 5.
- 6.
- 7. \(TP/(TP+FP)\).
- 8. \(TP/(TP+FN)\).
- 9.
- 10.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Golmohammadi, K., Zaiane, O.R. (2017). Sentiment Analysis on Twitter to Improve Time Series Contextual Anomaly Detection for Detecting Stock Market Manipulation. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_24
DOI: https://doi.org/10.1007/978-3-319-64283-3_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64282-6
Online ISBN: 978-3-319-64283-3
eBook Packages: Computer Science (R0)