Sentiment Analysis on Twitter to Improve Time Series Contextual Anomaly Detection for Detecting Stock Market Manipulation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10440)

Abstract

In this paper, we propose a formalized method to improve the performance of Contextual Anomaly Detection (CAD) for detecting stock market manipulation using Big Data techniques. The method aims to improve the CAD algorithm by capturing the expected behaviour of stocks through sentiment analysis of tweets about those stocks. The extracted insights are aggregated per day for each stock and transformed into a time series, which is then used to eliminate false positives from the anomalies detected by CAD. We present a case study and explore developing sentiment analysis models to improve anomaly detection in the stock market. The experimental results confirm that the proposed method is effective in improving CAD by removing irrelevant anomalies: it correctly identifies 28% of false positives.
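The pipeline described in the abstract (score tweets, aggregate per day and stock, then use the resulting series to discard anomalies) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sentiment scores in [-1, 1], the `direction` field on each anomaly, the `threshold` value, and the rule that an anomaly counts as a false positive when same-day sentiment points strongly in the same direction.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment(tweets):
    """Average sentiment per (ticker, day).

    `tweets` is an iterable of (ticker, day, score) tuples. The score is
    assumed to come from some sentiment classifier and to lie in [-1, 1].
    """
    totals = defaultdict(lambda: [0.0, 0])  # (ticker, day) -> [sum, count]
    for ticker, day, score in tweets:
        entry = totals[(ticker, day)]
        entry[0] += score
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def filter_anomalies(anomalies, sentiment, threshold=0.5):
    """Keep only anomalies that the sentiment series does not explain.

    `anomalies` is a list of (ticker, day, direction) tuples, where
    direction is +1 for an unusual rise and -1 for an unusual drop.
    Hypothetical rule: an anomaly is treated as a false positive when the
    same-day sentiment agrees with its direction by at least `threshold`.
    """
    kept = []
    for ticker, day, direction in anomalies:
        score = sentiment.get((ticker, day))
        explained = score is not None and score * direction >= threshold
        if not explained:
            kept.append((ticker, day, direction))
    return kept


# Made-up example data, for illustration only.
day1, day2 = date(2017, 1, 3), date(2017, 1, 4)
tweets = [("ABC", day1, 0.8), ("ABC", day1, 0.6), ("XYZ", day1, -0.1)]
anomalies = [("ABC", day1, +1), ("XYZ", day1, +1), ("XYZ", day2, -1)]

series = daily_sentiment(tweets)
remaining = filter_anomalies(anomalies, series)
print(remaining)
```

In this toy input the ABC anomaly is dropped because strongly positive tweets accompany the rise, while both XYZ anomalies are kept: one has only weak sentiment and the other has no tweets at all for that day.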

Notes

  1. http://data.worldbank.org/indicator/CM.MKT.LCAP.CD.

  2. Bill C-46 (Criminal Code, RSC 1985, c C-46, s 382, 1985).

  3. Section 9(a)(2) of the Securities Exchange Act (Securities Exchange Act of 1934, 2012).

  4. The firehose access on Streaming API provides access to all tweets. This is very expensive and available upon case-by-case requests from Twitter.

  5. https://dev.twitter.com/rest/reference/get/search/tweets.

  6. http://stocktwits.com/.

  7. \(TP/(TP+FP)\).

  8. \(TP/(TP+FN)\).

  9. http://www.imdb.com/reviews/.

  10. http://stocktwits.com/.
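Notes 7 and 8 are the standard definitions of precision and recall, where TP, FP and FN are the counts of true positives, false positives and false negatives. A minimal sketch of both follows; the counts in the example are made up, and returning 0.0 for an empty denominator is a convention chosen here, not something the notes specify.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): the share of flagged items that are truly positive."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0  # assumed convention for 0/0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): the share of truly positive items that were flagged."""
    actual = tp + fn
    return tp / actual if actual else 0.0  # assumed convention for 0/0


# Hypothetical counts, for illustration only.
tp, fp, fn = 30, 10, 20
print(precision(tp, fp), recall(tp, fn))
```

Removing a false positive lowers FP while leaving TP and FN unchanged, so precision rises and recall stays the same.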

Author information

Corresponding author

Correspondence to Koosha Golmohammadi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Golmohammadi, K., Zaiane, O.R. (2017). Sentiment Analysis on Twitter to Improve Time Series Contextual Anomaly Detection for Detecting Stock Market Manipulation. In: Bellatreche, L., Chakravarthy, S. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2017. Lecture Notes in Computer Science, vol 10440. Springer, Cham. https://doi.org/10.1007/978-3-319-64283-3_24

  • DOI: https://doi.org/10.1007/978-3-319-64283-3_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64282-6

  • Online ISBN: 978-3-319-64283-3

  • eBook Packages: Computer Science, Computer Science (R0)
