Data Mining and Knowledge Discovery, Volume 18, Issue 3, pp 419–445

A link mining algorithm for earnings forecast and trading

  • Germán Creamer
  • Sal Stolfo


The objective of this paper is to present and discuss a link mining algorithm called CorpInterlock and its application to the financial domain. This algorithm selects the largest strongly connected component of a social network and ranks its vertices using several indicators of distance and centrality. These indicators are merged with other relevant indicators in order to forecast new variables using a boosting algorithm. We applied CorpInterlock to integrate the metrics of an extended corporate interlock (a social network of directors and financial analysts) with corporate fundamental variables and analysts’ predictions (consensus). CorpInterlock used these metrics to forecast the trend of the cumulative abnormal return and the earnings surprise of S&P 500 companies. The rationale behind this approach is that the corporate interlock has a direct effect on future earnings and returns, because these variables affect directors’ and managers’ compensation. Managers and financial analysts engage in what agency theory calls the “earnings game”: managers want to meet the analysts’ financial forecasts, and analysts want to increase their compensation or their business with the companies they follow. Following the CorpInterlock algorithm, we calculated a group of well-known social network metrics and integrated them with economic variables using Logitboost. We used the results of the CorpInterlock algorithm to evaluate several trading strategies. We observed an improvement in the Sharpe ratio (risk-adjusted return) when we used “long only” trading strategies with the extended corporate interlock instead of the basic corporate interlock before Regulation Fair Disclosure (FD) was adopted (1998–2001). There was no major difference among the trading strategies after 2001. Additionally, the CorpInterlock algorithm implemented with Logitboost showed a significantly lower test error than when it was implemented with logistic regression.
We conclude that the CorpInterlock algorithm proved to be an effective forecasting algorithm and supported profitable trading strategies.
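The first step of CorpInterlock, as summarized in the abstract, keeps the largest strongly connected component of a social network and ranks its vertices by distance and centrality indicators. The pure-Python sketch below illustrates that step under stated assumptions: the network is an adjacency dict, the component is found with Kosaraju's algorithm, and closeness centrality, average hop distance and out-degree stand in for the paper's fuller set of metrics. All function names, including the `undirected` helper, are hypothetical.

```python
from collections import deque


def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps each vertex to an iterable of successors.
    Vertices must be sortable (sorting only makes the traversal order deterministic)."""
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    adj = {v: list(graph.get(v, ())) for v in nodes}
    radj = {v: [] for v in nodes}  # reversed (transposed) graph
    for u, succs in adj.items():
        for v in succs:
            radj[v].append(u)

    # Pass 1: iterative DFS on the graph, recording vertices in order of completion.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            v, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                stack.pop()
                order.append(v)
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for u in radj[v]:
                if u not in assigned:
                    assigned.add(u)
                    stack.append(u)
        components.append(component)
    return components


def bfs_distances(adj, source):
    """Unweighted shortest-path (hop) distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def rank_largest_scc(graph):
    """Keep the largest SCC; score each vertex by closeness, average distance, out-degree."""
    core = max(strongly_connected_components(graph), key=len)
    sub = {v: [w for w in graph.get(v, ()) if w in core] for v in core}
    n = len(core)
    scores = {}
    for v in core:
        total = sum(bfs_distances(sub, v).values())  # every vertex of `core` is reachable
        scores[v] = {
            "closeness": (n - 1) / total if total else 0.0,
            "avg_distance": total / (n - 1) if n > 1 else 0.0,
            "out_degree": len(sub[v]),
        }
    # Rank by closeness (highest first); ties broken by vertex name.
    ranking = sorted(core, key=lambda v: (-scores[v]["closeness"], v))
    return ranking, scores


def undirected(edges):
    """Hypothetical helper: symmetric adjacency dict, so SCCs equal connected components."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


interlock = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("E", "F")])
ranking, scores = rank_largest_scc(interlock)
print(ranking)  # ['B', 'C', 'D', 'A']
```

A corporate interlock is naturally undirected (directors linked through shared boards); storing each edge in both directions makes every connected component strongly connected, so the same code applies to directed and undirected networks.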
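One of the forecasting targets named in the abstract is the trend of the cumulative abnormal return (CAR). The event window and expected-return model used in the paper are not stated here, so this sketch assumes the simple market-adjusted model, in which a stock's abnormal return is its return minus the market return, and labels a window 1 when its CAR is positive. Both function names are hypothetical.

```python
def cumulative_abnormal_return(stock_returns, market_returns):
    """Sum of market-adjusted abnormal returns (stock minus market) over a window."""
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have the same length")
    return sum(r - m for r, m in zip(stock_returns, market_returns))


def car_trend_label(stock_returns, market_returns):
    """Binary target (assumed rule): 1 if the window's CAR is positive, else 0."""
    return 1 if cumulative_abnormal_return(stock_returns, market_returns) > 0 else 0


print(car_trend_label([0.02, 0.01], [0.01, 0.0]))  # 1
```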
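The trading strategies in the abstract are compared by their Sharpe ratio, the mean excess return divided by the standard deviation of excess returns. The sketch below computes it for a hypothetical “long only” rule that holds a stock when a forecast signal is 1 and stays in cash otherwise. The signal convention, the zero default risk-free rate and the 252-period annualization factor are assumptions, not details taken from the paper.

```python
import math
import statistics


def long_only_returns(signals, asset_returns):
    """Hypothetical long-only rule: hold the asset (signal 1) or stay in cash (signal 0)."""
    if len(signals) != len(asset_returns):
        raise ValueError("signals and returns must have the same length")
    return [s * r for s, r in zip(signals, asset_returns)]


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Mean excess return over its sample standard deviation, annualized by sqrt(periods)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # requires at least two observations
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


strategy = long_only_returns([1, 0, 1, 1], [0.01, 0.02, -0.01, 0.03])
print(strategy)  # [0.01, 0.0, -0.01, 0.03]
print(sharpe_ratio(strategy))
```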


Keywords: Link mining, Social network, Machine learning, Computational finance, Trading strategies, Data mining applications



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Computer Science, Columbia University, New York, USA
  2. Centrum Catolica, Pontificia Universidad Católica del Peru, Lima, Peru