Bootstrapping Yahoo! Finance by Wikipedia for Competitor Mining

  • Tong Ruan
  • Lijuan Xue
  • Haofen Wang
  • Jeff Z. Pan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9544)


Competitive intelligence, one of the key factors in enterprise risk management and decision support, depends on knowledge bases that contain a large amount of competitive information. A variety of finance websites have collected competitive information manually and can serve as such knowledge bases; Yahoo! Finance is one of the largest and most successful among them. However, these websites are incomplete, do not record the competitive domain in which companies compete, and are not updated in a timely manner. Wikipedia, which is built from collective wisdom and contains plenty of useful information in various forms, can address these problems effectively and thus help build a more comprehensive knowledge base. In this paper, we propose a novel semi-supervised approach that identifies competitor information and competitive domains from Wikipedia based on a multi-strategy learning algorithm. More precisely, we use seeds of competition between companies and competition between products to distantly supervise the learning of text patterns in free text. Because competitive information can also be inferred from events, we design a learning-based method to identify event description sentences. The whole process is performed iteratively. The experimental results show the effectiveness of our approach. Moreover, the results extracted from Wikipedia add 14,000 competitor pairs and 8,000 competitive domains between rival companies to those available from Yahoo! Finance.
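
The seed-driven, iterative pattern learning described above can be illustrated with a minimal sketch. This is not the authors' multi-strategy algorithm: it assumes a toy setting in which a pattern is simply the text between two known company names, and the function names (`extract_patterns`, `apply_patterns`, `bootstrap`), the `min_count` threshold, and the example sentences are all hypothetical.

```python
import re
from collections import Counter


def extract_patterns(sentences, pairs):
    """Count the text found between two companies of a known competitor pair."""
    patterns = Counter()
    for sentence in sentences:
        for a, b in pairs:
            # Try both orders, since competition is treated as symmetric here.
            for x, y in ((a, b), (b, a)):
                m = re.search(re.escape(x) + r"\s+(.+?)\s+" + re.escape(y), sentence)
                if m:
                    patterns[m.group(1).lower()] += 1
    return patterns


def apply_patterns(sentences, patterns, companies):
    """Find new company pairs connected by one of the learned patterns."""
    found = set()
    for sentence in sentences:
        lowered = sentence.lower()
        for pattern in patterns:
            for x in companies:
                for y in companies:
                    if x != y and f"{x} {pattern} {y}".lower() in lowered:
                        found.add(frozenset((x, y)))
    return found


def bootstrap(sentences, seed_pairs, companies, min_count=1, rounds=2):
    """Alternate between learning patterns from pairs and pairs from patterns."""
    pairs = {frozenset(p) for p in seed_pairs}
    for _ in range(rounds):
        counts = extract_patterns(sentences, [tuple(p) for p in pairs])
        # Keep only patterns seen at least min_count times (assumed heuristic).
        patterns = [p for p, c in counts.items() if c >= min_count]
        new_pairs = apply_patterns(sentences, patterns, companies) | pairs
        if new_pairs == pairs:
            break  # No new pairs were found, so further rounds change nothing.
        pairs = new_pairs
    return pairs


if __name__ == "__main__":
    # Hypothetical example data, not taken from the paper.
    sentences = [
        "Apple competes with Samsung in the smartphone market.",
        "Samsung competes with Huawei in Asia.",
        "Apple was founded in 1976.",
    ]
    companies = ["Apple", "Samsung", "Huawei"]
    for pair in bootstrap(sentences, [("Apple", "Samsung")], companies):
        print(sorted(pair))
```

A real system would need far more robust patterns (for example, dependency paths or learned classifiers) and a confidence estimate per pattern; the sketch only shows how seed pairs can label sentences without manual annotation and how newly found pairs feed the next iteration.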


Competitor mining, Multi-strategy learning, Distant supervision, Relation reasoning



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Tong Ruan (1), email author
  • Lijuan Xue (1)
  • Haofen Wang (1)
  • Jeff Z. Pan (2)
  1. East China University of Science and Technology, Shanghai, China
  2. The University of Aberdeen, Aberdeen, Scotland
