Competitor Mining from Web Encyclopedia: A Graph Embedding Approach

  • Conference paper
Web Information Systems Engineering – WISE 2020 (WISE 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12342)

Abstract

Mining competitors from the web is a valuable and emerging topic in big data and business analytics. Because ordinary web pages may contain unreliable information such as fake news, in this paper we aim to extract competitors from web encyclopedias such as Wikipedia and DBpedia, which provide more credible information. We observe that the entities in a web encyclopedia form graph structures. Motivated by this observation, we propose to extract competitors with a graph embedding approach. We first present a general framework for mining competitors from web encyclopedias. Then, we propose to mine competitors based on the similarity among graph nodes and further present a similarity computation method combining graph-node similarity and textual relevance. We implement the graph-embedding-based algorithm and compare the proposed method with four existing algorithms on real data sets crawled from Wikipedia and DBpedia. The results in terms of precision, recall, and F1-measure suggest the effectiveness of our proposal.
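
The abstract does not give the formula used to combine graph-node similarity with textual relevance. As a rough illustration only, the sketch below assumes that both signals are cosine similarities (one over node embeddings, one over text vectors) and that they are mixed by a weighted sum with a hypothetical weight `alpha`. The function names, the toy entities, and the vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(node_u, node_v, text_u, text_v, alpha=0.5):
    """Assumed combination: alpha * node similarity + (1 - alpha) * text similarity."""
    return alpha * cosine(node_u, node_v) + (1 - alpha) * cosine(text_u, text_v)


def rank_competitors(target, entities, alpha=0.5, top_k=3):
    """Rank every other entity by combined similarity to the target entity.

    `entities` maps a name to a (node_embedding, text_vector) pair.
    """
    node_t, text_t = entities[target]
    scored = [
        (name, combined_similarity(node_t, node_c, text_t, text_c, alpha))
        for name, (node_c, text_c) in entities.items()
        if name != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: (node embedding, text vector) per entity.
toy_entities = {
    "A": ([1.0, 0.0], [1.0, 0.0]),
    "B": ([1.0, 0.0], [0.0, 1.0]),
    "C": ([0.0, 1.0], [0.0, 1.0]),
    "D": ([1.0, 1.0], [1.0, 1.0]),
}

ranking = rank_competitors("A", toy_entities, alpha=0.5, top_k=2)
print(ranking)
```

In practice the node embeddings would come from a graph embedding model trained on the encyclopedia graph, and the text vectors from some document representation; the weight `alpha` would need tuning on labeled competitor pairs.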

Acknowledgement

This study is supported by the National Key Research and Development Program of China (2018YFB0704404) and the National Science Foundation of China (61672479). Peiquan Jin is the corresponding author.

Author information

Corresponding author

Correspondence to Peiquan Jin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Hong, X., Jin, P., Mu, L., Zhao, J., Wan, S. (2020). Competitor Mining from Web Encyclopedia: A Graph Embedding Approach. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12342. Springer, Cham. https://doi.org/10.1007/978-3-030-62005-9_5

  • DOI: https://doi.org/10.1007/978-3-030-62005-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62004-2

  • Online ISBN: 978-3-030-62005-9

  • eBook Packages: Computer Science, Computer Science (R0)
