Abstract
Mining competitors from the web is a valuable and emerging topic in big data and business analytics. Because ordinary web pages may contain unreliable information such as fake news, this paper extracts competitors from web encyclopedias such as Wikipedia and DBpedia, which provide more credible information. We observe that the entities in a web encyclopedia form a graph structure. Motivated by this observation, we propose to extract competitors with a graph embedding approach. We first present a general framework for mining competitors from web encyclopedias. We then propose to mine competitors based on the similarity among graph nodes, and present a similarity computation method that combines graph-node similarity with textual relevance. We implement the graph-embedding-based algorithm and compare it with four existing algorithms on real data sets crawled from Wikipedia and DBpedia. The results in terms of precision, recall, and F1-measure suggest the effectiveness of our proposal.
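As a rough illustration of the idea of combining graph-node similarity with textual relevance, the sketch below ranks candidate entities by a weighted sum of two cosine similarities and then scores the ranking with precision, recall, and F1. The abstract does not state how the two signals are combined, so the linear weighting, the parameter `alpha`, the function names, and the toy vectors are all assumptions made for this example and not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def combined_similarity(node_u, node_v, text_u, text_v, alpha=0.5):
    """Assumed combination: alpha * node similarity + (1 - alpha) * text similarity."""
    return alpha * cosine(node_u, node_v) + (1.0 - alpha) * cosine(text_u, text_v)

def rank_competitors(target, candidates, alpha=0.5, top_k=3):
    """Return the top_k (name, score) pairs, highest combined similarity first.

    target is a (node_vector, text_vector) pair; candidates maps a name to
    such a pair.
    """
    target_node, target_text = target
    scored = [
        (name, combined_similarity(target_node, node, target_text, text, alpha))
        for name, (node, text) in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of predicted names against gold names."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical toy data: (graph-node embedding, text embedding) per entity.
target = ([1.0, 0.0], [1.0, 0.0])
candidates = {
    "A": ([1.0, 0.0], [1.0, 0.0]),  # matches the target on both signals
    "B": ([0.0, 1.0], [0.0, 1.0]),  # orthogonal on both signals
    "C": ([1.0, 0.0], [0.0, 1.0]),  # matches on the graph signal only
}
```

With these toy vectors, `rank_competitors(target, candidates, top_k=2)` places "A" ahead of "C", since "C" agrees with the target on only one of the two signals. In practice the node vectors would come from a graph embedding method and the text vectors from a document embedding method; both choices are left open here.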
Acknowledgement
This study is supported by the National Key Research and Development Program of China (2018YFB0704404) and the National Science Foundation of China (61672479). Peiquan Jin is the corresponding author.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, X., Jin, P., Mu, L., Zhao, J., Wan, S. (2020). Competitor Mining from Web Encyclopedia: A Graph Embedding Approach. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12342. Springer, Cham. https://doi.org/10.1007/978-3-030-62005-9_5
DOI: https://doi.org/10.1007/978-3-030-62005-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62004-2
Online ISBN: 978-3-030-62005-9
eBook Packages: Computer Science, Computer Science (R0)