Heterogeneous Information Network Hashing for Fast Nearest Neighbor Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)

Abstract

Heterogeneous information networks (HINs) are widely used to model real-world information systems because they capture complex and diverse relations among multiple entities. For most analytical tasks on HINs (e.g., link prediction and node recommendation), network embedding techniques are commonly used to project nodes into real-valued feature vectors, from which the proximity between node pairs can be computed with nearest neighbor search (NNS) algorithms. However, the heavy reliance on real-valued vector representations in existing network embedding methods imposes substantial computational and storage costs, especially when the network is large. To address this issue, we conduct an initial investigation of learning binary hash codes for nodes in HINs in order to accelerate NNS. Specifically, we propose a novel heterogeneous information network hashing algorithm based on collective matrix factorization. By fully characterizing the various types of relations among nodes and designing a principled optimization procedure, we project the nodes of an HIN into a unified Hamming space, which significantly reduces the computational and storage burden of NNS. Experimental results demonstrate that the proposed algorithm yields faster NNS and lower memory usage than several state-of-the-art network embedding methods while achieving comparable performance on typical learning tasks on HINs, including link prediction and cross-type node similarity search.
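
To illustrate why binary codes can ease the cost of NNS, the following is a minimal sketch of brute-force nearest neighbor search in Hamming space. It is not the hashing algorithm proposed in the paper: the codes here are random, and the helper names `pack_codes` and `hamming_knn`, the 64-bit code length, and the database size are all assumptions made for illustration. The sketch only shows that distances between packed binary codes reduce to XOR and bit counting, and that each 64-bit code occupies 8 bytes.

```python
import numpy as np

# Lookup table: number of set bits in each possible byte value (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def pack_codes(signs):
    """Pack an (n, r) array of +1/-1 codes into (n, ceil(r / 8)) bytes."""
    bits = (np.asarray(signs) > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_knn(query, database, k):
    """Return indices and Hamming distances of the k codes closest to query.

    `query` is one packed code (1-D uint8 array); `database` holds one
    packed code per row.
    """
    xor = np.bitwise_xor(database, query)            # differing bits, per byte
    dist = POPCOUNT[xor].sum(axis=1, dtype=np.int64)  # bit count per row
    order = np.argsort(dist, kind="stable")[:k]
    return order, dist[order]


# Hypothetical data: 1000 random 64-bit codes standing in for learned hashes.
rng = np.random.default_rng(0)
codes = rng.choice([-1, 1], size=(1000, 64))
packed = pack_codes(codes)

idx, dist = hamming_knn(packed[0], packed, k=5)
print(idx, dist)
print("bytes for binary codes:", packed.nbytes)
print("bytes for float32 vectors of the same dimension:", 1000 * 64 * 4)
```

In this sketch the packed codes take 8,000 bytes, whereas float32 vectors of the same dimension would take 256,000 bytes. How much of that saving carries over to a real system depends on the code length and on the search structure used.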

Notes

  1. https://aminer.org/
  2. http://snap.stanford.edu/data/
  3. http://www.levmuchnik.net/Content/Networks/NetworkData.html

Acknowledgements

This work is supported by the National Key Research and Development Program of China (2016YFB1000903), the National Natural Science Foundation of China (61872287, 61532015 and 61672418), the Innovative Research Group of the National Natural Science Foundation of China (61721002), the Innovation Research Team of the Ministry of Education (IRT_17R86), and the Project of China Knowledge Center for Engineering Science and Technology.

Author information

Corresponding author

Correspondence to Minnan Luo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Peng, Z., Luo, M., Li, J., Chen, C., Zheng, Q. (2019). Heterogeneous Information Network Hashing for Fast Nearest Neighbor Search. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_34

  • DOI: https://doi.org/10.1007/978-3-030-18576-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18575-6

  • Online ISBN: 978-3-030-18576-3

  • eBook Packages: Computer Science, Computer Science (R0)
