A Hierarchical Structure-Aware Embedding Method for Predicting Phenotype-Gene Associations

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)

Abstract

Identifying potential causal genes for disease phenotypes is essential for disease treatment and facilitates drug development. Inspired by existing random-walk-based embedding methods and the hierarchical structure of the Human Phenotype Ontology (HPO), this work presents a Hierarchical Structure-Aware Embedding Method (HSAEM) for predicting phenotype-gene associations, which explicitly incorporates node type information and individual node differences into random walks. Unlike existing meta-path-guided heterogeneous network embedding techniques, HSAEM estimates an individual jumping probability for each node, learned from the hierarchical structure of phenotypes and the differing influence of nodes among genes. During random walks over the heterogeneous network, which comprises the HPO, phenotype-gene, and Protein-Protein Interaction (PPI) networks, the jumping probability guides the current node to select either a heterogeneous neighbor or a homogeneous neighbor as the next node. The generated node sequences are then fed into a heterogeneous SkipGram model to learn node representations. By defining the individual jumping probability based on hierarchical structure, HSAEM can effectively capture the co-occurrence of nodes in the heterogeneous network. HSAEM outperforms the baselines on statistical evaluation metrics and also proves effective in practice at prioritizing causal genes for Parkinson’s Disease.
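The walk step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for the jumping probability, so the sketch takes a per-node probability as an input, and the toy graph, the node names, the probability values, and the function name `jump_walk` are all hypothetical. It shows only how a node-specific probability can steer a walker between neighbors of a different type (heterogeneous) and neighbors of the same type (homogeneous).

```python
import random

# Hypothetical toy network: two phenotypes (HPO edge p1-p2), two genes
# (PPI edge g1-g2), and one phenotype-gene association (p2-g1).
ADJ = {
    "p1": ["p2"],
    "p2": ["p1", "g1"],
    "g1": ["p2", "g2"],
    "g2": ["g1"],
}
NODE_TYPE = {"p1": "phenotype", "p2": "phenotype", "g1": "gene", "g2": "gene"}

# Placeholder per-node jumping probabilities. In HSAEM these are learned
# from the phenotype hierarchy and gene influence; the values here are
# made up purely for illustration.
JUMP_PROB = {"p1": 0.2, "p2": 0.7, "g1": 0.5, "g2": 0.3}


def jump_walk(adj, node_type, jump_prob, start, length, rng=None):
    """Generate one walk of at most `length` nodes starting at `start`.

    At each step the current node's own jump probability decides whether
    the walker moves to a neighbor of a different type or of the same type.
    """
    rng = rng or random.Random()
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = adj.get(current, [])
        if not neighbors:
            break  # dead end: stop the walk early
        hetero = [n for n in neighbors if node_type[n] != node_type[current]]
        homo = [n for n in neighbors if node_type[n] == node_type[current]]
        # Jump across types with probability jump_prob[current]; if one
        # group is empty, fall back to the group that has neighbors.
        if hetero and (not homo or rng.random() < jump_prob[current]):
            current = rng.choice(hetero)
        else:
            current = rng.choice(homo)
        walk.append(current)
    return walk


if __name__ == "__main__":
    print(jump_walk(ADJ, NODE_TYPE, JUMP_PROB, "p1", 8, random.Random(0)))
```

Walks generated this way would then serve as the node sequences for an embedding model such as SkipGram; that training step is omitted from the sketch.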

L. Wang and M. Liu contributed equally.

Acknowledgements

This work is supported by the Natural Science Foundation of Tianjin (No. 18JCYBJC15700), the National Natural Science Foundation of China (No. 81171407), and the National Key R&D Program of China (No. 2018YFB0204304).

Author information

Corresponding author

Correspondence to Maoqiang Xie.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, L., Liu, M., He, W., Jin, X., Xie, M., Huang, Y. (2021). A Hierarchical Structure-Aware Embedding Method for Predicting Phenotype-Gene Associations. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_10

  • DOI: https://doi.org/10.1007/978-3-030-75762-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science, Computer Science (R0)
