Abstract
Identifying potential causal genes for disease phenotypes is essential for disease treatment and facilitates drug development. Inspired by existing random-walk-based embedding methods and the hierarchical structure of the Human Phenotype Ontology (HPO), this work presents a Hierarchical Structure-Aware Embedding Method (HSAEM) for predicting phenotype-gene associations, which explicitly incorporates node type information and individual node differences into random walks. Unlike existing meta-path-guided heterogeneous network embedding techniques, HSAEM estimates an individual jumping probability for each node, learned from the hierarchical structure of phenotypes and from the differing influence of nodes among genes. During random walks over the heterogeneous network, which comprises the HPO, phenotype-gene, and Protein-Protein Interaction (PPI) networks, the jumping probability guides the current node to select either a heterogeneous neighbor or a homogeneous neighbor as the next node. The generated node sequences are then fed into a heterogeneous SkipGram model to learn node representations. By defining the individual jumping probability based on hierarchical structure, HSAEM can effectively capture the co-occurrence of nodes in the heterogeneous network. HSAEM outperforms the baselines on statistical evaluation metrics and also proves effective in practice at prioritizing causal genes for Parkinson’s disease.
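The walk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `jump_guided_walk`, the toy graph, and the numeric values in `jump_prob` are all hypothetical, and the jumping probabilities are simply given as inputs here, whereas HSAEM learns them from the phenotype hierarchy and gene node influence. The fallback used when a node has only one kind of neighbor is likewise an assumption.

```python
import random


def jump_guided_walk(start, node_type, neighbors, jump_prob, walk_length, rng):
    """Generate one walk where each node's own probability decides
    whether the next step goes to a node of a different type."""
    walk = [start]
    current = start
    while len(walk) < walk_length:
        # Split the neighbors by whether their type differs from the current node's.
        hetero = [n for n in neighbors[current] if node_type[n] != node_type[current]]
        homo = [n for n in neighbors[current] if node_type[n] == node_type[current]]
        if not hetero and not homo:
            break  # isolated node: the walk cannot continue
        # Jump to the other node type with the node's individual probability;
        # if only one kind of neighbor exists, use it (an assumption of this sketch).
        if hetero and (not homo or rng.random() < jump_prob[current]):
            candidates = hetero
        else:
            candidates = homo
        current = rng.choice(candidates)
        walk.append(current)
    return walk


# Hypothetical toy network: p1-p2 stands in for an HPO edge,
# p2-g1 for a phenotype-gene association, g1-g2 for a PPI edge.
neighbors = {"p1": ["p2"], "p2": ["p1", "g1"], "g1": ["p2", "g2"], "g2": ["g1"]}
node_type = {"p1": "phenotype", "p2": "phenotype", "g1": "gene", "g2": "gene"}
# Placeholder values only; in HSAEM these are learned per node.
jump_prob = {"p1": 0.5, "p2": 0.8, "g1": 0.3, "g2": 0.5}

rng = random.Random(0)
walks = [
    jump_guided_walk(start, node_type, neighbors, jump_prob, 6, rng)
    for start in neighbors
    for _ in range(2)
]
```

Each entry of `walks` is a node sequence that could then be passed to a SkipGram-style model as a "sentence"; that training step is not shown here.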
L. Wang and M. Liu contributed equally.
Acknowledgements
This work is supported by the Natural Science Foundation of Tianjin (No. 18JCYBJC15700), the National Natural Science Foundation of China (No. 81171407), and the National Key R&D Program of China (No. 2018YFB0204304).
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Wang, L., Liu, M., He, W., Jin, X., Xie, M., Huang, Y. (2021). A Hierarchical Structure-Aware Embedding Method for Predicting Phenotype-Gene Associations. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_10
DOI: https://doi.org/10.1007/978-3-030-75762-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science (R0)