Abstract
Disease gene identification is of great significance for the treatment of genetic disorders. In recent years, the rapid development of high-throughput sequencing technologies has transformed disease gene identification methods. Network-based methods are now the most effective approach to disease gene identification, but most current methods consider only local topological attributes and ignore the global distribution. In this paper, we propose applying the random walk algorithm to extract global features for each gene and then using a binary logistic regression model to decide whether a gene belongs to a given disease. We also integrate the local and global features into a combined feature vector to improve identification performance. The experimental results show that the global feature is highly effective for disease gene identification. We organize the global feature into several kinds of feature vectors, and every one of them yields higher AUC scores than other state-of-the-art methods.
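The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the restart variant of the random walk, the restart probability of 0.3, the use of node degree as a stand-in local feature, the toy six-gene network, and all function names are assumptions made only for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk that restarts at `seeds`."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # guard against isolated nodes
    W = adj / col_sums                     # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def global_features(adj, restart=0.3):
    """Row i holds the steady-state vector of a walk restarted at gene i."""
    n = len(adj)
    return np.vstack([random_walk_with_restart(adj, [i], restart) for i in range(n)])

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Binary logistic regression fitted by plain gradient descent."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    w = np.zeros(X1.shape[1])
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-(X1 @ w)))
        w -= lr * X1.T @ (pred - y) / len(y)
    return w

def predict_proba(X, w):
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-(X1 @ w)))

# Toy network (assumed): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0

features = global_features(adj)                 # global (random walk) features
degree = adj.sum(axis=1, keepdims=True)         # stand-in local feature (assumed)
combined = np.hstack([degree, features])        # local + global feature vector

# Hypothetical labels: genes 0-2 are associated with the disease, 3-5 are not.
labels = np.array([1, 1, 1, 0, 0, 0], dtype=float)
weights = fit_logistic(combined, labels)
probs = predict_proba(combined, weights)
print(np.round(probs, 3))
```

On real data the network would be a protein-protein interaction or similar graph, the labels would come from known disease-gene associations, and performance would be assessed on held-out genes (for example with AUC) rather than on the training set as here.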
Acknowledgment
This work is supported by JCYJ20140904154645958 and CXZZ20140904154910774.
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wei, L., Wu, S., Zhang, J., Xu, Y. (2016). Random Walk Based Global Feature for Disease Gene Identification. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_38
DOI: https://doi.org/10.1007/978-981-10-3005-5_38
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3004-8
Online ISBN: 978-981-10-3005-5
eBook Packages: Computer Science, Computer Science (R0)