Random Walk Based Global Feature for Disease Gene Identification

  • Conference paper
Pattern Recognition (CCPR 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 663)

Abstract

Disease gene identification is of great significance for the treatment of genetic disorders. In recent years, the rapid development of high-throughput sequencing technologies has transformed disease gene identification methods. Network-based methods are now the most effective tools for disease gene identification, yet most current methods attend only to local topological attributes and ignore the global distribution. In this paper, we propose applying the random walk algorithm to extract global features for each gene, and then use a binary logistic regression model to decide whether a gene is associated with a given disease. We also integrate the local and global features into a combined feature vector to improve identification performance. The experimental results show that the global feature is highly effective for disease gene identification. We organize the global feature into several kinds of feature vectors, and every one of them yields a higher AUC score than other state-of-the-art methods.
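The abstract does not say which random walk variant is used or how the walk is turned into a feature vector. As a minimal sketch only, the block below assumes a random walk with restart on an unweighted gene network and takes each gene's steady-state visiting probabilities as its global feature. The function names, the restart probability of 0.5, and the five-gene toy network are all illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker restarting at `seed`.

    Assumption: this is one common random walk variant; the paper's exact
    formulation is not given in the abstract.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    transition = adjacency / col_sums       # column-normalized transition matrix

    p0 = np.zeros(adjacency.shape[0])
    p0[seed] = 1.0                          # the walker starts at the seed gene
    p = p0.copy()
    for _ in range(max_iter):
        # With probability `restart` jump back to the seed, otherwise step to a neighbor.
        p_next = (1 - restart) * (transition @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p


def global_features(adjacency, restart=0.5):
    """Stack one random walk profile per gene; row g is the feature vector of gene g."""
    n = np.asarray(adjacency).shape[0]
    return np.vstack([random_walk_with_restart(adjacency, g, restart) for g in range(n)])


# Hypothetical toy network: five genes connected in a chain 0-1-2-3-4.
toy_network = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
features = global_features(toy_network)
```

Each row of `features` could then be passed, alone or concatenated with local topological features, to a binary logistic regression classifier trained on known disease genes, which is the pipeline the abstract outlines.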

Acknowledgment

This work is supported by JCYJ20140904154645958 and CXZZ20140904154910774.

Author information

Corresponding author

Correspondence to Shuai Wu.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wei, L., Wu, S., Zhang, J., Xu, Y. (2016). Random Walk Based Global Feature for Disease Gene Identification. In: Tan, T., Li, X., Chen, X., Zhou, J., Yang, J., Cheng, H. (eds) Pattern Recognition. CCPR 2016. Communications in Computer and Information Science, vol 663. Springer, Singapore. https://doi.org/10.1007/978-981-10-3005-5_38

  • DOI: https://doi.org/10.1007/978-981-10-3005-5_38

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3004-8

  • Online ISBN: 978-981-10-3005-5

  • eBook Packages: Computer Science, Computer Science (R0)
