Integrating Protein Family Sequence Similarities with Gene Expression to Find Signature Gene Networks in Breast Cancer Metastasis

  • Sepideh Babaei
  • Erik van den Akker
  • Jeroen de Ridder
  • Marcel Reinders
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)


Finding robust marker genes is one of the key challenges in breast cancer research. Significant signatures identified in independent datasets often show little to no overlap, possibly due to small sample sizes, noise in gene expression measurements, and heterogeneity across patients. To find more robust markers, several studies have analyzed gene expression data by grouping functionally related genes using pathway or protein interaction data. Here we pursue a protein similarity measure based on Pfam protein family information to aid the identification of robust subnetworks for predicting metastasis. The proposed protein-to-protein similarities are derived from a protein-to-family network using family HMM profiles. The resulting protein-protein sequence similarity network is overlaid with gene expression data from six breast cancer datasets. The results indicate that the captured protein similarities carry predictive capacity, aid interpretation of the resulting signatures, and improve robustness.
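As a rough illustration of deriving protein-to-protein similarities from a protein-to-family matrix, the sketch below compares proteins by the cosine similarity of their family score vectors and keeps pairs above a cutoff. The abstract does not state the actual similarity formula, so the cosine measure, the threshold value, the function names, and the toy scores are all assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a protein with no family signal matches nothing
    return dot / (norm_u * norm_v)


def protein_similarity_network(family_scores, threshold=0.5):
    """Build an undirected similarity network from protein-to-family scores.

    family_scores maps a protein name to its vector of scores against a
    fixed, ordered list of families (hypothetical stand-ins for scores
    obtained from family HMM profiles). Returns a dict mapping protein
    pairs to their similarity, keeping only pairs at or above threshold.
    """
    proteins = sorted(family_scores)
    edges = {}
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            s = cosine_similarity(family_scores[p], family_scores[q])
            if s >= threshold:
                edges[(p, q)] = s
    return edges


# Hypothetical toy data: three proteins scored against three families.
toy_scores = {
    "P1": [9.0, 0.0, 1.0],
    "P2": [8.0, 0.5, 0.0],
    "P3": [0.0, 7.0, 0.0],
}

network = protein_similarity_network(toy_scores, threshold=0.5)
for (p, q), s in network.items():
    print(p, q, round(s, 3))
```

In this toy example only P1 and P2 are linked, because both score highly against the first family while P3 matches a different one. Expression values of linked genes could then be combined per subnetwork, which is one way to read the overlay step described in the abstract.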


protein-to-family distance matrix; protein-to-protein sequence similarity; concordant signature; breast cancer markers



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sepideh Babaei (1, 2)
  • Erik van den Akker (1, 3)
  • Jeroen de Ridder (1, 2)
  • Marcel Reinders (1, 2)
  1. Delft Bioinformatics Lab, Delft University of Technology, The Netherlands
  2. Netherlands Bioinformatics Centre, The Netherlands
  3. Molecular Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands
