Identification of Biomarkers for Prostate Cancer Prognosis Using a Novel Two-Step Cluster Analysis

  • Xin Chen
  • Shizhong Xu
  • Yipeng Wang
  • Michael McClelland
  • Zhenyu Jia
  • Dan Mercola
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)


Prognosis of prostate cancer is challenging because clinical variables such as Gleason score, metastasis stage, surgical margin status, seminal vesicle invasion status and preoperative prostate-specific antigen level give an incomplete assessment. Whole-genome gene expression assays offer an opportunity to identify molecular indicators for predicting disease outcomes. However, heterogeneity in the cell composition of tissue samples often produces inconsistent results in cancer profiling studies. We developed a two-step strategy to identify prognostic biomarkers for prostate cancer that takes into account the variation due to mixed tissue samples. In the first step, an unsupervised EM clustering analysis was applied to each gene to cluster patient samples into subgroups based on the expression values of that gene. In the second step, genes were selected based on a χ² correlation analysis between the cluster indicators obtained in the first step and the observed clinical outcomes. Two simulation studies showed that the proposed method identified 30% more prognostic genes than traditional differential expression analysis methods such as SAM and LIMMA. We also analyzed a real prostate cancer expression data set using both the new method and the traditional methods. The pathway assay showed that the genes identified with the new method are significantly enriched for prostate cancer relevant pathways such as the Wnt signaling pathway and the TGF-β signaling pathway. These genes were not detected by the traditional methods.
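The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes exactly two clusters per gene, a binary clinical outcome, and an unadjusted significance threshold of 0.05, none of which the abstract specifies. The function names (`em_two_clusters`, `chi2_2x2`, `select_genes`) and the toy data are hypothetical and exist only for illustration.

```python
import math


def em_two_clusters(values, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM (assumed K = 2).

    Returns hard labels: 0 for the lower-mean cluster, 1 for the higher.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialise means at the quartiles, with a shared variance and equal weights.
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    var, w = [var0, var0], [0.5, 0.5]
    prev_ll, resp = -math.inf, []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to cluster 1.
        resp, ll = [], 0.0
        for x in values:
            dens = [
                w[k] * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in (0, 1)
            ]
            total = dens[0] + dens[1] + 1e-300  # guard against underflow
            resp.append(dens[1] / total)
            ll += math.log(total)
        # M-step: update weights, means and variances from the posteriors.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - n1, 1e-12)
        mu = [
            sum((1 - r) * x for r, x in zip(resp, values)) / n0,
            sum(r * x for r, x in zip(resp, values)) / n1,
        ]
        var = [
            max(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, values)) / n0, 1e-6),
            max(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, values)) / n1, 1e-6),
        ]
        w = [n0 / n, n1 / n]
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    labels = [1 if r > 0.5 else 0 for r in resp]
    if mu[0] > mu[1]:  # keep label 0 attached to the lower-mean cluster
        labels = [1 - lab for lab in labels]
    return labels


def chi2_2x2(labels, outcomes):
    """Pearson chi-square statistic and p-value (1 d.f.) for two binary vectors."""
    table = [[0, 0], [0, 0]]
    for a, b in zip(labels, outcomes):
        table[a][b] += 1
    n = len(labels)
    rows = [sum(table[0]), sum(table[1])]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in rows or 0 in cols:  # degenerate table: no evidence of association
        return 0.0, 1.0
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def select_genes(expr, outcomes, alpha=0.05):
    """Step 1: cluster samples per gene; step 2: keep genes whose clusters track outcome."""
    selected = []
    for gene, values in expr.items():
        labels = em_two_clusters(values)
        _, p_value = chi2_2x2(labels, outcomes)
        if p_value < alpha:
            selected.append(gene)
    return selected


# Toy data (hypothetical): 40 samples, the last 20 with the adverse outcome.
outcomes = [0] * 20 + [1] * 20
expr = {
    # gene_A is bimodal and its high mode coincides with the adverse outcome.
    "gene_A": [(5.0 if i >= 20 else 0.0) + 0.1 * (i % 20) for i in range(40)],
    # gene_B is bimodal too, but its modes are unrelated to the outcome.
    "gene_B": [(5.0 if i % 2 else 0.0) + 0.05 * i for i in range(40)],
}
selected = select_genes(expr, outcomes)
print(selected)
```

In this toy example only `gene_A` should pass the filter, because `gene_B` splits into two clusters that are evenly distributed across both outcome groups. A real analysis would also need to choose the number of clusters per gene and correct for testing many genes; both are left out of this sketch.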


Prostate Cancer; Bayesian Information Criterion Score; Unsupervised Cluster Analysis; Predict Disease Outcome; Prostate Cancer Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Xin Chen (1)
  • Shizhong Xu (2)
  • Yipeng Wang (3)
  • Michael McClelland (1, 4)
  • Zhenyu Jia (1)
  • Dan Mercola (1)
  1. Department of Pathology and Laboratory Medicine, University of California, Irvine, USA
  2. Department of Botany and Plant Sciences, University of California, Riverside, USA
  3. AltheaDx Inc., San Diego, USA
  4. Vaccine Research Institute of San Diego, USA
