Pareto-Gamma Statistic Reveals Global Rescaling in Transcriptomes of Low and High Aggressive Breast Cancer Phenotypes

  • Alvin L.-S. Chua
  • Anna V. Ivshina
  • Vladimir A. Kuznetsov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4146)


We propose a novel mixture probability model for the probability distribution function (PDF) of microarray signals, comprising a noise component and a signal component. The noise term, due to non-specific mRNA hybridization, is given by a lognormal distribution, and the true signal, from specific mRNA hybridization, is described by the generalized Pareto-gamma (GPG) function. The model, applied to expression data of 251 human breast cancer tumors on the Affymetrix microarray platform, yields accurate fits for all tumor samples. We observe that (i) high-aggressive cancers have, in general, broader right tails in the GPG than low-aggressive cancers, and (ii) the exponent parameter value of the GPG distribution is not constant and correlates strongly with ~4000 expressed genes and with several “gold standard” clinical risk factors. These results cannot be obtained from so-called “scale-free network” models. We conclude that an accurate parameterization of the scale-dependent GPG function could provide robust prognostic benefits for cancer patients.
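
The two-component structure described above can be written out as a short sketch. The abstract does not give the functional form of the GPG term or name any parameters, so the mixing weight \alpha, the lognormal parameters \mu and \sigma, and the GPG parameter vector \theta below are notational assumptions, and p_GPG is left unspecified; only the lognormal density is the standard textbook form.

```latex
% Mixture of a noise component and a signal component (symbols are assumptions)
\[
  p(x) \;=\; \alpha \, p_{\mathrm{noise}}(x;\mu,\sigma)
        \;+\; (1-\alpha)\, p_{\mathrm{GPG}}(x;\theta),
  \qquad 0 \le \alpha \le 1 ,
\]
% Noise term: standard lognormal density for a signal intensity x > 0
\[
  p_{\mathrm{noise}}(x;\mu,\sigma)
  \;=\; \frac{1}{x\,\sigma\sqrt{2\pi}}
        \exp\!\left( -\frac{(\ln x - \mu)^{2}}{2\sigma^{2}} \right),
  \qquad x > 0 .
\]
```

In this notation, a broader right tail of the GPG component corresponds to p_GPG decaying more slowly at large x, which is the behavior the abstract attributes to high-aggressive tumors.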


Lognormal Distribution; Probability Distribution Function; Clinical Risk Factor; Pareto Distribution; Empirical Distribution Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alvin L.-S. Chua (1)
  • Anna V. Ivshina (1)
  • Vladimir A. Kuznetsov (1)
  1. Genome Institute of Singapore, Singapore
