A Symmetric Length-Aware Enrichment Test

  • David Manescu
  • Uri Keich
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)


Young et al. [14] showed that, because of gene length bias, the popular Fisher Exact Test should not be used to study the association between a group of differentially expressed (DE) genes and a specific Gene Ontology (GO) category. Instead, they suggest a test in which one conditions on the genes in the GO category and draws the pseudo-DE genes according to a length-dependent distribution. The same model was presented in a different context by Kazemian et al. [8], who went on to offer a dynamic programming (DP) algorithm that computes the significance of the proposed test exactly. Here we point out that, while valid, the test proposed by these authors is no longer symmetric, as Fisher’s Exact Test is: one gets different answers depending on whether one conditions on the observed GO category or on the DE set. As an alternative, we offer a symmetric generalization of Fisher’s Exact Test and provide efficient algorithms to evaluate its significance.
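The symmetry issue can be illustrated with a small sketch. It is not the authors' DP or saddlepoint algorithm: the function names (`fisher_tail`, `weighted_draw`, `mc_pvalue`) are hypothetical, and the length-dependent distribution is assumed here to be sequential sampling without replacement with probability proportional to a per-gene weight, evaluated by plain Monte Carlo.

```python
import random
from math import comb


def fisher_tail(N, K, n, k):
    """One-sided Fisher p-value P(X >= k), where X is the overlap when n of
    N genes are drawn uniformly without replacement and K are in the category."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def weighted_draw(weights, n, rng):
    """Draw n distinct gene indices one at a time, each with probability
    proportional to its weight among the genes still available (assumed model)."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(n):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def mc_pvalue(weights, fixed_set, n_draw, k_obs, n_sim=2000, seed=0):
    """Monte Carlo p-value: condition on fixed_set, draw the other set by
    weight, and count how often the overlap reaches k_obs."""
    rng = random.Random(seed)
    hits = sum(
        len(fixed_set.intersection(weighted_draw(weights, n_draw, rng))) >= k_obs
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)


# Fisher's test is symmetric: swapping the two margins leaves the p-value unchanged.
p_a = fisher_tail(20, 6, 5, 3)
p_b = fisher_tail(20, 5, 6, 3)

# Toy data (made up for illustration): 20 genes, the first few much longer.
lengths = [10.0] * 4 + [1.0] * 16
go_set = {0, 1, 2, 5, 6, 7}
de_set = {0, 1, 2, 3, 8}
overlap = len(go_set & de_set)

# Conditioning on the GO category versus on the DE set.
p_given_go = mc_pvalue(lengths, go_set, len(de_set), overlap)
p_given_de = mc_pvalue(lengths, de_set, len(go_set), overlap)
```

Under this assumed model, `p_a` and `p_b` agree, whereas `p_given_go` and `p_given_de` will generally differ once the weights are unequal, which is the asymmetry described above.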


Gene Ontology, Monte Carlo, Conditional Moment, Saddlepoint Approximation, Probability Weighting Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Agresti, A.: A survey of exact inference for contingency tables. Statistical Science 7, 131–153 (1992)
  2. Butler, R.W.: Saddlepoint Approximations with Applications. Cambridge University Press, Cambridge (2007)
  3. Cleveland, W.S., Devlin, S.J.: Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 596–610 (1988)
  4. The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000)
  5. Cowell, W.R. (ed.): Sources and Development of Mathematical Software. Prentice-Hall Series in Computational Mathematics, Cleve Moler, Advisor. Prentice-Hall, Upper Saddle River, NJ 07458, USA (1984)
  6. Fisher, R.A.: Statistical methods for research workers, 14th edition. Oliver & Boyd, London (1970)
  7. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001)
  8. Kazemian, M., Zhu, Q., Halfon, M.S., Sinha, S.: Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison. Nucleic Acids Research 39(22), 9463–9472 (2011)
  9. Nieduszynski, C.A., Hiraga, S., Ak, P., Benham, C.J., Donaldson, A.D.: OriDB: a DNA replication origin database. Nucleic Acids Res. 35(Database issue), D40–D46 (2007)
  10. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2006). ISBN 3-900051-07-0
  11. Scannell, D.R., Zill, O.A., Rokas, A., Payen, C., Dunham, M.J., Eisen, M.B., Rine, J., Johnston, M., Hittinger, C.T.: The awesome power of yeast evolutionary genetics: New genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3 (Bethesda) 1(1), 11–25 (2011)
  12. Skovgaard, I.M.: Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875–87 (1987)
  13. Wallenius, K.T.: Biased sampling: the non-central hypergeometric probability distribution. PhD thesis, Stanford University (1963)
  14. Young, M.D., Wakefield, M.J., Smyth, G.K., Oshlack, A.: Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11(R14), 11 (2010)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. School of Mathematics and Statistics, University of Sydney, Sydney, Australia
