A Symmetric Length-Aware Enrichment Test
Young et al.  showed that due to gene length bias the popular Fisher Exact Test should not be used to study the association between a group of differentially expressed (DE) genes and a specific Gene Ontology (GO) category. Instead they suggest a test where one conditions on the genes in the GO category and draws the pseudo DE expressed genes according to a length-dependent distribution. The same model was presented in a different context by Kazemian et al. who went on to offer a dynamic programming (DP) algorithm to exactly estimate the significance of the proposed test . Here we point out that while valid, the test proposed by these authors is no longer symmetric as Fisher’s Exact Test is: one gets different answers if one conditions on the observed GO category than on the DE set. As an alternative we offer a symmetric generalization of Fisher’s Exact Test and provide efficient algorithms to evaluate its significance.
KeywordsGene Ontology Monte Carlo Conditional Moment Saddlepoint Approximation Probability Weighting Function
Unable to display preview. Download preview PDF.
- 4.The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000)Google Scholar
- 5.Cowell, W.R. (ed.): Sources and Development of Mathematical Software. Prentice-Hall Series in Computational Mathematics, Cleve Moler, Advisor. Prentice-Hall, Upper Saddle River, NJ 07458, USA (1984)Google Scholar
- 6.Fisher, R.A.: Statistical methods for research workers. Oliver & Boyd, London, 14th ed. edition (1970)Google Scholar
- 7.Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001)Google Scholar
- 9.Nieduszynski, C.A., Hiraga, S., Ak, P., Benham, C.J., Donaldson, A.D.: Oridb: a dna replication origin database. Nucleic. Acids Res. 35(Database issue), D40–D46 (2007)Google Scholar
- 10.R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2006). ISBN 3-900051-07-0Google Scholar
- 11.Scannell, D.R., Zill, O.A., Rokas, A., Payen, C., Dunham, M.J., Eisen, M.B., Rine, J., Johnston, M., Hittinger, C.T.: The awesome power of yeast evolutionary genetics: New genome sequences and strain resources for the saccharomyces sensu stricto genus. G3 (Bethesda) 1(1), 11–25 (2011)CrossRefGoogle Scholar
- 13.Wallenius, K.T.: Biased sampling: the non-central hypegeometric probability distribution. PhD thesis, Stanford University (1963)Google Scholar
- 14.Young, M.D., Wakefield, M.J., Smyth, G.K., Oshlack, A.: Gene ontology analysis for rna-seq: accounting for selection bias. Genome Biology 11(R14), 11 (2010)Google Scholar