Extended Spearman and Kendall Coefficients for Gene Annotation List Correlation

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2014)

Abstract

Gene annotations are a key concept in bioinformatics, and computational methods able to predict them are a fundamental contribution to the field. Several machine learning algorithms are available in this domain; they include relevant parameters that might influence the output list of predicted gene annotations. The extent to which variation of these key parameters affects the output gene annotation lists remains to be evaluated. Here, we support such an evaluation by introducing two list correlation measures, which are based on and extend the Spearman ρ correlation coefficient and the Kendall τ distance, respectively. Applying these measures to gene annotation lists predicted from Gene Ontology annotation datasets of different organisms’ genes revealed interesting patterns between the predicted lists. The measures also supported some useful considerations about the prediction parameters and algorithms used.
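
As a point of reference for the two classic measures named in the abstract, the following minimal Python sketch computes the standard Spearman ρ coefficient and the standard Kendall τ distance for two rankings of the same items. It does not reproduce the extended measures introduced in the paper, which are not described in this abstract. The function names and the term labels are hypothetical, and the sketch assumes that both lists contain exactly the same items, with no ties and no duplicates.

```python
from itertools import combinations


def rank_positions(ranked_list):
    """Map each item to its 1-based position in the ranked list."""
    return {item: pos for pos, item in enumerate(ranked_list, start=1)}


def _check_same_items(list_a, list_b, pos_a, pos_b):
    # Assumption of this sketch: both rankings cover the same items, no duplicates.
    if len(pos_a) != len(list_a) or len(pos_b) != len(list_b):
        raise ValueError("lists must not contain duplicate items")
    if set(pos_a) != set(pos_b):
        raise ValueError("both lists must rank the same items")


def spearman_rho(list_a, list_b):
    """Classic Spearman rho: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    pos_a, pos_b = rank_positions(list_a), rank_positions(list_b)
    _check_same_items(list_a, list_b, pos_a, pos_b)
    n = len(list_a)
    if n < 2:
        raise ValueError("at least two items are required")
    squared_diffs = sum((pos_a[item] - pos_b[item]) ** 2 for item in pos_a)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


def kendall_tau_distance(list_a, list_b):
    """Classic Kendall tau distance: number of item pairs ordered differently."""
    pos_a, pos_b = rank_positions(list_a), rank_positions(list_b)
    _check_same_items(list_a, list_b, pos_a, pos_b)
    return sum(
        1
        for x, y in combinations(list_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


# Hypothetical annotation term labels, ranked by two prediction runs.
run_1 = ["term_A", "term_B", "term_C", "term_D"]
run_2 = ["term_A", "term_C", "term_B", "term_D"]

rho = spearman_rho(run_1, run_2)               # one adjacent swap among four items
distance = kendall_tau_distance(run_1, run_2)  # exactly one discordant pair
```

In this example the two runs differ by a single swap of adjacent terms, so the Kendall τ distance is 1 and the Spearman ρ stays close to 1. Lists that differ in length or content, as predicted annotation lists typically may, fall outside what these classic definitions handle directly.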



Corresponding author

Correspondence to Davide Chicco.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Chicco, D., Ciceri, E., Masseroli, M. (2015). Extended Spearman and Kendall Coefficients for Gene Annotation List Correlation. In: Di Serio, C., Liò, P., Nonis, A., Tagliaferri, R. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2014. Lecture Notes in Computer Science, vol 8623. Springer, Cham. https://doi.org/10.1007/978-3-319-24462-4_2

  • DOI: https://doi.org/10.1007/978-3-319-24462-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24461-7

  • Online ISBN: 978-3-319-24462-4

  • eBook Packages: Computer Science (R0)
