Validation Pipeline for Computational Prediction of Genomics Annotations

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9874)

Abstract

Controlled biomolecular annotations are key concepts in computational genomics and proteomics, since they describe the functional features of genes and their products in a way that is both simple and computable. Despite their importance, many of these annotations are missing, and the available ones contain errors and inconsistencies; furthermore, discovering and validating new annotations are very time-consuming tasks. For these reasons, many computer scientists have recently developed machine-learning algorithms that computationally predict new gene-function relationships. While several of these methods have been easily adapted to bioinformatics from other domains, their validation remains a challenging aspect of a computational pipeline. Here, we propose a validation procedure based upon three different sub-phases, which is able to assess the precision of any algorithm's predictions with a reliable degree of accuracy. We show validation results obtained for Gene Ontology annotations of Homo sapiens genes that demonstrate the effectiveness of our validation approach.

Acknowledgments

This work was partially supported by the “Data–Driven Genomic Computing (GenData 2020)” PRIN project (2013–2015), funded by Italy’s Ministry of Education, Universities and Research (MIUR). The authors thank Coby Viner (University of Toronto) for his help with the English proofreading of this article.

Author information

Corresponding author

Correspondence to Davide Chicco.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Chicco, D., Masseroli, M. (2016). Validation Pipeline for Computational Prediction of Genomics Annotations. In: Angelini, C., Rancoita, P., Rovetta, S. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015. Lecture Notes in Computer Science, vol 9874. Springer, Cham. https://doi.org/10.1007/978-3-319-44332-4_18

  • DOI: https://doi.org/10.1007/978-3-319-44332-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44331-7

  • Online ISBN: 978-3-319-44332-4

  • eBook Packages: Computer Science, Computer Science (R0)
