Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways

  • Protocol
Part of the book series: Methods in Molecular Biology (MIMB, volume 1613)

Abstract

It is increasingly evident that computational methods are needed to identify and map pathways in new genomes. We introduce an automatic annotation system, ARBA4Path (Association Rule-Based Annotator for Pathways), that uses rule mining techniques to predict metabolic pathways across a wide range of prokaryotes. It was demonstrated that specific combinations of protein domains (recorded in our rules) strongly determine the pathways in which proteins are involved, and thus provide information that lets us assign pathway membership very accurately (with a precision of 0.999 and a recall of 0.966) to proteins of a given prokaryotic taxon. Our system can be used to enhance the quality of automatically generated annotations as well as to annotate proteins of unknown function. The prediction models are represented as human-readable rules, and they can be used effectively to add missing pathway information to many proteins in the UniProtKB/TrEMBL database.
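To make the idea of a human-readable rule concrete, the following is a minimal sketch and not the ARBA4Path implementation. It assumes a rule fires when a protein belongs to the rule's taxon and carries every domain the rule lists, and it computes precision and recall over (protein, pathway) pairs. All identifiers, taxon labels, pathway names, and the exact-match taxon test are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    """One human-readable rule: taxon + domain combination -> pathway."""
    taxon: str                 # hypothetical taxon label
    domains: FrozenSet[str]    # hypothetical domain identifiers
    pathway: str               # hypothetical pathway name


def predict(rules: List[Rule],
            proteins: Dict[str, Tuple[str, Set[str]]]) -> Set[Tuple[str, str]]:
    """Return (protein_id, pathway) pairs for every rule that fires.

    Assumption: a rule fires when the taxon matches exactly and all of the
    rule's domains are present on the protein.
    """
    predicted = set()
    for protein_id, (taxon, domains) in proteins.items():
        for rule in rules:
            if rule.taxon == taxon and rule.domains <= domains:
                predicted.add((protein_id, rule.pathway))
    return predicted


def precision_recall(predicted: Set[Tuple[str, str]],
                     reference: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision = TP / predicted pairs; recall = TP / reference pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data: every value below is made up for illustration.
rules = [
    Rule("Bacteria", frozenset({"D1", "D2"}), "pathway A"),
    Rule("Bacteria", frozenset({"D3"}), "pathway B"),
]
proteins = {
    "P1": ("Bacteria", {"D1", "D2", "D9"}),  # matches the first rule
    "P2": ("Bacteria", {"D3"}),              # matches the second rule
    "P3": ("Archaea", {"D1", "D2"}),         # right domains, wrong taxon
    "P4": ("Bacteria", {"D1"}),              # only part of the combination
}
reference = {("P1", "pathway A"), ("P2", "pathway B"), ("P4", "pathway A")}

predicted = predict(rules, proteins)
precision, recall = precision_recall(predicted, reference)
print(sorted(predicted), precision, recall)
```

In this toy case both predictions are correct, while one reference annotation (P4) is missed because the protein carries only part of the required domain combination. This mirrors the high-precision, slightly lower-recall profile that rules of this kind tend to have.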

*These authors contributed equally to this work.

Acknowledgments

The second author conducted this work as part of a research internship with the UniProt team at the European Bioinformatics Institute. Funding for this internship was provided by King Abdullah University of Science and Technology. The authors would also like to thank the UniProt Consortium for its valuable support and feedback on the development of this work.

Author information

Corresponding authors

Correspondence to Rabie Saidi or Victor Solovyev.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Saidi, R., Boudellioua, I., Martin, M.J., Solovyev, V. (2017). Rule Mining Techniques to Predict Prokaryotic Metabolic Pathways. In: Tatarinova, T., Nikolsky, Y. (eds) Biological Networks and Pathway Analysis. Methods in Molecular Biology, vol 1613. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7027-8_12

  • DOI: https://doi.org/10.1007/978-1-4939-7027-8_12

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7025-4

  • Online ISBN: 978-1-4939-7027-8

  • eBook Packages: Springer Protocols
