Efficient Examination of Soil Bacteria Using Probabilistic Graphical Models

  • Cory J. Butz
  • André E. dos Santos
  • Jhonatan S. Oliveira
  • John Stavrinides
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)


This paper describes a novel approach to studying bacterial relationships in soil datasets using probabilistic graphical models. We demonstrate how to access and reformat publicly available datasets so that machine learning techniques can be applied to them. We first learn a Bayesian network, from which independencies between bacterial community characteristics can be read in linear time. These independencies are useful in understanding the semantic relationships between bacteria within communities. Next, we learn a sum-product network (SPN) in order to perform inference in linear time. Inference can then answer traditional queries, involving posterior probabilities, or most probable explanation (MPE) queries, requesting the most likely values of the non-evidence variables given evidence. Our results extend the literature by showing that known relationships between soil bacteria, previously observed in one or a few datasets, in fact hold across at least 3500 diverse datasets. This study paves the way for future large-scale studies of agricultural, health, and environmental applications for which data are publicly available.
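Reading independencies from a Bayesian network rests on d-separation. A minimal sketch, assuming the network is given as a mapping from each node to its list of parents, is the standard reachability ("Bayes-ball") traversal below. It processes each (node, direction) pair at most once, so it runs in time linear in the number of nodes and edges. The toy DAG and the variable names A through E are hypothetical placeholders, not variables from the soil study, and this is not the authors' implementation.

```python
from collections import deque


def d_separated(parents, x, y, z):
    """Return True if nodes x and y are d-separated given the set z.

    parents maps every node of the DAG to a list of its parents. This is the
    standard linear-time reachability ("Bayes-ball") traversal.
    """
    z = set(z)
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)

    # Phase 1: collect z and all its ancestors; a collider is active only if
    # it is in z or has a descendant in z, i.e. if it belongs to this set.
    anc = set()
    stack = list(z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    # Phase 2: search over (node, direction) pairs. "up" means the trail
    # reached the node from one of its children, "down" from a parent.
    visited = set()
    queue = deque([(x, "up")])
    while queue:
        v, d = queue.popleft()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v == y and v not in z:
            return False  # an active trail connects x and y
        if d == "up" and v not in z:
            queue.extend((p, "up") for p in parents[v])
            queue.extend((c, "down") for c in children[v])
        elif d == "down":
            if v not in z:  # the chain through v stays open
                queue.extend((c, "down") for c in children[v])
            if v in anc:  # the v-structure at v is opened by the evidence
                queue.extend((p, "up") for p in parents[v])
    return True


# Hypothetical toy DAG: chain A -> B -> C -> D and a collider C -> D <- E.
TOY_DAG = {"A": [], "B": ["A"], "C": ["B"], "E": [], "D": ["C", "E"]}

print(d_separated(TOY_DAG, "A", "C", set()))   # False: the chain is open
print(d_separated(TOY_DAG, "A", "C", {"B"}))   # True: B blocks the chain
print(d_separated(TOY_DAG, "A", "E", set()))   # True: collider D blocks
print(d_separated(TOY_DAG, "A", "E", {"D"}))   # False: observing D opens it
```

Working on (node, direction) pairs rather than whole paths is what keeps the test linear: the number of trails can grow exponentially, but each pair is expanded only once.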

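Inference in a sum-product network (SPN) is a single bottom-up pass over the network, so its cost is linear in the number of edges. The sketch below assumes a toy SPN over two hypothetical binary variables X1 and X2 with indicator leaves; it is not a network learned from the soil data, and the node encoding is an illustrative choice rather than the authors' software. Probabilities of evidence are computed by setting the indicators of unobserved variables to 1, posteriors are ratios of two such passes, and MPE is approximated by replacing sums with weighted maxima and backtracking. That max-product approach is exact only for selective SPNs and is an approximation in general.

```python
# Toy SPN over two hypothetical binary variables X1 and X2 (illustration only).
# Nodes are listed children-first: ("leaf", var, value) is an indicator,
# ("prod", [child indices]) and ("sum", [(weight, child index), ...]).
SPN = [
    ("leaf", "X1", 1),               # 0
    ("leaf", "X1", 0),               # 1
    ("leaf", "X2", 1),               # 2
    ("leaf", "X2", 0),               # 3
    ("sum", [(0.6, 0), (0.4, 1)]),   # 4: distribution over X1
    ("sum", [(0.9, 2), (0.1, 3)]),   # 5: distribution over X2
    ("sum", [(0.2, 2), (0.8, 3)]),   # 6: distribution over X2
    ("sum", [(0.1, 0), (0.9, 1)]),   # 7: distribution over X1
    ("prod", [4, 5]),                # 8: X1 and X2 independent in this component
    ("prod", [7, 6]),                # 9: second component
    ("sum", [(0.5, 8), (0.5, 9)]),   # 10: root mixture
]


def evaluate(spn, evidence, use_max=False):
    """One bottom-up pass; returns the value of every node (root last)."""
    val = []
    for node in spn:
        if node[0] == "leaf":
            _, var, value = node
            # An unobserved variable keeps both indicators at 1 (summed out).
            val.append(1.0 if evidence.get(var, value) == value else 0.0)
        elif node[0] == "prod":
            p = 1.0
            for c in node[1]:
                p *= val[c]
            val.append(p)
        else:  # sum node, or weighted max for the MPE pass
            terms = [w * val[c] for w, c in node[1]]
            val.append(max(terms) if use_max else sum(terms))
    return val


def probability(spn, evidence):
    """P(evidence): the root value of a single sum-product pass."""
    return evaluate(spn, evidence)[-1]


def mpe(spn, evidence):
    """Max-product pass plus backtracking (exact only for selective SPNs)."""
    val = evaluate(spn, evidence, use_max=True)
    assignment = dict(evidence)
    stack = [len(spn) - 1]
    while stack:
        node = spn[stack.pop()]
        if node[0] == "leaf":
            assignment[node[1]] = node[2]
        elif node[0] == "prod":
            stack.extend(node[1])
        else:  # follow the child with the largest weighted value
            _, best = max(node[1], key=lambda wc: wc[0] * val[wc[1]])
            stack.append(best)
    return assignment


posterior = probability(SPN, {"X1": 1, "X2": 1}) / probability(SPN, {"X1": 1})
print(posterior)              # P(X2=1 | X1=1), approximately 0.8
print(mpe(SPN, {"X1": 1}))    # most likely value of X2 given X1=1
```

Listing nodes children-first lets a single loop replace recursion: every node's inputs are already computed when it is reached, which is why both passes are linear in the size of the network.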

Keywords: Probabilistic graphical models · Deep learning · Soil · Bacteria



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Cory J. Butz (1)
  • André E. dos Santos (1)
  • Jhonatan S. Oliveira (1)
  • John Stavrinides (2)
  1. Department of Computer Science, University of Regina, Regina, Canada
  2. Department of Biology, University of Regina, Regina, Canada
