Abstract
This paper describes a novel approach to studying bacterial relationships in soil datasets using probabilistic graphical models. We demonstrate how to access and reformat publicly available datasets so that machine learning techniques can be applied. We first learn a Bayesian network, from which independencies between bacterial community characteristics can be read in linear time. These independencies are useful in understanding the semantic relationships between bacteria within communities. Next, we learn a sum-product network in order to perform inference in linear time. Inference can answer traditional queries, involving posterior probabilities, or most probable explanation (MPE) queries, requesting the most likely values of the non-evidence variables given evidence. Our results extend the literature by showing that relationships between soil bacteria previously known to hold in one or a few datasets in fact hold across at least 3500 diverse datasets. This study paves the way for future large-scale studies of agricultural, health, and environmental applications for which data are publicly available.
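To make the two query types concrete, the following is a minimal sketch of a sum-product network over two binary variables, assuming a hand-built structure rather than one learned from data. The variable names `A` and `B`, the weights, and the leaf probabilities are illustrative assumptions and are not taken from the paper. A posterior is obtained from two bottom-up passes, and an MPE assignment from a max-product pass followed by backtracking.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Union

Evidence = Dict[str, int]  # variable name -> observed state (0 or 1)


@dataclass
class Leaf:
    """Bernoulli leaf over one binary variable (hypothetical parameters)."""
    var: str
    p_one: float  # P(var = 1)

    def value(self, ev: Evidence, use_max: bool = False) -> float:
        if self.var in ev:
            return self.p_one if ev[self.var] == 1 else 1.0 - self.p_one
        # Unobserved: sum over both states (= 1) or keep the better state.
        return max(self.p_one, 1.0 - self.p_one) if use_max else 1.0

    def decode(self, ev: Evidence, out: Evidence) -> None:
        out[self.var] = ev.get(self.var, 1 if self.p_one >= 0.5 else 0)


@dataclass
class Product:
    children: List["Node"]

    def value(self, ev: Evidence, use_max: bool = False) -> float:
        return prod(c.value(ev, use_max) for c in self.children)

    def decode(self, ev: Evidence, out: Evidence) -> None:
        for c in self.children:
            c.decode(ev, out)


@dataclass
class Sum:
    weights: List[float]
    children: List["Node"]

    def value(self, ev: Evidence, use_max: bool = False) -> float:
        terms = [w * c.value(ev, use_max)
                 for w, c in zip(self.weights, self.children)]
        return max(terms) if use_max else sum(terms)

    def decode(self, ev: Evidence, out: Evidence) -> None:
        # Follow only the highest-scoring child (max-product backtracking).
        _, best = max(zip(self.weights, self.children),
                      key=lambda wc: wc[0] * wc[1].value(ev, True))
        best.decode(ev, out)


Node = Union[Leaf, Product, Sum]


def posterior(root: Node, query: Evidence, ev: Evidence) -> float:
    """P(query | ev) as the ratio of two bottom-up evaluations."""
    return root.value({**ev, **query}) / root.value(ev)


def mpe(root: Node, ev: Evidence) -> Evidence:
    """Max-product assignment to all variables, consistent with ev."""
    out: Evidence = {}
    root.decode(ev, out)
    return out


# Toy network: a mixture of two fully factorised components.
root = Sum(
    weights=[0.6, 0.4],
    children=[
        Product([Leaf("A", 0.9), Leaf("B", 0.8)]),
        Product([Leaf("A", 0.2), Leaf("B", 0.1)]),
    ],
)

p = posterior(root, {"B": 1}, {"A": 1})
assignment = mpe(root, {"A": 1})
print(f"P(B=1 | A=1) = {p:.4f}")
print("MPE given A=1:", assignment)
```

Each call to `value` visits every edge of the network once, which is where the linear-time behaviour comes from. Two caveats apply to this sketch: the backtracking step recomputes child values instead of caching them, so a practical implementation would store the values from the upward pass, and max-product decoding is only guaranteed to return the exact MPE for restricted network classes (for example, selective networks), so in general it should be treated as an approximation.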
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Butz, C.J., dos Santos, A.E., Oliveira, J.S., Stavrinides, J. (2018). Efficient Examination of Soil Bacteria Using Probabilistic Graphical Models. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_30
DOI: https://doi.org/10.1007/978-3-319-92058-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0
eBook Packages: Computer Science, Computer Science (R0)