
Model Learning from Published Aggregated Data

  • Chapter
Learning Structure and Schemas from Documents

Part of the book series: Studies in Computational Intelligence ((SCI,volume 375))

Abstract

In many application domains, particularly healthcare, access to individual data points is limited, while data aggregated in the form of means and standard deviations are widely available. This limitation results from many factors, including privacy laws that prevent clinicians and scientists from freely sharing individual patient data, the inability to share proprietary business data, and inadequate data collection methods. As a result, traditional machine learning methods cannot be used for model construction. The problem is especially important when a study compares multiple datasets, each derived from a different open-access publication in which the data are reported in aggregated form. This chapter describes the problem of learning models from aggregated data and contrasts it with traditional learning from individual examples. It presents a method of rule induction from such data and applies the method to constructing predictive models for diagnosing liver complications of the metabolic syndrome, one of the most common chronic diseases in humans. Other possible applications of the method are also discussed.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wojtusiak, J., Baranova, A. (2011). Model Learning from Published Aggregated Data. In: Biba, M., Xhafa, F. (eds) Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22913-8_17

  • DOI: https://doi.org/10.1007/978-3-642-22913-8_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22912-1

  • Online ISBN: 978-3-642-22913-8

  • eBook Packages: Engineering, Engineering (R0)
