Model Learning from Published Aggregated Data

Wojtusiak, Janusz; Baranova, Ancha

doi:10.1007/978-3-642-22913-8_17

Part of the book series: Studies in Computational Intelligence (SCI, volume 375)

Abstract

In many application domains, particularly in healthcare, access to individual data points is limited, while data aggregated in the form of means and standard deviations are widely available. This limitation results from many factors, including privacy laws that prevent clinicians and scientists from freely sharing individual patient data, the inability to share proprietary business data, and inadequate data collection methods. Consequently, it prevents the use of traditional machine learning methods for model construction. The problem is especially important when a study compares multiple datasets, each derived from a different open-access publication in which data are reported in aggregated form. This chapter describes the problem of machine learning of models from aggregated data, as compared to traditional learning from individual examples. It presents a method of rule induction from such data and applies this method to constructing predictive models for diagnosing liver complications of the metabolic syndrome, one of the most common chronic diseases in humans. Other possible applications of the method are also discussed.

Author information

Authors and Affiliations

Department of Health Administration and Policy, George Mason University, Northeast Module, Room 108, 4400 University Drive, MSN 1J3, Fairfax, VA 22030, USA
Janusz Wojtusiak
The Center for Biomedical Genomics, Room 182 Discovery Hall, MSN 4D7, 10900 University Blvd, Manassas, VA 20110, USA
Ancha Baranova

Editor information

Editors and Affiliations

University of New York Tirana, Rr. Komuna E Parisit, Tirana, Albania
Marenglen Biba
Technical University of Catalonia, Campus Nord, Ed. Omega, C/Jordi Girona 1-3, 08034, Barcelona, Spain
Fatos Xhafa

About this chapter

Cite this chapter

Wojtusiak, J., Baranova, A. (2011). Model Learning from Published Aggregated Data. In: Biba, M., Xhafa, F. (eds) Learning Structure and Schemas from Documents. Studies in Computational Intelligence, vol 375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22913-8_17

DOI: https://doi.org/10.1007/978-3-642-22913-8_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22912-1
Online ISBN: 978-3-642-22913-8
eBook Packages: Engineering, Engineering (R0)
