Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights

  • Reference work entry
Health Services Evaluation

Part of the book series: Health Services Research ((HEALTHSR))

Abstract

This chapter describes the application of big data analytics in healthcare, particularly to electronic healthcare records, to build predictive models of healthcare outcomes and to discover interesting insights. A typical workflow for such predictive analytics involves data collection, data transformation, predictive modeling, evaluation, and deployment, with each step tailored to the end goals of the project. To illustrate each of these steps, we use the example of recent advances in predictive analytics on lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program. These include the construction of accurate predictive models for lung cancer survival, the development of a lung cancer outcome calculator that deploys the predictive models, and association rule mining on the same data for bottom-up discovery of interesting insights. The lung cancer outcome calculator illustrated here is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator.
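
As a rough illustration of the association rule mining mentioned in the abstract, the sketch below computes the three standard rule metrics (support, confidence, and lift) for a single candidate rule. It is not the chapter's method or code: the records are made-up toy data rather than SEER data, and the attribute names (`stage`, `surgery`, `survived_5y`) and the helper `rule_metrics` are hypothetical, chosen only for this example.

```python
from typing import Dict, FrozenSet, List

# Toy, made-up records for illustration only; these are NOT SEER data.
# Each record is a set of "attribute=value" items (hypothetical attributes).
RECORDS: List[FrozenSet[str]] = [
    frozenset({"stage=localized", "surgery=yes", "survived_5y=yes"}),
    frozenset({"stage=localized", "surgery=yes", "survived_5y=yes"}),
    frozenset({"stage=localized", "surgery=no", "survived_5y=no"}),
    frozenset({"stage=distant", "surgery=no", "survived_5y=no"}),
    frozenset({"stage=distant", "surgery=no", "survived_5y=no"}),
    frozenset({"stage=distant", "surgery=yes", "survived_5y=no"}),
    frozenset({"stage=regional", "surgery=yes", "survived_5y=yes"}),
    frozenset({"stage=regional", "surgery=no", "survived_5y=no"}),
]


def support(itemset: FrozenSet[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for record in records if itemset <= record) / len(records)


def rule_metrics(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    records: List[FrozenSet[str]],
) -> Dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_antecedent = support(antecedent, records)
    s_consequent = support(consequent, records)
    s_both = support(antecedent | consequent, records)
    # Confidence: how often the consequent holds when the antecedent holds.
    confidence = s_both / s_antecedent if s_antecedent else 0.0
    # Lift: confidence relative to the consequent's overall frequency;
    # values above 1 suggest a positive association.
    lift = confidence / s_consequent if s_consequent else 0.0
    return {"support": s_both, "confidence": confidence, "lift": lift}


metrics = rule_metrics(
    frozenset({"stage=localized", "surgery=yes"}),
    frozenset({"survived_5y=yes"}),
    RECORDS,
)
print(metrics)
```

In a bottom-up analysis of the kind the abstract describes, rules like this one would typically be enumerated over many attribute combinations and then filtered by minimum support and lift thresholds; the thresholds and attributes used in the chapter are not stated on this page.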

Author information

Corresponding author

Correspondence to Ankit Agrawal.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Agrawal, A., Choudhary, A. (2019). Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_2

  • DOI: https://doi.org/10.1007/978-1-4939-8715-3_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-8714-6

  • Online ISBN: 978-1-4939-8715-3

  • eBook Packages: Medicine, Reference Module Medicine