State of the art review: the data revolution in critical care
This article is one of ten reviews selected from the Annual Update in Intensive Care and Emergency Medicine 2015 and co-published as a series in Critical Care. Other articles in the series can be found online at http://ccforum.com/series/annualupdate2015. Further information about the Annual Update in Intensive Care and Emergency Medicine is available from http://www.springer.com/series/8901.
Keywords: Intensive Care Unit, Concept Drift, Beth Israel Deaconess Medical Center, Mortality Probability Model, Data Explosion
Abbreviations
APACHE: Acute physiology and chronic health evaluation
AUC: Area under the curve
CMS: Centers for Medicare and Medicaid Services
DCDM: Dynamic clinical data mining
EMR: Electronic medical record
ICU: Intensive care unit
LOS: Length of stay
MPM: Mortality probability model
RBC: Red blood cell
RCT: Randomized controlled trial
SAPS: Simplified acute physiology score
Many recent articles highlight the data revolution in healthcare, an offshoot of the vast amount of digital medical information that has now accumulated in electronic medical records (EMRs), and present it as an opportunity to create a ‘learning healthcare system’. The generally proposed vision is for a population data-driven knowledge system that generalizes from every patient’s life, disease and treatment experiences to impute the best course of action for diagnosis, prognosis and treatment of future patients.
There have also been many articles focusing on the risk that naïve use of Big Data (or data in general) poses. As stated by Zak Kohane of Harvard Medical School, Big Data in healthcare cannot be a simple, blind application of black-box techniques: “You really need to know something about medicine. If statistics lie, then Big Data can lie in a very, very big way”.
This paper will discuss the general issue of data in critical care with a focus on the Big Data phenomenon that is sweeping healthcare. With the vast amount of digital medical information that has accumulated in EMRs, the challenge is the transformation of the copious data into usable and useful medical knowledge.
The bottom line is that pertinent quality data add tremendous value, which accounts for their ‘unreasonable effectiveness’. There is no way to minimize undesirable variability in practice without the data to substantiate the standardization. The volume and variety of increasingly available Big Data can allow us to interrogate clinical practice variation, personalize the risk-benefit score for every test and intervention, discover new knowledge to understand disease mechanisms, and optimize processes such as medical decision making, triage and resource allocation. Clinical data have been notorious for their variable interoperability and quality, but a holistic use of the massive data sources available (vital signs, clinical notes, laboratory results, treatments including medications and procedures) can lead to new perspectives on challenging problems. While the wetware of the human mind is a wonderful instrument for this purpose, we must design better data systems to support and improve those components of this data integration process that exceed human abilities.
Data in critical care
Critical care environments are intense by definition. Decisions in the intensive care unit (ICU) are frequently made in the setting of a high degree of uncertainty, and clinical staff may have only minutes or even seconds to make those decisions. The increasing need for intensive care has raised the ratio of ICU beds to hospital beds as the ICU plays an expanding role in acute hospital care. But the value of many treatments and interventions in the ICU is unproven, with many standard treatments being ineffective, minimally effective, questionably effective, or even harmful to the patient. In a setting where the effects of every intervention are subject to patient- and clinical context-specific factors, the ability to use data for decision support becomes very attractive, and close to essential, as increasing complexity transcends typical cognitive capabilities.
A comparison of intensive care unit (ICU) scoring systems (reproduced with permission)
ICU scoring system | Timing of data collected | Other required data | Original reported mortality prediction performance
SAPS | Prior to and within 1 hour of ICU admission | Age, six chronic health variables, ICU admission diagnosis, ICU admission source, LOS prior to ICU admission, emergency surgery, infection on admission, four variables for surgery type | AUC = 84.8% (n = 16,784)
APACHE | First ICU day (16–32 h depending on time of admission) | Age, six chronic health variables, ICU admission diagnosis, ICU admission source, LOS prior to ICU admission, emergency surgery, thrombolytic therapy, FiO2, mechanical ventilation | AUC = 88.0% (n = 52,647)
MPM | Prior to and within 1 hour of ICU admission | Age, three chronic health variables, five acute diagnosis variables, admission type (e.g., medical-surgical) and emergency surgery, CPR within 1 h of ICU admission, mechanical ventilation, code status | AUC = 82.3% (n = 50,307)
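The discrimination figures reported above are areas under the receiver operating characteristic curve (AUC). As a concrete illustration, the short sketch below (in Python, with invented risk scores and outcomes) computes AUC via its rank interpretation: the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor.

```python
def auc(scores, labels):
    """Area under the ROC curve via its rank interpretation: the
    probability that a randomly chosen positive case (death) receives
    a higher score than a randomly chosen negative case (survival).
    Ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented predicted mortality risks and observed outcomes (1 = died)
risks = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
died  = [1,   1,   0,   1,   0,   0,   1,   0]
print(f"AUC = {auc(risks, died):.2f}")  # AUC = 0.75
```

A perfectly discriminating score yields an AUC of 1.0 and a coin-flip score 0.5, which is why the tabulated values in the low-to-high 80% range indicate good but imperfect discrimination.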
In practice, clinical prediction must be motivated by the needs of clinical staff, and driven in large part by perceived utility and growing technical comfort among clinicians. Some of the biggest opportunities for Big Data to make practical gains quickly are focused on the most expensive parts of current clinical practice: reliable, predictive alerting and retrospective reporting analytics for high-cost patients, readmissions, triage, clinical decompensation, adverse events, and treatment optimization for diseases affecting multiple organ systems.
RCTs are the gold standard for clinical knowledge discovery. But 65 years after the first RCT was published, only 10–20% of medical decisions are based on RCT-supported evidence. When examining the validity of a variety of medical interventions, about half of systematic reviews report insufficient evidence to support the intervention in question, and most treatment comparisons of clinical interest have never actually been addressed by an RCT. The reality is that the exponential combinations of patients, conditions and treatments cannot be exhaustively explored by RCTs, given the large cost of adding even small numbers of patients. Furthermore, the process of performing RCTs often intentionally or inadvertently excludes groups of patients, such as those with particular co-morbidities or medications, or of certain ages or ethnic groups. Thus, when trying to make a real decision under practice conditions, RCT conclusions may simply not be applicable to the patient and situation at hand. This was the driver for the concept of dynamic clinical data mining (DCDM), in which the user of an EMR would be automatically presented with prior interventions and outcomes of similar patients to support what would otherwise be a completely subjective decision (see below).
Recent observational studies on the MIMIC ICU database have yielded many interesting findings. These include the heterogeneity of treatment effect of red blood cell (RBC) transfusion, the impact of pre-admission selective serotonin reuptake inhibitors on mortality in the ICU, the interplay between clinical notes and structured data on mortality prediction, optimization of heparin dosing to minimize the probability of over- and under-anticoagulation, long-term outcomes of minor troponin elevations in the ICU, and the association between serum magnesium and blood pressure in the critically ill, to name a few. But these observations may be specific to the Beth Israel Deaconess Medical Center and need to be validated using databases from other institutions.
Others have examined institution-specific databases, and these studies have yielded findings that have been translated into practice: a recent study at Seattle Children’s compared a wide range of performance metrics and translated results into prioritized departmental and enterprise-wide improvements.
Celi, Zimolzak and Stone described an operational vision for a digitally based, generalized decision support system that they termed “Dynamic Clinical Data Mining”. The proposed system aggregates individual patient electronic health data in the course of care; queries a universal, de-identified clinical database using modified search engine technology in real time; identifies prior cases of sufficient similarity as to be instructive to the case at hand; and populates the individual patient’s EMR with pertinent decision support material such as suggested interventions and prognosis, based on prior treatments and outcomes (Figure 3).
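The retrieval step at the heart of such a system can be sketched simply. The toy example below is only an illustration of the idea, not the authors' implementation: the patient identifiers, feature vectors, scalings and field names are all invented, and a real system would require rigorous feature engineering, risk adjustment and de-identification. It ranks prior cases by cosine similarity to the index patient and surfaces their treatments and outcomes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similar_cases(index_features, database, k=3):
    """Rank prior de-identified cases by similarity to the index patient
    and return the k closest, with their treatments and outcomes."""
    ranked = sorted(database,
                    key=lambda case: cosine(index_features, case["features"]),
                    reverse=True)
    return ranked[:k]

# Invented feature vectors: [age/100, heart rate/200, lactate/10, creatinine/5]
database = [
    {"id": "A", "features": [0.72, 0.55, 0.31, 0.40],
     "treatment": "early vasopressors", "survived": True},
    {"id": "B", "features": [0.70, 0.60, 0.35, 0.38],
     "treatment": "fluids only", "survived": False},
    {"id": "C", "features": [0.25, 0.40, 0.10, 0.20],
     "treatment": "fluids only", "survived": True},
]
index_patient = [0.71, 0.58, 0.33, 0.39]

for case in similar_cases(index_patient, database, k=2):
    print(case["id"], case["treatment"],
          "survived" if case["survived"] else "died")
```

With these invented numbers, the two prior patients with similar physiology (A and B) are retrieved, while the dissimilar patient C is not; a clinician would then see the divergent outcomes of the two treatment strategies.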
Industry is also taking note. The national pharmaceutical benefits manager Express Scripts can predict which patients may fail to take their medication 12 months in advance, with an accuracy rate of 98%; IBM is modifying its famed Watson system (in tight collaboration with clinicians) to predict different types of cancer. 23andMe’s database has already been used to find unknown genetic markers for Parkinson’s disease and myopia, and the company’s acquisition of $1.4 million in National Institutes of Health funding signals additional confidence in its goals.
The open data movement and medicine
More recently, the open data movement has been quietly sweeping almost every industry, including the specialized domain of healthcare. It calls for data sharing and, by its very nature, requires a degree of accountability, and of collaboration across disciplines, never seen before. At the forefront of the open data movement in healthcare is the pharmaceutical industry. In October 2012, GlaxoSmithKline (GSK) stunned the scientific community by announcing that it would make detailed data from its clinical trials widely available to researchers outside its own walls. For a company that spends $6.5 billion a year on research and development, it was a sharp turn away from a historic system of data secrecy. In May 2013, the company began posting its own data online. It then invited others to join ClinicalStudyDataRequest.com, where GSK and six other drug makers have already uploaded data from nearly 900 clinical trials. The following month, the medical device company Medtronic teamed up with Yale University to share its clinical trials data through the Yale University Open Access Data (YODA) Project.
Hackathons are large-scale events that contemporaneously bring together (physically and/or by teleconferencing) large groups of qualified individuals to collectively contribute their expertise towards a common problem set. Crowdsourcing also focuses large groups of qualified individuals towards a common problem, but allows those individuals to do so asynchronously and in a mobile manner using phones, tablets, laptops and other devices to contribute from any location. With such tools, individual clinical encounters no longer have to be experienced in a silo-like fashion. The clinical ‘crowd’ can be leveraged to form a ‘data substrate’ available freely to clinicians and data scientists. This amalgamation of individual knowledge should allow each clinician to address gaps in their knowledge, with the confidence that their decisions are supported by evidence in clinical practice.
In January 2014, the inaugural Critical Data Marathon and Conference was held at the Massachusetts Institute of Technology. In the data marathon, physicians, nurses and pharmacists were paired with data scientists and engineers, and encouraged to investigate a variety of clinical questions that arise in the ICU. Over a 2-day period, more than 150 attendees began to answer questions, such as whether acetaminophen should be used to control fever in critically ill patients, and what the optimal blood pressure goal should be among patients with severe infection. This event fostered collaboration between clinicians and data scientists that will support ongoing research in the ICU setting. The associated Critical Data Conference addressed growing concerns that Big Data will only augment the problem of unreliable research. Thought leaders from academia, government and industry across disciplines including clinical medicine, computer science, public health, informatics, biomedical research, health technology, statistics and epidemiology gathered and discussed the pitfalls and challenges of Big Data in healthcare. The consensus seemed to be that success will require systematized and fully transparent data interrogation, where data and methods are freely shared among different groups of investigators addressing the same or similar questions. The added accuracy of the scientific findings is only one of the benefits of the systematization of the open data movement. Another will be the opportunity afforded to individuals of every educational level and area of expertise to contribute to science.
From a broader analysis of Big Data, we can try to understand larger patterns by comparing the strength of many signals in large populations. Larger data sets must also herald the advance of shared data sets. There is a critical need for collaborative research amongst many groups that explore similar questions. The association between data sharing and increased citation rate, and an increasing commitment by companies, funding agencies and investigators to more widely share clinical research data, point to the feasibility of this move. The prospect of using Big Data in an open environment may sound overwhelming, but there have been key steps to encourage this cultural transformation. For example, the Centers for Medicare and Medicaid Services (CMS) have begun to share data with providers and states. As the largest single payer for health care in the United States, CMS has used its vast store of data to track hospital readmission rates in the Medicare program (importantly finding a rapid decline in readmission rates in 2012 and 2013), and combat Medicare fraud (in its first year the system stopped, prevented, or identified an estimated $115 million in improper payments).
As large amounts of shared data become available from different geographic and academic sources, there will be the additional benefit from the collection of data from sources with different viewpoints and biases. While individual researchers may not be aware of their own biases or assumptions that may impact reported results, shared exploration of Big Data provides us with an inherent sanity check that has been sorely lacking in many fields.
Big data per se
In a recent analysis of data-driven healthcare by the MIT Technology Review, the authors noted that “medicine has entered its data age”. Driven by the promise of an estimated $300 to $450 billion a year in potential value, companies of all sizes are beginning to fight in earnest to capture and tame the data explosion. Key innovations fall into three major areas: more and more data, especially resulting from mobile monitoring; better analytics using new machine learning and other techniques; and meaningful recommendations that focus on prediction, description, and prevention of poor health outcomes, finally captured in an easily accessible format.
The mass of new data rests primarily in the proprietary hands of large entities like insurance companies and care providers. For example, the genomics company 23andMe is famously creating a huge database of genomic data, moving from over 700,000 records towards their goal of tens of millions. Some countries with centralized healthcare systems, like Denmark, are also beginning to leverage that accessible data. In addition, smaller companies like WellDoc and Ginger.io are beginning to focus on rampant cell-phone penetration to get into the health-data market. Mobile phones can now seamlessly acquire daily patient metrics on meals, exercise, call patterns and other behaviors; WellDoc uses these data to recommend personalized insulin doses based on patients’ daily habits, and Ginger.io monitors patients with mental illnesses for the kinds of actions that might indicate a need for help. Other companies provide physical attachments to mobile devices that enrich the possible data types available: CellScope sells an attachment to support remote otoscopy; AliveCor provides electrocardiogram (EKG) signals; Propeller Health attaches to an inhaler to record pertinent data; and there are a slew of others for nearly every imaginable data need.
Along with Big Data’s promise, there have been warnings of overconfidence and disaster, labelled by Lazer et al. as “Big Data hubris”. The warning parable told to illustrate this is Google Flu Trends. Launched by Google in 2008, Flu Trends used the search terms typed into Google to track the progression of influenza epidemics over time. However, this approach was subsequently revealed to have suffered from several known data analysis pitfalls (e.g., overfitting and concept drift), so that by 2012–2013 the prevalence of flu was being greatly overestimated. Other oft-cited risks include misleading conclusions derived from spurious associations in increasingly detailed data, and biased collection of data that may make derived hypotheses difficult to validate or generalize.
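Concept drift, one of the pitfalls named above, is easy to demonstrate: a model learns a relationship between a proxy signal (search volume) and the quantity of interest (flu prevalence), then silently fails when that relationship shifts. The sketch below uses invented numbers and omits noise for clarity; it is not a reconstruction of the actual Flu Trends model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Training period: each flu case generates about 2 units of search volume.
train_flu      = [10, 12, 15, 20, 18, 14]   # invented weekly prevalence
train_searches = [2.0 * f for f in train_flu]

a, b = fit_line(train_searches, train_flu)  # learns flu ~ searches / 2

# Concept drift: media attention doubles the searching per actual case,
# but the deployed model is never retrained.
drift_flu      = [11, 13, 16]
drift_searches = [4.0 * f for f in drift_flu]

predicted = [a + b * s for s in drift_searches]
print(predicted)  # roughly double the true prevalence of [11, 13, 16]
```

The model's internal fit remains perfect on its training period even as its deployed predictions drift far from reality, which is why such failures are hard to detect without ongoing validation against ground truth.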
But avoiding spurious conclusions from data analysis is not a challenge unique to Big Data. A 2012 Nature review of cancer research found reproducibility of findings in only 11% of 53 published papers. There is concern that Big Data will only augment this noise, but using larger datasets actually tends to help with inflated significance, as the estimated effect sizes tend to be much smaller.
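The claim that larger datasets temper inflated effect sizes can be illustrated with a small simulation of the so-called 'winner's curse': when only statistically significant results are reported, underpowered studies systematically overestimate a true effect, while well-powered studies recover it almost exactly. All numbers below are illustrative.

```python
import random

random.seed(42)
TRUE_EFFECT = 0.2  # invented true mean difference, in SD units

def significant_estimates(n, reps=500):
    """Simulate `reps` studies of size n and keep only the effect
    estimates reaching two-sided p < 0.05 (|mean| > 1.96 * SE, SD = 1)."""
    se = 1 / n ** 0.5
    kept = []
    for _ in range(reps):
        est = sum(random.gauss(TRUE_EFFECT, 1) for _ in range(n)) / n
        if abs(est) > 1.96 * se:
            kept.append(est)
    return kept

small = significant_estimates(n=20)    # underpowered studies
large = significant_estimates(n=500)   # well-powered studies

# Among 'significant' results, small studies overstate the true effect
# of 0.2; large studies estimate it accurately.
print(round(sum(small) / len(small), 2))
print(round(sum(large) / len(large), 2))
```

The bias comes purely from the significance filter: a small study can only clear the p < 0.05 bar when sampling noise happens to exaggerate the effect, whereas a large study clears it at realistic effect sizes.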
The biased collection of data is a non-trivial question. If researchers have large amounts of data that severely oversample certain populations or conditions, their derived hypotheses can be incorrect or at least understandably difficult to validate. The way that current literature is designed, generated, and published creates sequential ‘statistically significant’ discoveries from restricted datasets. It is not uncommon in the scientific literature to get a different story for a variable’s (vitamin E, omega-3, coffee) relationship to outcome (mortality, Alzheimer’s, infant birth-weight) depending on what is adjusted for, or how a population was selected. There is little meaning to exploring the impact of one variable for one outcome: it is the big picture that is meaningful.
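The dependence of a conclusion on what is adjusted for can be made concrete with Simpson's paradox. In the invented counts below, a hypothetical drug is given preferentially to the sickest patients: pooled (unadjusted) recovery rates make it look harmful, while within each severity stratum it looks beneficial.

```python
def rate(recovered, total):
    """Recovery proportion in a group."""
    return recovered / total

# Invented counts of (recovered, treated), by severity stratum.
mild   = {"drug": (9, 10),  "control": (70, 80)}   # 90.0% vs 87.5%
severe = {"drug": (30, 90), "control": (5, 20)}    # 33.3% vs 25.0%

# Within every severity stratum, the drug group does better...
assert rate(*mild["drug"])   > rate(*mild["control"])
assert rate(*severe["drug"]) > rate(*severe["control"])

# ...yet the pooled (unadjusted) rates point the other way, because
# severity confounds the comparison: the drug group is mostly severe.
drug_pooled    = rate(9 + 30, 10 + 90)   # 39/100 = 39%
control_pooled = rate(70 + 5, 80 + 20)   # 75/100 = 75%
print(drug_pooled < control_pooled)      # True: drug 'looks' harmful
```

Which of the two answers is right depends on the causal question being asked, which is exactly why single-variable, single-outcome analyses over restricted datasets can yield contradictory stories.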
The benefits of the data explosion far outweigh the risks for the careful researcher. As target populations subdivide along combinations of comorbid conditions and countless genetic polymorphisms, as diagnostic and monitoring devices, including wearable sensors, become more ubiquitous, and as therapeutic options expand beyond the evaluation of individual interventions such as drugs and procedures, it is clear that the traditional approach to knowledge discovery cannot scale to match the exponential growth of medical complexity.
Rather than taking turns hyping and disparaging Big Data, we need organizations and researchers to create methods and processes that address some of our most pressing concerns, e.g., who is in ‘charge’ of shared data, who ‘owns’ clinical data, and how do we best combine heterogeneous and superficially non-interoperable data sources? We need to use Big Data in a different way than we have traditionally used data – collaboratively. By creating a culture of transparency and reproducibility, we can turn the hype over Big Data into big findings.
The article processing fee was funded by the National Institutes of Health.
- 1. MIT editors. Business Report: Data-driven Health Care. MIT Technol Rev. 2014;117:1–19.
- 8. APACHE Outcomes. Available at: https://www.cerner.com/Solutions/Hospitals_and_Health_Systems/Critical_Care/APACHE_Outcomes/. Accessed Nov 2014.
- 11. Smith M, Saunders R, Stuckhardt L, McGinnis JM, Committee on the Learning Health Care System in America, Institute of Medicine. Best Care at Lower Cost: The Path to Continuously Learning Health Care in America. Washington: National Academies Press; 2013.
- 17. Velasquez A, Ghassemi M, Szolovits P, et al. Long-term outcomes of minor troponin elevations in the intensive care unit. Anaesth Int Care. 2014;42:356–64.
- 22. The Runaway Cost of Diabetes. Available from: http://lab.express-scripts.com/insights/drug-options/the-runaway-cost-of-diabetes. Accessed Sept 2014.
- 26. 23andMe Scientists Receive Approximately $1.4 Million in Funding from the National Institutes of Health. http://mediacenter.23andme.com/press-releases/nih_grant_2014/. Accessed Sept 2014.
- 27. GSK announces further initiatives to advance openness and collaboration to help tackle global health challenges. Available from: http://us.gsk.com/en-us/media/pressreleases/2012/gsk-announces-further-initiatives-to-advance-openness-and-collaboration-tohelp-tackle-global-health-challenges. Accessed Sept 2014.
- 28. Clinical Study Data Request Site. Available from: https://clinicalstudydatarequest.com/. Accessed Nov 2014.
- 35. Kayyali B, Knott D, Van Kuiken S. The big-data revolution in US health care: Accelerating value and innovation. McKinsey & Company; 2013. http://www.mckinsey.com/insights/health_systems_and_services/the_big-data_revolution_in_us_health_care. Accessed Nov 2014.
- 39. mHealth: Health and appiness. The Economist. Feb 1, 2014. http://www.economist.com/news/business/21595461-those-pouring-money-health-related-mobile-gadgets-and-apps-believe-they-can-work. Accessed Nov 2014.
- 41. Bishop CM. Pattern Recognition and Machine Learning. New York: Springer; 2006. p. 740.
- 44. Harford T. Big data: are we making a big mistake? Financial Times Magazine. 2014. http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3TDz4MSnF. Accessed Nov 2014.
- 47. Mayaud L. Prediction of mortality in septic patients with hypotension. PhD thesis, Oxford University; 2014.
This article is co-published by agreement with Springer-Verlag. Permission for reuse must be sought from the publisher.