The Role of an Artificial Intelligence Ecosystem in Radiology
Moving artificial intelligence tools for diagnostic imaging into routine clinical practice will require cooperation and collaboration among developers, physicians, regulators, and health system administrators. Radiologists can play an important role in promoting this AI ecosystem by delineating AI use cases for diagnostic imaging and developing standardized data elements and workflow integration interfaces. Structured AI use cases that define specific parameters for algorithm training and testing datasets can enable multiple sites to develop training and validation datasets. These datasets can help ensure that algorithms reflect the technical, geographic, and demographic diversity of patient populations and image acquisition, are free of unintended bias, and are generalizable to widespread clinical practice. Medical specialty societies can help protect patients from unintended consequences of AI by developing use cases, establishing programs for independent algorithm validation, and monitoring the effectiveness and safety of AI tools in clinical practice through AI registries. The development and implementation of AI algorithms for medical imaging will benefit from the establishment of an AI ecosystem that includes physicians, researchers, and software developers, along with governmental regulatory agencies, the HIT industry, and hospital administrators, all working to bring AI tools safely and efficiently into clinical practice.
Keywords: Artificial intelligence ecosystem; Artificial intelligence use case; Artificial intelligence government regulation; Artificial intelligence data registries; Artificial intelligence common data elements
19.1 Defining Business Ecosystems
New technologies should address a significant clinical need.
Technology must perform at least as well as the existing standard approach, i.e., demonstration of statistical non-inferiority.
Substantial clinical testing must validate the new technology under the wide range of clinical situations and varying patient populations.
New technology should improve patient outcomes, patient quality of life, and practicality in use, and should reduce medical system costs.
Achieving these goals will require a coordinated approach among multiple stakeholders to move safe and effective AI tools into clinical practice; defining and developing a cohesive artificial intelligence ecosystem will facilitate that implementation.
An economic community supported by a foundation of interacting organizations and individuals—the organisms of the business world. The economic community produces goods and services of value to customers, who are themselves members of the ecosystem. The member organisms also include suppliers, lead producers, competitors, and other stakeholders. 
In both of these definitions, defining a community of often disparate stakeholders and understanding the role each plays are critical to the success of the community as a whole. Nowhere in business is the term ecosystem more applicable than in the technology and software development industries. In their book, Software Ecosystem, Messerschmitt and Szyperski define the community for software development around six groups: users, software developers, managers, industrialists, policy experts and lawyers, and economists. At the beginning, end users of the software products must define what they want the software to accomplish for them. Software developers and engineers then translate the users’ needs into program code, and a group of managers must coordinate resources to bring the software product into the end users’ workflow. Companies must be formed to mass distribute the software product, and policy experts and legal teams must ensure there are no legal or other barriers to software implementation. Economists then offer insights into how the software market works. In modern terms, software developers also find themselves within a sub-ecosystem: the software they write is built on top of platforms such as high-level programming languages and operating systems, or beneath platforms such as web pages, where their software’s outputs are designed to be inputs consumed by other software products. In almost all cases, the final software product employed by end users is not the code written by the developer but the output of a compiler, which converts the developer’s instructions into lower-level machine-readable code that becomes the program executed by the computer. All of these additional interactions continually expand the community within the software development ecosystem.
In addition, the healthcare industry is highly regulated by national and international governing bodies worldwide. Much of this regulation, designed to promote quality and ensure patient safety and privacy, is not encountered in other fields of software development. Finally, most healthcare worldwide is paid for not directly by patients but by governmental or other third-party payers such as commercial insurance companies. In the United States, payments to providers are typically made on a fee-for-service basis; however, a growing percentage of the population is covered under alternate payment models such as accountable care organizations and other forms of population-based health management. Internationally, many countries have public health systems paid for from tax revenue and furnished at no cost to permanent residents. In these systems, physicians and other providers are paid salaries by the government. Many countries, including the United States, have developed a hybrid system of publicly and privately funded healthcare. However, in the United States, federally funded healthcare programs such as Medicare and Medicaid cover only about 36% of the population, whereas employer-based private insurance plans cover approximately 47%. Although coverage varies internationally, governmental programs cover the vast majority of the population in developed nations.
19.2 Artificial Intelligence Ecosystem for Healthcare and Diagnostic Imaging
However, AI applications for medical imaging will not be limited to image interpretation. AI algorithms will be able to improve patient safety by prioritizing patient imaging worklists and enhancing the communication of critical findings. Imaging protocols can be automated based on information gathered from the electronic health record (EHR) and tailored to optimize radiation exposure. AI could directly improve the reading radiologist’s experience by mining the EHR for patient data, including problem lists, clinical notes, laboratory data, pathology reports, vital signs, prior treatments, and prior imaging reports, and generating a relevant summary that gives the reading radiologist the most pertinent contextual information during the interpretation of a study. Another example of a seemingly simple application is the optimization of hanging protocols. Hanging protocols are often disrupted by the order in which sequences and planes are acquired, as well as by the order of manual entry into PACS by the radiology technologist. AI could be developed to classify image sequences, planes, and contrast phases and then place them in the preferred order of the individual radiologist. Artificial intelligence tools will also be able to improve radiology department efficiency by optimizing workflow and automating structured reporting systems, and to improve patient experience by decreasing patient wait times and avoiding missed appointments [13, 14]. For all of this to happen, an artificial intelligence ecosystem specific to diagnostic imaging must be developed and supported by all stakeholders, including the medical imaging community, the software development community, and governmental agencies, whose regulatory processes must strike an appropriate balance among fostering innovation, moving new products to market, and ensuring patient safety.
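The hanging-protocol idea can be sketched in a few lines. This is a minimal illustration, assuming an upstream AI classifier (not shown, hypothetical) has already labeled each series with a sequence type and plane; the `Series` class, the `hang` function, and the preference list are all hypothetical names, not part of any PACS API. Once series carry reliable labels, reordering them to a radiologist’s preference is a simple sort.

```python
from dataclasses import dataclass


@dataclass
class Series:
    uid: str
    sequence: str  # e.g. "T1", "T2", "FLAIR": assumed output of an upstream classifier
    plane: str     # e.g. "AX", "COR", "SAG"


def hang(series_list, preference):
    """Order series by a radiologist's preferred (sequence, plane) list.

    Series not named in the preference list go last, in their original
    order, because Python's sort is stable.
    """
    rank = {key: i for i, key in enumerate(preference)}
    return sorted(series_list, key=lambda s: rank.get((s.sequence, s.plane), len(preference)))


# Series in the order they arrived from the scanner (hypothetical MRI brain study)
acquired = [
    Series("1.3", "FLAIR", "AX"),
    Series("1.1", "T2", "SAG"),
    Series("1.2", "T1", "AX"),
    Series("1.4", "DWI", "AX"),
]
# One radiologist's hypothetical preferred display order
preference = [("T1", "AX"), ("T2", "SAG"), ("FLAIR", "AX")]

print([s.uid for s in hang(acquired, preference)])  # ['1.2', '1.1', '1.3', '1.4']
```

The hard part in practice is the classification step; the ordering itself stays the same whatever classifier supplies the labels.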
19.3 Defining an Artificial Intelligence Ecosystem in Healthcare with a Focus on Diagnostic Imaging
19.3.1 Establish Realistic Goals
When research in artificial intelligence began in the 1940s and 1950s, the goal was to create all-knowing computers that could ingest the entirety of world knowledge and fully duplicate the cognitive and reasoning activities of humans. This excitement and anticipation led to the work of Alan Turing in the 1950s, resulting in the famous Turing test, an assessment of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Fueled by the work of Marvin Minsky and John McCarthy, expectations that computers would be able to mimic the tasks of the human brain were high; however, as it became clear that computer processing power was woefully inadequate to support this research, the number of investigators and the interest in AI research began to wane. A second spike in AI entrepreneurialism occurred in the 1980s, when early companies such as Thinking Machines Corporation, founded by Minsky, were quite profitable through the early 1990s. However, despite having the highest level of processing power in the industry, these companies also failed to develop significant AI products, and enthusiasm over AI again waned. The renewed enthusiasm for AI across a large and diverse number of industries has once again caused expectations to soar, and as this third AI Spring continues to gain momentum, the AI community must learn from its predecessors in order to avoid another AI Winter. Over the last 10 years, success with these decades-old technologies, such as multilayered neural networks, has been fueled by advances in fast, parallel graphics processing unit (GPU) hardware, allowing the training of more and progressively deeper neural networks. The combination of these technical advances and the availability of large annotated datasets for training and testing has given rise to deep learning with many-layered networks such as convolutional neural networks.
The use of these and other modern techniques has once again escalated claims about the imminent rise of all-knowing computers that duplicate human cognitive activity. A number of additional factors have also increased the enthusiasm for AI in healthcare. Previously, applications of artificial intelligence touched the daily lives of the general population in few ways. However, developments such as self-driving cars, robotic manufacturing, and wearable personal health monitors are paving the way for broader acceptance and application of AI in healthcare, with the ability not only to better the lives of the population as a whole but also to impact the lives of individuals. Finally, informatics solutions that can bend the cost curve in healthcare will be readily accepted as the cost of healthcare continues to rise (Fig. 19.4).
If you work as a radiologist you are like Wile E. Coyote in the cartoon. You’re already over the edge of the cliff, but you haven’t yet looked down. It’s just completely obvious that in five years deep learning is going to do better than radiologists. It might be ten years. We should stop training radiologists now. 
These are just a few examples of where some in the informatics community continue to overpromise. At this point, there is nothing to suggest that artificial intelligence will replace physicians, cure cancer, or even prolong life expectancy. To ensure that AI algorithms that help physicians provide better patient care are adopted into clinical practice, developers should focus on specific, narrow use cases with readily defined and achievable outcomes. Algorithm training and validation must use methods that ensure the algorithm’s results will generalize to widespread clinical practice, and physicians and other end users must be able to understand how the algorithm reached its conclusions so that the reliability of its inferences can be evaluated and communicated on a patient-by-patient basis.
19.3.2 Maintain a Targeted Focus
Increasing computing power and modern AI techniques such as deep neural networks have increased the ability to rapidly develop specific uses for AI that can be implemented into physician workflows. For AI to be successful in healthcare and medical imaging, development should continue to focus on producing high-quality, clinically useful AI use cases in which algorithms can be trained on high-quality structured data to assist radiologists in solving specific problems.
Although a detailed discussion of the application of specific artificial intelligence techniques for medical imaging is beyond the scope of this chapter, it is important to understand some of the ways AI inference models for medical imaging will be created, because this informs how an AI ecosystem for medical imaging can support the development of robust AI tools for the medical imaging community. Built on a foundation of artificial neural networks, deep learning is emerging as the predominant tool for artificial intelligence applications in healthcare. Machine learning has traditionally been divided into three categories: supervised learning, reinforcement learning, and unsupervised learning. In supervised learning, the algorithm is trained to produce a known output from data that have been labeled with a certain outcome or diagnosis. A widespread, familiar application of supervised learning in healthcare is the automated interpretation of an electrocardiogram to determine the presence or absence of a myocardial infarction. Examples of supervised learning in diagnostic imaging from recent machine learning competitions include lung nodule detection and pediatric bone age determination, but the potential number of applications for these narrow AI models for segmentation, detection, quantification, classification, workflow improvement, and risk assessment is almost endless. Unlike supervised learning, reinforcement learning models are not presented with a set of predetermined input/output pairs. In reinforcement learning, the model determines the most effective pathway toward a goal by being rewarded for choosing different sets of actions: the system receives a reward when it achieves a certain outcome and learns the path that yields the highest reward.
In unsupervised learning, machine learning models are given data that have not been labeled with a specific outcome, and there are no specific outputs to predict. Instead, the model separates the input data into naturally occurring groups or patterns. While both unsupervised learning and general AI will inevitably be applied to medical imaging using untagged or only loosely tagged training data, unsupervised learning is currently best used for clustering, feature extraction, and dimensionality (variable) reduction in the analysis of large datasets. One specific application of unsupervised learning in healthcare will be in advancing precision medicine initiatives focused on the various omics-based strategies, including radiomics, genomics, proteomics, metabolomics, and epigenomics. These data patterns may be able to subdivide patients into prognostic categories and may even predict whether an individual patient would respond to various therapies. In particular, these omics-based strategies show promise in oncology and autoimmune conditions for predicting whether an individual patient would benefit from the various emerging targeted agents.
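To make the unsupervised case concrete, the sketch below uses k-means, one common clustering method (chosen here for illustration, not a method prescribed by this chapter), to group hypothetical two-feature radiomic vectors with no labels at all. The feature values and the deterministic initialization are assumptions made to keep the example small and reproducible.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means clustering: groups feature vectors without any labels."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(d) / len(members) for d in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return labels, centroids


# Hypothetical radiomic feature vectors (e.g. normalized texture and shape scores)
features = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
labels, centers = kmeans(features, k=2)
print(labels)
```

The algorithm is never told which group a patient belongs to. Whether the resulting clusters are clinically meaningful, for example as prognostic categories, still has to be established against outcomes.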
19.3.3 Use High-Quality Data for Training and Testing
In creating datasets for training AI algorithms, a robust source of accurate information, often referred to as “ground truth,” is required for the training data. In supervised learning, AI algorithms are trained on known cases. Ground truth can come from a variety of sources, but it typically consists of datasets carefully annotated by expert radiologists, and its source should be explicitly stated for each AI model. Other possibilities for establishing ground truth include pathology results or specific clinical outcomes. While high-quality training data are critical for algorithms to be effective, the datasets used for training must also be diverse. Tremendous variability in diagnostic imaging methods exists from institution to institution, including equipment manufacturer, field strength in magnetic resonance imaging, imaging protocols, and radiation dose in computed tomography, and it cannot be assumed that AI models trained on data from a single institution will be effective more broadly. Within the software development ecosystem, this is broadly characterized as the problem of generalizability. Therefore, in bringing applications to market, the technical diversity of the training datasets must be considered. Patients are diverse as well, and patient populations are likely to differ considerably from institution to institution. In addition to general geographic diversity, patient populations may vary by race, gender, socioeconomic background, body habitus, and prevalence of disease processes. Recent reports indicate that facial recognition algorithms demonstrate considerable variability in accuracy based on skin color, highlighting potential sources of bias in algorithm development.
Both developers and consumers of AI applications in healthcare, and in diagnostic imaging in particular, must be cognizant of the broad diversity in patient populations so that training data reflect similar diversity and algorithms are free of unintended bias.
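One simple way to surface a generalizability problem is to stratify performance by metadata instead of reporting a single pooled number. The sketch below is illustrative only. The record layout (`site`, `vendor`, `truth`, `pred`) and the data are hypothetical, and real validation would use larger samples with confidence intervals.

```python
from collections import defaultdict


def accuracy_by(records, field):
    """Accuracy stratified by one metadata field (site, vendor, age band, ...)."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        group = r[field]
        total[group] += 1
        correct[group] += r["pred"] == r["truth"]  # True counts as 1
    return {g: correct[g] / total[g] for g in total}


# Hypothetical results: site A resembles the training site, site B does not
records = [
    {"site": "A", "vendor": "X", "truth": 1, "pred": 1},
    {"site": "A", "vendor": "X", "truth": 0, "pred": 0},
    {"site": "A", "vendor": "X", "truth": 1, "pred": 1},
    {"site": "A", "vendor": "X", "truth": 0, "pred": 0},
    {"site": "B", "vendor": "Y", "truth": 1, "pred": 0},
    {"site": "B", "vendor": "Y", "truth": 0, "pred": 0},
    {"site": "B", "vendor": "Y", "truth": 1, "pred": 1},
    {"site": "B", "vendor": "Y", "truth": 1, "pred": 0},
]
overall = sum(r["pred"] == r["truth"] for r in records) / len(records)
print(overall)                       # 0.75
print(accuracy_by(records, "site"))  # {'A': 1.0, 'B': 0.5}
```

The pooled accuracy of 0.75 hides the fact that performance at site B is no better than chance. The same function can be applied to any demographic or technical field carried in the records.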
While there is a critical need to provide developers with high-quality, technically and geographically diverse data for training and testing, patient privacy must be carefully maintained. In the United States, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) required the Secretary of the US Department of Health and Human Services (HHS) to develop regulations protecting the privacy and security of certain health information. The HIPAA Privacy Rule defines standards and safeguards that protect patients’ medical records and other personal health information and that apply to healthcare providers, insurers, and other covered healthcare entities. The rule sets limits and conditions on the uses and disclosures of such information that may be made without patient authorization. The HIPAA Security Rule establishes national standards to protect individuals’ electronic protected health information that is created, received, used, or maintained by a covered entity. The Security Rule requires “appropriate administrative, physical and technical safeguards” to protect the confidentiality, integrity, and security of electronic patient information. While the details of data privacy and other ethical considerations are beyond the scope of this chapter, the ethical issues around data ownership, the robustness of deidentification algorithms, and transparency in how patient information is shared with AI researchers and developers will clearly play a crucial role in the development of a robust AI ecosystem [36, 37].
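As a rough illustration of one deidentification step, the sketch below drops a few direct identifiers from an image-metadata record and replaces the patient ID with a keyed hash (HMAC-SHA256). With a keyed hash, a patient’s studies stay linked to one another without exposing the medical record number. The record is a plain dictionary whose keys borrow DICOM attribute keywords. The identifier list is a deliberately tiny, hypothetical subset, and this is not a HIPAA-compliant deidentification procedure.

```python
import hashlib
import hmac

# Illustrative subset only; a real profile must cover every applicable identifier,
# including free-text fields and text burned into the pixel data.
DIRECT_IDENTIFIERS = {"PatientName", "PatientBirthDate", "PatientAddress", "InstitutionName"}


def deidentify(record, key):
    """Drop direct identifiers and replace PatientID with a keyed pseudonym."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "PatientID" in out:
        # The same patient always maps to the same pseudonym, so longitudinal
        # studies stay linked, but anyone lacking the site-held key cannot
        # recompute the pseudonym from a known medical record number.
        digest = hmac.new(key, out["PatientID"].encode("utf-8"), hashlib.sha256).hexdigest()
        out["PatientID"] = digest[:16]
    return out


study = {
    "PatientName": "DOE^JANE",
    "PatientID": "MRN12345",
    "PatientBirthDate": "19600101",
    "Modality": "CT",
    "SliceThickness": "1.25",
}
print(deidentify(study, key=b"site-secret"))
```

Technical parameters such as modality and slice thickness are kept because, as discussed above, developers need them to assess technical diversity.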
19.3.4 Develop Consistent Methods for Validation and Monitoring Algorithm Performance
While algorithms can be developed and used in single institutions without regulatory approval, bringing new AI tools into widespread clinical use will require developers to establish methods that ensure their products are generalizable to, and reproducible in, the wide variety of practice settings and patient populations that exist in the healthcare system. Inevitably, some degree of governmental regulation for each algorithm will be necessary for AI to become broadly adopted. Algorithm validation standards must be developed that ensure algorithms produce consistent results across the broad range of technical, geographic, and patient population diversity seen in clinical practice. Developers must be able to show that their algorithms achieve the expected results on novel and diverse datasets, and there should be standardized statistical methods for comparing algorithms that purport to perform a similar function. Considering the thousands of algorithms that will likely be developed, governmental regulatory agencies could become overwhelmed, further slowing the deployment of AI algorithms into clinical practice. An important role of the AI ecosystem for radiology will be to develop validation methods that efficiently move AI products to market while ensuring patient safety. Public-private partnerships between regulatory agencies and honest-broker private groups, such as medical specialty societies, could play an important role in the validation of AI algorithms.
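As one example of a standardized statistical comparison, the sketch below applies McNemar’s test, a common paired test for two classifiers evaluated on the same cases. The chapter does not prescribe this test; it is shown as one reasonable choice. Only discordant cases enter the statistic. The data are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom and a continuity correction.

```python
import math


def mcnemar(truth, pred_a, pred_b):
    """McNemar's test for two algorithms evaluated on the same cases.

    b = cases A got right and B got wrong; c = cases A got wrong and B got right.
    Returns (b, c, continuity-corrected chi-square statistic, two-sided p-value).
    """
    rows = list(zip(truth, pred_a, pred_b))
    b = sum(1 for t, ya, yb in rows if ya == t and yb != t)
    c = sum(1 for t, ya, yb in rows if ya != t and yb == t)
    if b + c == 0:
        return b, c, 0.0, 1.0  # no discordant cases: no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return b, c, stat, p


# Hypothetical test set of 20 positive cases read by algorithms A and B
truth = [1] * 20
pred_a = [1] * 15 + [0] * 5
pred_b = [1] * 5 + [0] * 10 + [1] * 2 + [0] * 3
b, c, stat, p = mcnemar(truth, pred_a, pred_b)
print(b, c, round(stat, 3), round(p, 3))
```

Because both algorithms read identical cases, a paired test like this is more appropriate than comparing two independently reported accuracy figures.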
However, regulatory approval of AI algorithms need not be based entirely on a premarket approval process. If methods can be developed to monitor algorithm performance after deployment in community practice, the resulting data can be used not only to ensure that algorithms function as expected but also to provide feedback to developers so that the algorithms can be improved. In radiology, these data should include not only information about the utility of the algorithm based on radiologist input but also metadata about patient characteristics and the technical parameters of the examination, so that poor algorithm performance can be correlated with specifics of the examination.
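The monitoring idea can be sketched as a rolling window of cases. Each case records whether the radiologist agreed with the algorithm’s output, along with exam metadata, so that a drop in agreement can be traced to technical factors. The `AgreementMonitor` class, its window size, and its threshold are hypothetical choices for illustration, not features of any existing registry.

```python
from collections import deque


class AgreementMonitor:
    """Rolling post-deployment check of radiologist agreement with an algorithm."""

    def __init__(self, window=200, threshold=0.85):
        self.cases = deque(maxlen=window)  # oldest cases fall off automatically
        self.threshold = threshold

    def record(self, agreed, metadata):
        self.cases.append((bool(agreed), metadata))

    def agreement_rate(self):
        if not self.cases:
            return None
        return sum(agreed for agreed, _ in self.cases) / len(self.cases)

    def alert(self):
        """True once the window is full and agreement has fallen below threshold."""
        full = len(self.cases) == self.cases.maxlen
        return full and self.agreement_rate() < self.threshold

    def disagreements_by(self, field):
        """Count disagreements per metadata value (e.g. scanner vendor)."""
        counts = {}
        for agreed, meta in self.cases:
            if not agreed:
                key = meta.get(field, "unknown")
                counts[key] = counts.get(key, 0) + 1
        return counts


monitor = AgreementMonitor(window=4, threshold=0.75)
for agreed, vendor in [(True, "X"), (True, "X"), (False, "Y"), (False, "Y")]:
    monitor.record(agreed, {"vendor": vendor, "kvp": 120})
print(monitor.agreement_rate(), monitor.alert(), monitor.disagreements_by("vendor"))
```

Waiting for a full window before alerting avoids false alarms from the first few cases. Grouping disagreements by metadata points developers toward the examinations where the algorithm struggles.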
19.3.5 Build Public-Private Partnerships for Safety and Efficacy
In the United States, the Food and Drug Administration (FDA) is charged with protecting the public health by “ensuring the safety, efficacy and security” of a wide range of healthcare products, including medical devices. As software has come to play an increasingly important role in medical device technology, the US FDA’s Center for Devices and Radiological Health has assumed a primary role in developing pathways for premarket review of medical device AI algorithms. The US FDA participates in the International Medical Device Regulators Forum (IMDRF), a group of medical device regulators from around the world working toward harmonization of medical device regulation. It has chaired IMDRF’s Software as a Medical Device Working Group, which is developing guidance to support innovation and timely global access to safe and effective “Software as a Medical Device” (SaMD). SaMD, defined as “software intended to be used for one or more medical purposes that perform these purposes without being part of a hardware medical device,” raises unique issues that warrant its own regulatory approval processes [41, 42]. Through this work with the IMDRF, the FDA aims to ensure that US guidance on SaMD encompasses global issues around the regulation of software for medical purposes.
In the meantime, the US FDA has worked on several initiatives that will likely affect the regulation of AI products in the United States. In August 2017, the US FDA proposed the Medical Device Development Tools (MDDT) program, a pathway through which the US FDA can qualify tools that medical device sponsors can use in the development and evaluation of medical devices. Qualification means that the FDA has evaluated the tool and determined that it “produces scientifically-plausible measurements and works as intended within the specified context of use”. The FDA anticipates that these tools, which can be developed by sponsors or private entities, will be useful in the approval process for AI algorithms and other SaMD.
Another US FDA program, the National Evaluation System for Health Technology (NEST), is intended to shorten the time to market for new healthcare technology products by developing a system for more robust post-market surveillance. The US FDA NEST strives to generate better evidence for medical device evaluation more efficiently and to enhance regulatory decision-making across the total product lifecycle of medical devices by “strategically and systematically leveraging real-world evidence and applying advanced analytics to data tailored to the unique data needs and innovation cycles of medical devices”. Stated goals include moving medical devices to market more quickly, improving the ability to detect safety issues through more active surveillance, and the ability to “efficiently harness data from the diverse set of real-world evidence—digital information collected from clinical experience in registries and similar tools—creating the necessary infrastructure for a national evaluation system for medical devices” [46, 47]. Describing NEST, the US FDA states that “by leveraging real world data collected as part of routine clinical care, our nation and the public will more fully realize the potential of the digital revolution for the device space” [46, 47].
The US FDA NEST program has established a number of demonstration projects to provide proof of concept for scalable approaches that use real-world evidence to generate safety and efficacy data across the entire medical device product life cycle. These projects aim to develop, verify, and operationalize methods of evidence generation and data use in the pre- and post-market space and to demonstrate scalability across healthcare systems, device types, and manufacturers. The NEST Coordinating Center (NESTcc) has chosen Lung-RADS Assist: Advanced Radiology Guidance, Reporting and Monitoring as one of its early demonstration projects for artificial intelligence algorithms. This project, sponsored by the American College of Radiology Data Science Institute (ACR DSI), is a method for validating and monitoring artificial intelligence algorithms built to detect and classify lung nodules in lung cancer screening programs according to the ACR Lung-RADS classification system. The demonstration will use real-world data to assess the end-to-end workflow, from deployment of an AI algorithm in a radiology reporting system through capture of performance metrics in a national registry. This example of a public-private partnership may serve as a model for how AI algorithms can be monitored in clinical practice to ensure ongoing patient safety while establishing a pathway to increase the efficiency of the US FDA premarket review process.
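The specific performance metrics captured in the Lung-RADS Assist demonstration are not detailed here. As a hypothetical illustration of one metric a registry could compute, the sketch below calculates Cohen’s kappa, a chance-corrected agreement statistic, between the radiologist’s final Lung-RADS category and the algorithm’s suggested category. The category lists are invented example data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's category frequencies
    expected = sum(count_a[cat] * count_b[cat] for cat in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical Lung-RADS categories for six screening exams
radiologist = ["2", "2", "3", "4A", "2", "4B"]
algorithm = ["2", "2", "3", "3", "2", "4B"]
print(round(cohens_kappa(radiologist, algorithm), 3))  # 0.75
```

Raw agreement here is 5 of 6 exams. Kappa discounts the agreement expected from how often each reader uses category 2, which matters in screening populations where benign categories dominate.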
Finally, the US FDA has also been working to develop and pilot the “Software Precertification Program” which focuses on the assessment of organizations that perform high-quality software design, testing, and monitoring based on demonstration of a “culture of quality and organizational excellence and a commitment to monitor ongoing performance” [49, 50]. The Software Precertification Program is envisioned to evaluate a developer’s capability to respond to real-world performance and provide qualified developers with a more efficient premarket regulatory pathway for certain SaMD applications. SaMD developers would need to establish mechanisms for AI algorithm validation and post-market surveillance, and the program is expected to be synergistic with the US FDA MDDT and US FDA NEST programs.
While these US FDA programs are planned for the future, a number of solutions leveraging artificial intelligence algorithms have already obtained premarket US FDA authorization using current US FDA processes. The US FDA classifies and regulates medical devices based on their degree of risk to the public: the least risky Class I devices are subject to the lowest level of regulatory control, and Class III devices to the highest. Class I devices include simple medical supplies such as gloves. Class II devices include CT scanners and other radiological equipment, and Class III devices include intravascular balloon catheters and stents. Most Class I devices and certain Class II devices are exempt from premarket notification, but the vast majority of Class II devices require premarket notification, also called a 510(k). The 510(k) clearance process is the path to market for the majority of medical devices and requires that the device be substantially equivalent to a legally marketed device, termed a “predicate” by the US FDA. Class III devices require a more rigorous premarket approval (PMA) process than 510(k) clearance. This approval process typically requires that the sponsor submit clinical data showing reasonable safety and efficacy of the medical device. Medical devices with no legally marketed, substantially equivalent predicate would be automatically classified as Class III regardless of risk. However, the US FDA has recently revamped the de novo request process. This process allows a developer of a low-to-moderate risk device without a predicate to request a risk-based classification of the device into Class I or II without first submitting a 510(k) and receiving a not substantially equivalent (NSE) determination. Once a device is classified under the de novo process, it may then serve as a predicate for 510(k) clearance of similar devices in the future.
A number of US FDA approvals for artificial intelligence software have been granted based on this de novo process [53, 54, 55].
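The routing logic described above can be summarized as a small decision function. This is a deliberately simplified sketch for orientation only: the function name and parameters are hypothetical, real classification depends on device-specific regulations and FDA determinations, and it is not regulatory guidance.

```python
def premarket_pathway(device_class, exempt=False, has_predicate=True):
    """Greatly simplified map of the US FDA premarket routes described above.

    device_class: "I", "II", or "III", the risk-based class being sought
    exempt: True if the device type is exempt from premarket notification
    has_predicate: True if a legally marketed, substantially equivalent device exists
    """
    if device_class not in {"I", "II", "III"}:
        raise ValueError("device_class must be 'I', 'II', or 'III'")
    if device_class == "III":
        return "PMA"      # premarket approval, typically with clinical data
    if exempt:
        return "exempt"   # no 510(k) submission required
    if has_predicate:
        return "510(k)"   # demonstrate substantial equivalence to the predicate
    # No predicate: a low-to-moderate risk device would otherwise default to
    # Class III; a de novo request asks FDA to classify it into Class I or II.
    return "De Novo"


print(premarket_pathway("II"))                       # 510(k)
print(premarket_pathway("II", has_predicate=False))  # De Novo
```

The last branch is why de novo grants matter for AI software. Once a novel algorithm is classified this way, later algorithms with a similar function can cite it as a predicate and follow the 510(k) branch.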
The US FDA also classifies computer software intended for lesion detection and diagnosis. Computer-aided detection (CADe) is defined as “computerized systems that incorporate pattern recognition and data analysis capabilities intended to identify, mark, highlight or in any other manner direct attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the intended user” [56, 57]. Computer-aided diagnosis (CADx) is defined by the FDA as “computerized systems intended to provide information beyond identifying, marking, highlighting or in any other manner directing attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the clinician.” Both CADe and CADx use highly complex algorithms. A primary distinction between them is that CADe is intended merely as an adjunct detection tool for the radiologist, who, per device labeling, is expected to fully review the images and not rely on the software. Although CADe was initially regulated as Class III, the FDA has more recently cleared CADe under its 510(k) process. Because of the relatively higher risk associated with CADx, the FDA has been slower to move CADx toward the 510(k) process.
However, on July 19, 2017, the US FDA granted de novo approval and Class II status to QuantX, a computer-aided diagnosis (CADx) software for breast cancer detection. This appears to represent a relaxation of the US FDA’s premarket approval requirements for CADx software and may become the basis for clearance of some artificial intelligence applications.
Although these US regulatory programs seem somewhat disjointed, in all of its activities the US FDA seems to be working to streamline the review process for artificial intelligence applications in healthcare, and it is demonstrating a high level of cooperation with international regulatory bodies. However, even with the streamlined premarket processes described above, developers will still need to demonstrate efficacy, patient safety, and a process for post-market surveillance of ongoing effectiveness using real-world data. Regulatory agencies are ill-equipped to perform these tasks internally. Additionally, the sheer number of algorithms likely to be submitted for regulatory approval could place considerable burdens on the regulatory review process in the United States and internationally. Public-private partnerships between regulatory agencies and trusted organizations such as medical specialty societies can facilitate both premarket review and the collection of real-world evidence that supports the ongoing efficacy and safety of AI algorithms in clinical practice.
19.3.6 Establish Standards for Interoperability and Pathways for Integration into Clinical Workflows
Imaging 3.0 informatics tools promote appropriate use of imaging services; structured reporting, so that critical data can be easily extracted from imaging reports; clinical decision support for radiologist interpretation; image-sharing solutions that provide access to patients’ electronic images within the enterprise and across sites; and communication enhancements that use registry reporting to benchmark patient radiation exposure and patient outcomes and to support quality assessment. Artificial intelligence algorithms are poised to become radiology professionals’ next important Imaging 3.0 informatics tool and will continue to increase radiologists’ value to patients and their health systems.
Just as with the informatics tools for Imaging 3.0, for radiologists to use artificial intelligence algorithms effectively in routine clinical practice, developers must pay careful attention to how algorithms will capture data for analysis and how algorithm output will integrate back into the clinical workflow. Seamless interoperability with the health system’s numerous electronic resources will be necessary for optimal clinical integration. Algorithm inputs may come from the imaging modalities, the picture archiving and communication system (PACS), the electronic health record (EHR), and an array of other data sources, including pathology information, radiology information systems, patient monitoring systems, and even wearable health monitoring devices. Standard interfaces must be developed so that similar algorithms import this information in the same way, and proprietary solutions must be avoided. In robust clinical use, radiology departments will inevitably run a large number of algorithms for a wide variety of clinical applications. Some may run on premises, and in those instances it would be much more efficient for algorithms with similar hardware requirements to run on the same on-premises server and acquire input data through the same interfaces. Cloud-based solutions, even if developer specific, will also benefit from standardized input interfaces, and the developer community should work in concert with physicians and the health information technology (HIT) industry to set interoperability standards for these interfaces. Standardized methods of communication between platforms let different vendors focus on different areas of tool development within an infrastructure that connects them all, ultimately giving physicians and other end users access to a wider array of solutions without concern for compatibility.
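The idea of a shared input interface can be sketched in Python. This is a minimal sketch, not an existing standard: the `ImagingAlgorithm` protocol, the `StudyInput` and `AlgorithmOutput` records, the `dispatch` function, and the two example algorithms are all hypothetical names chosen for illustration. It shows how an on-premises host could assemble a study’s inputs once and hand them to every installed algorithm through the same contract, regardless of vendor.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class StudyInput:
    """Hypothetical vendor-neutral input bundle assembled once by the host server."""
    study_uid: str
    modality: str
    series_refs: list[str]  # references to image series, not pixel data
    ehr_context: dict[str, str] = field(default_factory=dict)


@dataclass
class AlgorithmOutput:
    algorithm_id: str
    study_uid: str
    findings: dict[str, object]


class ImagingAlgorithm(Protocol):
    """Hypothetical shared contract that every hosted algorithm implements."""
    algorithm_id: str
    supported_modalities: set[str]

    def run(self, study: StudyInput) -> AlgorithmOutput: ...


class NoduleDetector:
    algorithm_id = "vendor-a/nodule-detector"
    supported_modalities = {"CT"}

    def run(self, study: StudyInput) -> AlgorithmOutput:
        # Placeholder inference: a real model would analyze the referenced series.
        return AlgorithmOutput(self.algorithm_id, study.study_uid, {"nodule_count": 0})


class BoneAgeEstimator:
    algorithm_id = "vendor-b/bone-age"
    supported_modalities = {"CR", "DX"}

    def run(self, study: StudyInput) -> AlgorithmOutput:
        return AlgorithmOutput(self.algorithm_id, study.study_uid, {"bone_age_months": None})


def dispatch(study: StudyInput, algorithms: list[ImagingAlgorithm]) -> list[AlgorithmOutput]:
    """Send one study to every registered algorithm that accepts its modality."""
    return [alg.run(study) for alg in algorithms if study.modality in alg.supported_modalities]
```

Because both vendors’ algorithms accept the same `StudyInput`, the host needs only one integration with PACS and the EHR rather than one per algorithm.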
For instance, Logical Observation Identifiers Names and Codes (LOINC), developed by the Regenstrief Institute in the mid-1990s, is a universal standard for identifying laboratory and clinical observations that is endorsed by the American Clinical Laboratory Association and the College of American Pathologists. LOINC also contains a database of standard terms and has been expanded to include nursing diagnoses, interventions, and outcome classifications.
Equally important will be standardization of output interfaces. Radiologists, referring physicians, and other providers use an array of electronic resources throughout the imaging cycle. Output from AI algorithms will eventually interface with existing clinical decision support tools for selecting the most appropriate radiological examination, as well as with existing decision support tools for radiologist interpretation. Standardized interfaces for delivering algorithm output to PACS worklists and to the modality will be necessary as well; for optimal workflow integration, artificial intelligence algorithms will have to interface seamlessly with all of these resources. Developing open-source code and standardized interfaces for data transfer will ultimately affect the entire health information technology ecosystem, and developers of AI applications must avoid proprietary interfaces. An example of an open-source interface for bringing evidence-based guidelines to the point of care is the American College of Radiology’s Computer Assisted Reporting Data Science (CARDS) platform. The CARDS authoring and reporting system includes a definition format for representing radiology clinical guidelines as structured, machine-readable Extensible Markup Language (XML) documents, along with a user-friendly reference implementation for testing the machine-readable representation against the clinical guideline. CARDS also uses open-source standards for delivering its output to voice recognition software (VRS) platforms.
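To make the idea of a machine-readable guideline concrete, the sketch below parses a small XML guideline definition and evaluates it for one input. The element and attribute names (`guideline`, `rule`, `max_size_mm`) and the size thresholds are hypothetical placeholders; they are not the CARDS schema and are not clinical guidance. Only Python’s standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified guideline definition. Element names and thresholds
# are illustrative placeholders, NOT the CARDS schema and NOT clinical guidance.
GUIDELINE_XML = """
<guideline id="example-incidental-lesion">
  <input name="size_mm" type="number"/>
  <rule max_size_mm="5" recommendation="No follow-up"/>
  <rule max_size_mm="10" recommendation="Follow-up imaging"/>
  <rule recommendation="Specialist referral"/>
</guideline>
"""


def load_rules(xml_text: str) -> list[tuple[Optional[float], str]]:
    """Parse rules in document order as (inclusive upper size bound or None, recommendation)."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.findall("rule"):
        bound = rule.get("max_size_mm")
        rules.append((float(bound) if bound is not None else None, rule.get("recommendation")))
    return rules


def recommend(size_mm: float, rules: list[tuple[Optional[float], str]]) -> str:
    """Return the first rule whose bound covers the input; a rule without a bound is a catch-all."""
    for bound, text in rules:
        if bound is None or size_mm <= bound:
            return text
    raise ValueError("guideline has no rule covering this input")
```

A reference implementation can then be exercised with boundary inputs; for example, `recommend(5, load_rules(GUIDELINE_XML))` returns the first rule’s recommendation because the bounds are treated as inclusive.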
There will be numerous other electronic resources in healthcare that developers must consider, and interoperable data transfer standards are critical. Communication with PACS and imaging modalities must use interfaces based on Digital Imaging and Communications in Medicine (DICOM), the standard for storing and transmitting medical images. The DICOM standard, developed through a collaboration between the American College of Radiology and the National Electrical Manufacturers Association (NEMA), facilitates the integration of medical imaging devices such as scanners, servers, workstations, printers, network hardware, and PACS from multiple manufacturers. However, this standard is generally limited to use within radiology, and as AI evolves, other mechanisms for data transfer must be considered to allow input of patient information from sources outside radiology. Additionally, as AI evolves in other medical specialties such as pathology, ophthalmology, and dermatology, image digitization and transfer standards will need to expand to other areas of the healthcare system so that outputs from AI algorithms can interface with these resources as well.
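DICOM attributes are identified by (group, element) tags. The sketch below assumes a DICOM header has already been parsed into a Python dictionary keyed by those tags (a real system would use a DICOM toolkit for that step) and copies only the attributes a hypothetical use case requires, so that identifiers such as Patient’s Name are not forwarded to the algorithm. The tag numbers are standard DICOM tags; the `REQUIRED` mapping and the function names are illustrative assumptions.

```python
# Standard DICOM attribute tags as (group, element) pairs.
MODALITY = (0x0008, 0x0060)
MANUFACTURER = (0x0008, 0x0070)
SLICE_THICKNESS = (0x0018, 0x0050)
STUDY_INSTANCE_UID = (0x0020, 0x000D)
PATIENT_NAME = (0x0010, 0x0010)

# Hypothetical list of attributes that one algorithm's use case says it needs.
REQUIRED = {
    "Modality": MODALITY,
    "Manufacturer": MANUFACTURER,
    "SliceThickness": SLICE_THICKNESS,
    "StudyInstanceUID": STUDY_INSTANCE_UID,
}


def tag_str(tag: tuple[int, int]) -> str:
    """Format a tag the way DICOM documentation writes it, e.g. (0008,0060)."""
    group, element = tag
    return f"({group:04X},{element:04X})"


def build_algorithm_input(header: dict) -> dict:
    """Copy only the required attributes; patient identifiers are never forwarded."""
    missing = [tag_str(tag) for tag in REQUIRED.values() if tag not in header]
    if missing:
        raise KeyError(f"missing required DICOM attributes: {missing}")
    return {keyword: header[tag] for keyword, tag in REQUIRED.items()}
```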
AI algorithms will also be expected to interface with electronic health records and other primarily text-based systems. Data transfer in these systems is predominantly via Health Level Seven (HL7) protocols, which are designed to facilitate “the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery and evaluation of health services”. More recently, Fast Healthcare Interoperability Resources (FHIR) has shown tremendous promise for joining disparate systems together. FHIR’s modular, resource-based components allow an application-based approach to interoperability and health information exchange. FHIR supports interoperability over a variety of architectures, including representational state transfer (REST), messaging, documents, and services, and it can be used across a variety of platforms, including cloud communications, EHR data sharing, radiology information systems (RIS), server communications, and mobile platforms. Artificial intelligence interfaces will need to account for these communication platforms to optimize input from, and output to, patients’ health records outside the radiology department.
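As an illustration of FHIR’s resource-based model, the sketch below wraps one quantitative AI measurement as a FHIR Observation resource. The field names (`resourceType`, `status`, `code`, `subject`, `derivedFrom`, `valueQuantity`) follow the FHIR Observation resource, but the coding system URL and code are hypothetical placeholders rather than a registered terminology, and the patient and ImagingStudy identifiers are invented. In a RESTful exchange, a JSON body like this would typically be POSTed to a FHIR server’s `Observation` endpoint.

```python
import json


def ai_measurement_to_fhir(patient_id: str, imaging_study_id: str, value_mm: float) -> dict:
    """Wrap one AI measurement as a FHIR Observation resource (a plain dict)."""
    return {
        "resourceType": "Observation",
        "status": "preliminary",  # not yet confirmed by the radiologist
        "code": {
            "coding": [{
                "system": "http://example.org/ai-measurements",  # hypothetical code system
                "code": "lesion-long-axis",                      # hypothetical code
                "display": "Lesion long-axis diameter (AI)",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "derivedFrom": [{"reference": f"ImagingStudy/{imaging_study_id}"}],
        "valueQuantity": {
            "value": value_mm,
            "unit": "mm",
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": "mm",
        },
    }


# Serialized request body for a hypothetical patient and imaging study.
body = json.dumps(ai_measurement_to_fhir("example-patient", "example-study", 12.5), indent=2)
```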
Another requirement for interoperability and clinical integration of artificial intelligence algorithms will be the development of pathways to increase the production of structured data in health systems in general and in radiological reporting specifically. Narrative radiological reports, designed for human consumption, contain a wealth of information that, if extractable by automated systems, would be invaluable not only for the clinical care of that specific patient but also for clinical quality improvement, population health management, and research. Common data elements (CDEs), which define the attributes and allowable values of a unit of information, are “data elements that are collected and stored uniformly across institutions and studies and are defined in a data dictionary” [67, 68]. CDEs allow machine-readable representation of imaging findings, including anatomic location and dimensions, and can store computational features such as density and texture metrics. CDEs allow reports to be built from small units of information that carry not only words but also context, meaning, and relationships [67, 68]. To optimize the standardization and interoperability of artificial intelligence applications in radiology, use cases need to use standardized definitions (CDEs) for algorithm input and output, not only so that algorithms are interoperable with other electronic resources but also so that algorithms built around similar clinical applications have standardized inputs and outputs and can be compared and integrated into clinical workflows in a similar manner. Standardized use cases are critical to the interoperability and clinical integration of AI, and such use cases can only be developed using CDEs.
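The sketch below shows how a common data element might be represented as a data-dictionary entry with allowable values, and how an algorithm’s output could be checked against it. The identifiers, names, and lobe abbreviations are hypothetical and are not taken from any published CDE set; the point is that any two algorithms validated against the same definitions emit comparable outputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommonDataElement:
    """A data-dictionary entry: identifier, name, type, and allowable values."""
    cde_id: str                          # hypothetical identifier
    name: str
    value_type: str                      # "enum" or "number"
    allowed_values: tuple = ()           # used when value_type == "enum"
    min_value: Optional[float] = None    # used when value_type == "number"
    max_value: Optional[float] = None
    unit: Optional[str] = None

    def validate(self, value) -> bool:
        if self.value_type == "enum":
            return value in self.allowed_values
        if self.value_type == "number":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                return False
            if self.min_value is not None and value < self.min_value:
                return False
            return self.max_value is None or value <= self.max_value
        raise ValueError(f"unknown value_type {self.value_type!r}")


# Hypothetical CDEs describing a lung nodule finding.
NODULE_CDES = {
    "location": CommonDataElement("EX-001", "Nodule lobe", "enum",
                                  ("RUL", "RML", "RLL", "LUL", "LLL")),
    "diameter": CommonDataElement("EX-002", "Nodule mean diameter", "number",
                                  min_value=0.0, unit="mm"),
}


def validate_finding(finding: dict) -> list[str]:
    """Return the names of fields that are missing or violate their CDE definition."""
    return [key for key, cde in NODULE_CDES.items()
            if key not in finding or not cde.validate(finding[key])]
```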
19.3.7 Promote Explicability of Algorithm Output
Use cases for pediatric bone age determination have typically specified that the algorithm output display a reference radiograph corresponding to the model’s inferred bone age, along with reference radiographs for bone ages 6 months on either side of the inference, so that the radiologist can choose the standard in best agreement with the patient’s radiograph. Saliency maps can provide explicability when the algorithm’s inference is the detection of radiographic findings, but for many other AI applications, how to make the output explicable will need to be established as each specific use case is developed.
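The bone age output specification described above can be sketched as a selection rule: find the atlas standard nearest the model’s estimate and the standards nearest 6 months below and above it. The atlas ages (in months) used in the example and the function name are hypothetical; a real atlas and display system would be defined by the use case.

```python
def reference_standards(predicted_months: float, atlas_ages: list[int], window: int = 6) -> list[int]:
    """Pick atlas standards closest to the prediction and to prediction +/- window months."""
    if not atlas_ages:
        raise ValueError("atlas is empty")

    def nearest(target: float) -> int:
        # Ties resolve to the earlier entry, so pass the atlas sorted by age.
        return min(atlas_ages, key=lambda age: abs(age - target))

    picks = {
        nearest(predicted_months - window),
        nearest(predicted_months),
        nearest(predicted_months + window),
    }
    return sorted(picks)  # duplicates collapse at the ends of the atlas


# Hypothetical atlas with a standard every 6 months from 9 to 11 years.
ATLAS = [108, 114, 120, 126, 132]
```

With an estimate of 121 months, the radiologist would be shown the 114-, 120-, and 126-month standards and would pick the one that best matches the patient.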
19.3.8 Facilitate Radiologist Input into Development, Validation, and Implementation
Development of a robust AI ecosystem with widespread adoption and clinical implementation of artificial intelligence depends on active radiologist involvement in the development, validation, and implementation of AI algorithms. Creating AI tools at a single institution does not ensure that the algorithm will perform equally well in widespread clinical use. Furthermore, the needs and implementation pathways of different institutions can differ significantly. To ensure that AI tools are generalizable to widespread clinical use, end users, that is, radiologists and their health systems, should broadly agree on which use cases address a significant clinical or administrative need and can be seamlessly integrated into the workflow. Radiologists should work collectively to define these use cases. Radiologists will also be needed to develop datasets for algorithm training and testing, and standards should be developed to help them create accurately annotated, well-curated datasets so that developers have robust and diverse data sources. Radiologists will also play a significant role in ensuring that algorithms are effective and safe for clinical use. Datasets used for algorithm validation before general clinical deployment should meet higher standards for ground truth than datasets used for training, and radiologists should play a significant role in setting the standards for algorithm validation, not only by ensuring that validation datasets are as close to ground truth as possible but also by defining the metrics used for validation, so that similar algorithms can be compared with one another in a consistent fashion.
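One way to make validation metrics comparable across algorithms is to fix their definitions in code. The following sketch computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) from paired ground-truth and predicted labels; it assumes a binary finding and is not tied to any particular use case.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def binary_metrics(truth: list[bool], predicted: list[bool]) -> BinaryMetrics:
    """Compute standard binary metrics from ground-truth and predicted labels."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predictions must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    def ratio(num: int, den: int) -> float:
        # NaN signals an undefined metric, e.g. no positive cases in the dataset.
        return num / den if den else float("nan")

    return BinaryMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )
```

Returning NaN rather than raising when a denominator is zero is a design choice of this sketch; a validation program might instead require minimum case counts for each class.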
Finally, radiologists will be the best source for ensuring the safety and efficacy of AI algorithms in clinical practice. Mechanisms for capturing radiologists’ input about algorithm performance should be built into the clinical workflow, and radiologists must recognize the importance of their role in assessing algorithm performance in clinical practice. Collaboration among individual radiology professionals, their medical specialty societies, and the developer community will be necessary to advance artificial intelligence and facilitate its use in clinical practice.
19.4 Bringing Artificial Intelligence Products to Widespread Clinical Use: Challenges, Opportunities for Radiologists, and the Role of Medical Specialty Societies
Radiology professionals can help mitigate these challenges by playing a leading role in the use case development process, and radiology professionals’ medical specialty societies can serve as a convener, coordinator, and honest broker to facilitate the process.
19.4.1 Creating Clinically Effective Artificial Intelligence Use Cases
A software “use case” is much more than just an idea of what a software application, including an artificial intelligence algorithm, should do. In software development terms, a use case is a prose description of a computing system’s behavior when interacting with the outside world. First proposed by Jacobson in 1987, a use case defines a list of actions or events between the end users (actors) and the computing system and describes the system’s behavior under various conditions as it responds to requests from the primary actors. The actors may be humans or, in healthcare, the electronic resources used by the healthcare team in daily interactions. For AI development purposes, an artificial intelligence use case defines exactly how an AI algorithm takes in information (images, EHR, genetic, structured, or unstructured data) from the clinical workflow and then provides a specific output (detection, classification, quantification, prioritization, etc.) to the end user within the clinical workflow. To help move AI algorithms into clinical practice, AI use cases can also include parameters for how algorithms are trained, tested, and validated for regulatory approval and clinical use, how they are deployed into clinical workflows, and how their effectiveness can be monitored in clinical practice.
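A use case’s actors and flow of events can be captured as structured data rather than prose alone. In this minimal sketch, the `AIUseCase` and `Step` classes, the output categories, and the example nodule-detection flow are hypothetical; the `check` method simply flags missing inputs and steps attributed to actors the use case never declared.

```python
from dataclasses import dataclass, field
from enum import Enum


class OutputType(Enum):
    DETECTION = "detection"
    CLASSIFICATION = "classification"
    QUANTIFICATION = "quantification"
    PRIORITIZATION = "prioritization"


@dataclass
class Step:
    actor: str   # who initiates this event
    action: str  # what happens, in plain language


@dataclass
class AIUseCase:
    title: str
    actors: list[str]
    inputs: list[str]
    output_type: OutputType
    flow: list[Step] = field(default_factory=list)

    def check(self) -> list[str]:
        """Return a list of consistency problems (empty if none were found)."""
        problems = []
        if not self.inputs:
            problems.append("no inputs declared")
        for i, step in enumerate(self.flow, start=1):
            if step.actor not in self.actors:
                problems.append(f"step {i}: undeclared actor {step.actor!r}")
        return problems


# Hypothetical example use case.
nodule_use_case = AIUseCase(
    title="Pulmonary nodule detection on chest CT",
    actors=["Modality", "PACS", "AI algorithm", "Radiologist", "Reporting software"],
    inputs=["chest CT series"],
    output_type=OutputType.DETECTION,
    flow=[
        Step("Modality", "sends the completed CT series to PACS"),
        Step("PACS", "forwards the series to the AI algorithm"),
        Step("AI algorithm", "returns nodule locations to PACS and reporting software"),
        Step("Radiologist", "accepts or rejects each detection while reporting"),
    ],
)
```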
Use case creation is an opportunity for radiologists to play a leading role in helping developers create algorithms that will be useful, effective, and safe in clinical practice and that enhance the value radiology professionals provide to their patients and health systems. Radiology subspecialty societies are uniquely positioned to convene multiple stakeholders, ensure patient safety, promote diversity in algorithm development, and collaborate with regulatory agencies to facilitate the introduction of AI algorithms into clinical practice. Because AI algorithms are frequently developed at single institutions, specific use cases for artificial intelligence (AI) in diagnostic radiology are, in the aggregate, broadly and inconsistently defined, with variation in how algorithms will be developed, validated, adopted, and monitored in clinical practice. There has been little validation of algorithms across more than a few sites, and it remains uncertain whether the effectiveness of these algorithms will generalize to widespread clinical practice and how they will be integrated into clinical workflows across a variety of practice settings. The American College of Radiology’s Data Science Institute (ACR DSI) has developed a standardized process for AI use case development to help achieve the goal of widespread use of clinically relevant, safe, and effective AI algorithms in routine radiological practice. Technology-Oriented Use Cases in Healthcare AI (TOUCH-AI) is an open framework authoring system for defining clinical and operational AI use cases for the radiological sciences that intersect high clinical value with problems solvable by AI.
TOUCH-AI provides a framework that includes narrative descriptions and flowcharts specifying the goals the algorithm should meet, the required clinical inputs, how the algorithm should integrate into the clinical workflow, and how it should interface with both human end users and an array of electronic resources, such as reporting software, PACS, and electronic health records. Combined with CARDS (Computer Assisted Reporting Data Science), the ACR’s existing open framework for authoring and implementing computer-assisted reporting tools in clinical workflows, TOUCH-AI provides an end-to-end AI use case authoring platform for developing ACR DSI use cases for the AI developer community.
Using the guidelines and open specifications in authoring tools such as TOUCH-AI and CARDS, AI use cases can be developed in an environment that produces standardized data elements for creating and annotating datasets for algorithm training and testing, data elements and statistical metrics for algorithm validation, application programming interfaces (APIs) for deploying algorithms into clinical and departmental workflows, and data elements for monitoring the algorithm’s performance in widespread clinical practice. This process helps ensure patient safety by creating use cases that include data elements for algorithm validation and regulatory review and for monitoring the algorithm’s real-world performance after deployment in routine clinical practice. It also ensures that AI use cases include data elements for effective clinical integration with workflow tools such as reporting software, the modalities, PACS, and the EHR. The TOUCH-AI development platform draws on the array of common data elements being created under the joint ACR-RSNA RadElements project to optimize clinical interoperability and implementation by standardizing the algorithms’ input and output elements. Although ACR DSI use cases begin as narratives and flowcharts, this human language is then converted to a machine-readable format (Extensible Markup Language, XML).
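The conversion from narrative to machine-readable XML can be illustrated with Python’s `xml.etree.ElementTree`. The element names below (`useCase`, `inputs`, `validationMetrics`, and so on) are hypothetical and do not reproduce the ACR DSI or TOUCH-AI schema; the sketch only shows how inputs, outputs, validation metrics, and monitoring elements can be serialized and parsed back.

```python
import xml.etree.ElementTree as ET


def use_case_to_xml(title, inputs, outputs, validation_metrics, monitoring_elements):
    """Serialize the main parts of a use case into a hypothetical XML layout."""
    root = ET.Element("useCase", {"title": title})
    sections = (
        ("inputs", "input", inputs),
        ("outputs", "output", outputs),
        ("validationMetrics", "metric", validation_metrics),
        ("monitoringElements", "element", monitoring_elements),
    )
    for section_tag, item_tag, items in sections:
        section = ET.SubElement(root, section_tag)
        for name in items:
            ET.SubElement(section, item_tag, {"name": name})
    return ET.tostring(root, encoding="unicode")


def names_in(xml_text, section_tag):
    """Read back the item names listed under one section."""
    root = ET.fromstring(xml_text)
    return [item.get("name") for item in root.find(section_tag)]
```

Because every use case encoded this way lists its validation metrics and monitoring elements explicitly, a validation center or registry can read them programmatically instead of interpreting prose.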
While many of the ACR DSI AI use cases will be developed by panel members, crowdsourcing in AI development, particularly in the form of AI competitions, has been key to the rapid dissemination of knowledge and technical information. These concepts should be applied to use case development as well. Radiologists can collaborate through specialty societies to pool their ideas about the highest-priority use cases for the radiological sciences. Additionally, individual developers and institutions can take use cases they are working on and have them encoded with data elements specifying broader standardized annotation of training sets, validation, integration, and monitoring in clinical practice.
Crowdsourcing has been a helpful tool for engaging the developer community in the development of AI algorithms, and Kaggle has hosted a number of competitions related to healthcare and medical imaging [94, 95, 96, 97]. These competitions have focused the attention of thousands of researchers and developers on healthcare use cases; however, participants in many instances are not healthcare or diagnostic imaging experts, so little is known about how physicians and other stakeholders will interact with the resulting algorithms. Additionally, sponsors have often created use cases that are generally unstructured, with broad rather than specific goals for outputs that can be integrated into clinical practice. For instance, the 2017 Data Science Bowl sponsored by Kaggle and the Memorial Sloan Kettering Cancer Center used a public dataset from the US National Institutes of Health and asked participants to “develop algorithms that accurately determine when lesions in the lungs are cancerous” in order to “dramatically reduce the false positive rate that plagues the current detection technology”. While the algorithms developed for this competition were impressive from a data science perspective, their clinical utility is difficult to determine. The use case defined no structured mechanism for detecting, localizing, and characterizing the lesions, and as a result the algorithms’ output was variable. Most of the algorithms reported a percent cancer risk for individual nodules, but this information was in many ways not useful in routine clinical practice. For instance, whether the risk of cancer in a nodule is classified as 95% or 15%, the ultimate management for both nodules is still tissue sampling.
A better output for the algorithm might have been a Lung-RADS score along with the additional features radiologists use in reporting lung cancer screening examinations, such as lesion location, lesion size, and lesion characteristics (solid or subsolid; smooth, lobulated, or spiculated). For these reasons, AI use cases developed by end users with an understanding of available guidelines and of the electronic resources for clinical integration are likely to gain more widespread clinical adoption than those developed from broad, unstructured use cases. The American College of Radiology (ACR) and the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society recently announced that MICCAI will use ACR DSI use cases in its imaging AI competitions to foster the development of artificial intelligence algorithms that better meet the clinical needs of radiologists.
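The structured output proposed above can be sketched as a record whose fields are constrained to enumerated values. The field names are illustrative assumptions; the category set lists the Lung-RADS assessment categories 0, 1, 2, 3, 4A, 4B, and 4X without modifiers, and the sketch only validates a category assigned by the algorithm rather than computing one from guideline criteria.

```python
import json
from dataclasses import asdict, dataclass

# Lung-RADS assessment categories (modifiers such as "S" are omitted in this sketch).
LUNG_RADS_CATEGORIES = {"0", "1", "2", "3", "4A", "4B", "4X"}
ATTENUATION = {"solid", "subsolid"}
MARGINS = {"smooth", "lobulated", "spiculated"}


@dataclass(frozen=True)
class NoduleFinding:
    location: str       # e.g., lobe; free text in this sketch
    size_mm: float
    attenuation: str
    margin: str
    lung_rads: str      # category assigned by the algorithm

    def __post_init__(self):
        # Reject values outside the enumerations so downstream tools can rely on them.
        if self.size_mm <= 0:
            raise ValueError("size_mm must be positive")
        if self.attenuation not in ATTENUATION:
            raise ValueError(f"unknown attenuation {self.attenuation!r}")
        if self.margin not in MARGINS:
            raise ValueError(f"unknown margin {self.margin!r}")
        if self.lung_rads not in LUNG_RADS_CATEGORIES:
            raise ValueError(f"unknown Lung-RADS category {self.lung_rads!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```

Unlike a bare cancer probability, each field of this record maps onto something the radiologist already reports, so it can flow directly into a structured report.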
Radiology specialty societies such as the American College of Radiology are uniquely positioned to facilitate the development of an AI ecosystem that convenes multiple stakeholders, ensures patient safety, promotes diversity in algorithm development, and collaborates with federal regulatory agencies and even Congress to facilitate the introduction into the market of AI algorithms that will enhance the care radiology professionals provide for their patients.
19.4.2 Enhancing the Availability of High-Quality Datasets for Algorithm Testing and Training
Use cases that standardize definitions of data elements, tools, and instructions for annotating datasets will provide a common framework that multiple institutions and developers can use for algorithm training and testing. Using multiple sites as data sources provides technical, geographic, and patient diversity, which helps prevent unintended bias in algorithm development, and allows more individual radiology professionals and institutions to participate in the AI development process. Public directories of institutions that have created datasets around structured use cases can inform the developer community about which sites have training datasets available. Compared with unsupervised learning or the use of only loosely annotated datasets for algorithm training, creating well-curated, deeply annotated datasets will be costly, because expert analysts, including radiologists, and methods for extracting pathology data from health records will be needed. However, if the datasets created around multiple use cases are widely available to multiple developers, the aggregate cost of training and testing AI algorithms could be substantially reduced. The current practice of developers obtaining data from single institutions, and the associated costs, have led some developers to require practices and institutions providing data to sign noncompete agreements. If developers are expected to work together, the AI ecosystem will need support mechanisms that protect intellectual property while fostering the sharing of annotated datasets and tools.
Another challenge will be integrating multiple healthcare datasets that are complex, heterogeneous, and inconsistently structured. An aspirational goal is to amass large datasets that reveal novel disease correlations in order to match patients to the best treatments based on their specific health, life experiences, and genetic profile. AI holds the promise of integrating all of these data sources with imaging data to promote population health management. However, the availability of high-quality data, and the ability of AI algorithms to bridge narrow use cases for image recognition and more general use cases that interact with unstructured data from non-imaging sources, must be considered.
A collaborative approach between institutions with annotated datasets built according to specific AI use cases and AI developers working on algorithms around those use cases can be enhanced by the involvement of honest-broker third parties, such as medical specialty societies, that can host directories of institutions with available datasets. This could become a key function of the radiology AI ecosystem in advancing AI tools to clinical practice.
19.4.3 Maintaining Patient Data Privacy in Developing and Validating Artificial Intelligence Algorithms
Both healthcare culture and public law require physicians to closely protect patients’ health data, but developing large patient datasets that incorporate wide ranges of radiologic, clinical, and pathologic information across multiple institutions for AI algorithm development will necessitate a thorough re-examination of issues surrounding patient privacy, confidentiality, and informed consent. The same tools that are expected to be useful in characterizing a patient’s disease may eventually extract information in a way that makes any image identifiable to a specific patient, much like a fingerprint. Integrating patient data from multiple sites and sources in the development of AI use cases likely increases the risk of large-scale leakage of protected information. Routine disclosure of patient information for care, at least within a given institution, is widely accepted in direct patient care, whereas otherwise identical disclosures for research and development require informed consent. This model raises a number of questions about how patient data should be treated in radiology AI. Will informed consent be required only for patient data used in developing deeply annotated AI datasets? How will consent be addressed if a patient’s data used to assess an algorithm in routine clinical practice is then used to refine or retrain the algorithm? If the data is used to develop applications sold for profit, are patients entitled to compensation? What mechanisms are in place to protect individuals who opt out? These questions will have to be addressed as clinical AI becomes routine.
One key to managing the use of patient data will be transparency. In general, the public is willing to share personal data if people believe there will be downstream benefits, but they want to be confident the data will not be shared in ways they do not understand. In an interview with the Harvard Business Review, MIT professor Alex Pentland challenges the notion that the organizations collecting data actually own the data. He goes on to say that without rules establishing who does, the public will revolt, regulators will get involved, and there will inevitably be a restrictive overreaction; as a result, applications such as AI that depend on these data will fail to reach their potential. Pentland’s “New Deal on Data” proposes that transparency depends on allowing the public to see what is being collected and giving individuals opportunities to opt in or out. The AI community should work together to create an infrastructure that allows responsible use of patients’ health data to facilitate the development of AI tools that will improve population health. The industry should welcome structure around responsible data use; defined rules for data use will ultimately facilitate the development of AI tools and, ideally, prevent data breaches and other data disasters that could set the industry back decades.
Nonetheless, providing developers access to large datasets will create the opportunity for large leaks of protected information, and new cryptographic techniques should be considered. Blockchain methodologies use a distributed ledger of continuously appended “blocks,” each linked to the block before it, that together record all previous transactions. In healthcare, this encompasses all previous records of access to an individual data record, including information about how the data was used and any additions or changes to the record. Blockchain technology can also be used to validate the provenance of data and to facilitate the distribution of data without compromising data quality. Pilots are underway assessing the ability of blockchain-type ledgers to function within Health Level Seven and FHIR standards for electronic health records. In health systems, blockchain technology may solve some problems for researchers, such as locating the most current record and tracking patient activity across a health system. Merkle tree technology for health systems, which uses a hash function and hash values to track changes to the database, may be one way to ensure security in a distributed data system. This type of data structure allows verification of who made changes and what changes were made, making the database difficult to corrupt because any change in the data changes the hash values. Whichever technologies ultimately prove most effective in protecting data privacy, the AI ecosystem must embrace standards for data security and patient privacy in both centralized and distributed models of algorithm development and implementation. This will help prevent systematic data breaches or other data disasters that would almost certainly impede the development and implementation of AI algorithms in healthcare.
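The tamper-evidence property of a Merkle tree can be shown in a few lines. This sketch hashes each record with SHA-256, then repeatedly hashes adjacent pairs until a single root remains; changing any record changes the root. Duplicating the last node at odd-sized levels is one common convention, the access-log records are invented for illustration, and a production design would typically also distinguish leaf hashes from internal-node hashes to prevent second-preimage attacks.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Hash each record, then hash adjacent pairs level by level down to one root.

    An odd node at any level is paired with itself (one common convention).
    """
    if not records:
        raise ValueError("no records")
    level = [sha256(record) for record in records]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical access-log entries for one patient record.
ACCESS_LOG = [
    b"2019-01-01 user=a action=read record=17",
    b"2019-01-02 user=b action=update record=17",
    b"2019-01-03 user=a action=read record=42",
]
```

Publishing or distributing only the 32-byte root lets any participant detect later edits: recomputing the root from an altered log yields a different value.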
19.4.4 Enhancing Algorithm Validation
In addition to enhancing the supply of datasets available for training and testing, a robust AI ecosystem should also focus on creating rigorous testing and validation approaches for the clinical use of AI algorithms, in order to identify and mitigate implementation problems and to provide confidence to the medical community. The 2017 JASON report Artificial Intelligence for Health and Health Care further recommends supporting work that prepares and assists developers of promising AI applications in navigating the regulatory and other approval processes needed for acceptance in clinical practice, including “testing and validation approaches for AI algorithms to evaluate performance of the algorithms under conditions that differ from the training set”. One such approach is a centralized program that assesses algorithm performance using novel validation datasets and the statistical metrics specified in structured AI use cases. Because these elements are specified in the AI use case, algorithms can be readily compared, and assessment for clinical deployment can be standardized. These validation datasets could be assembled from datasets created at multiple institutions, which in the aggregate would ensure geographic, technical, and patient diversity. In addition to being diverse, validation datasets must be held to the highest ground truth reasonably achievable, using labels that exceed standard assessments when possible, for example, biopsy results for labeling dermatological images. Multiple readers and guidelines for data quality should be used to ensure consistency between sites and consistent metrics for measuring the performance of different algorithms built around the same use case.
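Agreement between multiple readers labeling a validation dataset is commonly summarized with Cohen’s kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements it for two readers; the example labels are illustrative, and a validation program might instead use a multi-rater statistic when more than two readers are involved.

```python
from collections import Counter


def cohens_kappa(reader_a: list, reader_b: list) -> float:
    """Chance-corrected agreement between two readers' labels for the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement: probability both readers pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1:
        # Both readers used one identical label for every case; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)
```

For example, two readers who agree on 5 of 6 cases, where one labels 4 benign and 2 malignant and the other 3 and 3, have p_o = 5/6 and p_e = 0.5, giving kappa = 2/3.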
Internal standards to protect developers’ intellectual property, ensure patient privacy, and diminish potential unintended bias in algorithm performance should also be developed. With these fundamentals in place, validation centers could then prepare reports for developers about their algorithm’s performance for use in regulatory approval processes such as US FDA clearance. As discussed previously, the US FDA is looking for tools within the MDDT program that developers can use to facilitate the regulatory approval process. Although such tools have not been officially established as “special controls” in the FDA’s proposal to reclassify many SaMD products as Class II (special controls), the AI community should welcome the opportunity to develop a streamlined process that can move AI products expeditiously into clinical practice.
Acceptance of AI in clinical practice will depend on the believability and explicability of the algorithm output. The JASON 2017 report Artificial Intelligence for Health and Health Care highlighted this issue by summarizing a series of studies demonstrating the value of quantitative information from fractional flow reserve computed tomography (FFRCT) for identifying patients with clinically significant coronary artery disease at lower cost than invasive coronary angiography [2, 106]. The favorable results of these studies, together with an independent review by the United Kingdom’s National Institute for Health and Care Excellence (NICE), led NICE to issue guidance incorporating FFRCT into the NICE pathway on chest pain. Because FFRCT is based on data that can be readily verified in clinical practice, physician acceptance may be better than for the less-understood outputs of general AI algorithms. For the medical community to develop trust in AI-based tools, assessments at least as rigorous as those applied to FFRCT will be needed.
19.4.5 Enhancing Clinical Integration
19.4.6 Mechanisms for Assessing Algorithm Performance in Clinical Practice
As methods for assessing algorithm performance in clinical practice are established, each structured AI use case can specify the data elements that should be captured to monitor an algorithm’s performance in clinical practice. Radiologist input is gathered as the case is being reported: if the radiologist does not incorporate the algorithm’s inferences into the report, the reporting software captures this change in the background, and if the radiologist agrees with the algorithm’s output, this is also noted. This information is transmitted to a data registry, and radiology specialty societies are uniquely positioned to host such registries. Metadata about the examination specified in the AI use case, such as equipment vendor, slice thickness, and exposure, are also transmitted to the registry. Algorithm assessment reports then summarize the algorithm’s real-world performance across a wide variety of practice settings. Areas of low algorithm performance are correlated with examination metadata to look for patterns that can guide improvements to the algorithm through additional training. These reports will also help developers report real-world performance to regulatory agencies such as the US FDA, and help clinical sites confirm that their algorithm performance is in line with national benchmarks.
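The correlation of low performance with examination metadata can be sketched as a simple grouping over registry records. The record layout (`radiologist_agreed`, `metadata`), the metadata field names, and the benchmark threshold below are hypothetical assumptions for illustration, not the schema of any existing registry.

```python
from collections import defaultdict


def agreement_by(records, field_name):
    """Map each value of one metadata field to (case count, agreement rate)."""
    tallies = defaultdict(lambda: [0, 0])  # value -> [cases, agreements]
    for record in records:
        tally = tallies[record["metadata"][field_name]]
        tally[0] += 1
        tally[1] += int(record["radiologist_agreed"])
    return {value: (n, agreed / n) for value, (n, agreed) in tallies.items()}


def flag_low_performance(records, field_name, benchmark, min_cases=1):
    """Return metadata values whose agreement rate falls below a benchmark."""
    rates = agreement_by(records, field_name)
    return sorted(value for value, (n, rate) in rates.items()
                  if n >= min_cases and rate < benchmark)
```

The `min_cases` parameter is included because agreement rates computed from only a handful of examinations are unstable; a registry would likely set this threshold much higher than the default used here.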
The American College of Radiology National Radiology Data Registry (NRDR) is an example of how radiology specialty societies are helping the specialty capture and benchmark information about quality, patient safety, and other improvement activities. AI data registries can potentially capture both radiologist assessment and metadata about the examination without hampering clinical workflow. The results can be collated centrally and provided to developers and the clinical sites to ensure patient safety and improve algorithm effectiveness.
19.4.7 The Economics of AI and Business Models for Moving AI to Clinical Practice
A key ingredient in moving artificial intelligence (AI) algorithms for healthcare into routine clinical practice will be ensuring that our healthcare system supports fair compensation for the development of these algorithms and other AI tools, but establishing a process for how that will happen may not be as simple as it seems. Costs in the US healthcare system are already at unsustainable levels, so developers and the physician community will have to demonstrate the value and cost savings that each AI algorithm brings to our patients and our healthcare system before reimbursement from third-party payers can be considered. The value to patients may lie in earlier and more accurate diagnoses and treatments; the value to physicians, in more efficient data management and integration; and the value to health systems, in improved quality of care, greater overall efficiency, and decreased length of stay.
Developers will need to understand current and future payment models to build sustainable business models, and that understanding has to begin with the current US fee-for-service (FFS) model. In this system, specific medical services, procedures, and supplies are reimbursed using the Healthcare Common Procedure Coding System (HCPCS) of the Centers for Medicare & Medicaid Services (CMS). Level I of HCPCS is based on Current Procedural Terminology™ (CPT), a numeric coding system developed and maintained by the American Medical Association. CPT identifies and describes medical services and procedures commonly furnished and billed by physicians and other healthcare professionals. However, CPT does not include the codes needed to separately report medical items or services provided outside the physician office setting, such as durable medical equipment and supplies. Level II of HCPCS was established to provide codes that non-physician providers use to submit claims for these items to Medicare and private health insurance programs. Medicare and other payers assign a value to each HCPCS code, and providers submit claims based on these codes. When medical equipment and supplies are used in the physician office setting, their reimbursement is included in the CPT code payment to the physician as “Direct Practice Expense”; however, when the same services are performed by physicians in a hospital or other site of service outside a physician office, the payments for equipment and supplies are made directly to the facility. As a result, each CPT code in the Medicare Physician Fee Schedule (PFS) carries different physician payments depending on whether the service was provided in a physician’s office (non-facility) or hospital (facility) setting [108, 109].
Finally, a portion of the payment for each physician service (“Indirect Practice Expense”) is designed to cover the costs of operating a practice, including office rent, utilities, computers, and billing. The Medicare PFS uses the resource-based relative value scale (RBRVS) to assign relative value units (RVUs) to each physician service, and all practice expenses are likewise converted to RVUs. RVUs for physician work and for professional liability insurance are added to the direct and indirect practice expense RVUs to give the total RVUs for each physician service in the Medicare PFS, and this total is multiplied by a conversion factor set by CMS to give the dollar payment to physicians. Hospitals are reimbursed under two separate payment systems: the inpatient prospective payment system (IPPS), which uses diagnosis-related groups (DRGs) as its fundamental coding system, and the hospital outpatient prospective payment system (HOPPS), which uses ambulatory payment classifications (APCs). Each of these systems accounts for payments for medical equipment, devices, and supplies in different ways. And while some private payers base their payment systems on the Medicare PFS, each private insurer has its own way of assigning reimbursement for medical equipment, devices, and supplies to each service.
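The fee schedule arithmetic described above can be sketched as follows: total RVUs are the sum of work, practice expense (facility or non-facility), and professional liability RVUs, and that total is multiplied by the CMS conversion factor. All RVU values and the conversion factor used with this sketch are made-up numbers, not actual CMS figures, and the sketch omits the geographic practice cost indices (GPCIs) that the Medicare PFS also applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CptRvus:
    """Hypothetical RVU components for one CPT code (values are illustrative)."""
    work: float
    pe_nonfacility: float  # direct + indirect practice expense, office setting
    pe_facility: float     # practice expense when performed in a hospital
    malpractice: float     # professional liability insurance


def pfs_payment(rvus: CptRvus, conversion_factor: float, facility: bool) -> float:
    """Total RVUs times the conversion factor (no geographic adjustment)."""
    practice_expense = rvus.pe_facility if facility else rvus.pe_nonfacility
    total_rvus = rvus.work + practice_expense + rvus.malpractice
    return round(total_rvus * conversion_factor, 2)


code = CptRvus(work=1.0, pe_nonfacility=2.0, pe_facility=0.5, malpractice=0.1)
print(pfs_payment(code, 30.0, facility=False))  # office (non-facility) setting
print(pfs_payment(code, 30.0, facility=True))   # hospital (facility) setting
```

With these hypothetical values, the same service pays $93.00 in the office setting and $48.00 in the facility setting, because equipment and supply costs are paid to the hospital in the latter case.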
While the various US payment systems are complicated in their own right, the process is made even more complicated because there will be no one-size-fits-all payment scheme for reimbursing the use of AI in healthcare. Some algorithms will affect payments to physicians, perhaps by making their work more efficient, or perhaps by making it more time consuming as more patient information is brought into the care of complex patients. Some algorithms will improve the overall quality and efficiency of our practices and health systems but cannot be attributed to a specific service or procedure, and while some algorithms may be directly reimbursable by third-party payers, many will not. Finally, every algorithm adopted by physicians and health systems must be able to document that it provides demonstrable value to patients in a safe and bias-free environment.
The US CMS Quality Payment Program (QPP) is the next step in the development and adoption of alternative payment models in US healthcare. The QPP includes the Merit-based Incentive Payment System (MIPS) and Alternative Payment Models (APMs). MIPS uses four categories (quality, clinical practice improvement, resource use, and advancing care information) to adjust Medicare FFS payments to physicians up or down, by as much as 9% in 2022, based on their performance in each category. Physicians report measures for quality, clinical practice improvement activities, and advancing care information to CMS, and if certain AI algorithms can provide documented value and improved quality for patients, the use of those algorithms to improve patient care, quality, and value can be included as MIPS measures. While APMs are much less prevalent than FFS in the United States, algorithms that increase overall efficiency for health systems will be welcomed as the medical community strives to do more for patients at lower overall cost. In APMs, assigning and attributing a per-unit cost of an AI algorithm to an individual CPT code will matter much less than ensuring the algorithm augments the care provided to patients without taking away the commonsense decisions of physicians and their patients.
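The effect of a MIPS adjustment on an FFS payment can be sketched as a bounded percentage change. This is a deliberate simplification: it takes the adjustment as an input, whereas the actual CMS methodology derives it from a final performance score and applies budget-neutrality scaling, neither of which is modeled here. The 9% bound is the figure cited in the text; the base payment and adjustments used with it are hypothetical.

```python
MAX_ADJUSTMENT = 0.09  # the 9% maximum cited in the text for 2022


def mips_adjusted_payment(base_payment: float, adjustment: float,
                          cap: float = MAX_ADJUSTMENT) -> float:
    """Apply a MIPS-style percentage adjustment, bounded to +/- cap.

    Simplified sketch: ignores scoring and budget-neutrality scaling.
    """
    bounded = max(-cap, min(cap, adjustment))
    return round(base_payment * (1 + bounded), 2)


print(mips_adjusted_payment(100.0, -0.12))  # a penalty beyond the cap is bounded
```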
Finally, the economics of AI in healthcare will have to include a discussion about potential disparities if AI is available to some patients and not available to others. While market leaders will likely emerge touting that their services include the latest AI innovations, the global healthcare system should not devolve into a two-tier system where some can afford AI, while others cannot. The reimbursement system has a duty to protect our patients by ensuring all physicians have access to these potentially revolutionary tools.
Radiology specialty societies such as the American College of Radiology have always been strategically involved in federal regulatory and payment policy issues affecting the radiological sciences. Reimbursement for moving artificial intelligence into clinical practice will have to be addressed in the payment policy arena. Specialty societies can serve the AI ecosystem by providing education on regulatory and payment policy issues related to AI; these issues were discussed with developers, physicians, and the AI community at the ACR Data Science Institute’s Data Science Summit: The Economics of AI, held in conjunction with the Society for Imaging Informatics in Medicine (SIIM). The proceedings of this summit are freely available to the community.
Medical specialty societies also play important roles in interacting with regulatory agencies, including the US FDA, the International Atomic Energy Agency, and the World Health Organization (WHO), all of which may eventually play a role in regulating healthcare AI. Radiology specialty societies are uniquely positioned to serve as honest brokers with these agencies, facilitating processes that advance the use of AI in clinical practice while protecting patients by ensuring algorithms are safe and effective in clinical use.
19.4.8 Facilitating the Development of Non-interpretive Use Cases for Artificial Intelligence in Radiological Practice
Non-interpretive AI algorithms will also be important for radiology professionals. Use cases that promote quality, safety, protocol optimization, patient experience, and many other goals will be valuable to both radiology professionals and hospital systems. End users will include not only radiologists but also technologists, hospital administrators, and hospital quality and finance teams. As with interpretive AI use cases, appropriately curated data will be necessary for algorithm training, and demonstration projects will be needed to establish clinical utility. While these types of algorithms may not require regulatory approval, processes to ensure that they are effective and free of unintended bias will be important. Radiologists and radiological specialty societies can play an active role in facilitating the development of AI tools for non-interpretive uses by developing use cases for researchers and developers that address workflow, patient access, and numerous other non-interpretive issues in the radiological community. Interoperability standards for using AI across the entire health enterprise will be even more important for non-interpretive uses of AI than for interpretive uses: beyond data from imaging studies, these applications will need data from a variety of other electronic resources. Radiologists and radiology specialty societies should play leading roles in working with all of medicine and the HIT community to develop these interoperability standards with artificial intelligence in mind.
Additionally, specialty societies can coordinate pilot demonstration projects to establish the utility and effectiveness of AI in using the abundance of data in health systems, and the AI community should look for methods to continuously monitor the effectiveness of these tools as they are deployed in clinical practice.
19.4.9 Educating Non-radiologist Stakeholders About the Value of AI
Fostering collaboration between stakeholders requires education that demonstrates the value of establishing an AI ecosystem. Radiology specialty societies can foster collaboration between organizations by establishing joint educational programs and other defined collaborations. These same organizations, as well as governmental agencies and the developer community, can provide venues that bring all stakeholders together. The Machine Learning Showcase at RSNA 2017 gathered AI developers in a common location and also provided a venue for education. A number of events have included industry at meetings such as those of the Society for Imaging Informatics in Medicine and the American College of Radiology. The American College of Radiology Data Science Institute (DSI) also hosted FDA representatives in the fall of 2017. Finally, technology companies such as NVIDIA have hosted educational meetings where radiology specialty societies were invited to provide their perspectives on AI. These collaborative meetings should continue.
Additionally, radiology specialty societies are working to prepare the profession for the opportunities AI will bring. Rather than opposing AI initiatives as a threat to the specialty, radiology organizations have been providing educational activities that demonstrate how AI will help radiology professionals take better care of their patients and, in turn, be more valuable to their health systems.
19.5 Summary of the Proposed AI Ecosystem for the Radiological Sciences
Another opportunity for radiologist participation in the AI development process is the production of well-annotated datasets for algorithm training and testing. While many radiologists have begun working with individual developers to annotate data for algorithm development, using structured use cases as the basis for this effort allows many practices to create training data to the specifications in the AI use case, which developers can then use in aggregate for algorithm development, training, and testing. The aggregated data provide the mix of technical differences and patient-population variability typical of widespread clinical practice. The healthcare ecosystem, including physicians, healthcare administrators, government regulators, and patient advocates, should support these efforts by offering standardized methodologies for deidentifying sensitive patient information across the health system so that development of AI in healthcare can proceed at a reasonable pace. A potentially important consideration is that if developers can use the datasets created by individual radiology practices on premises at each institution, the patient information remains under the institution’s direct control and is much better protected than if it were held in a centralized offsite repository or entirely under the control of individual developers. This model allows development of a large data pool with technical and geographic diversity while avoiding the risks associated with a large centralized repository of patient data (Fig. 19.15b).
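A standardized de-identification step that runs on premises might look like the following sketch. It assumes DICOM headers have already been read into a plain dictionary keyed by DICOM attribute keywords; the attribute list is an illustrative subset only, and replacing `PatientID` with a keyed hash (HMAC-SHA-256) is one possible pseudonymization choice, not a method prescribed by the text. A complete approach would follow a full confidentiality profile such as DICOM PS3.15 Annex E.

```python
import hashlib
import hmac

# Illustrative subset of DICOM attribute keywords that directly identify a
# patient; a full confidentiality profile covers many more attributes.
DIRECT_IDENTIFIERS = {
    "PatientName", "PatientBirthDate", "PatientAddress",
    "ReferringPhysicianName", "InstitutionName",
}


def deidentify(header: dict, site_secret: bytes) -> dict:
    """Drop direct identifiers and pseudonymize PatientID with a site-held key."""
    clean = {k: v for k, v in header.items() if k not in DIRECT_IDENTIFIERS}
    if "PatientID" in clean:
        digest = hmac.new(site_secret, clean["PatientID"].encode(), hashlib.sha256)
        clean["PatientID"] = digest.hexdigest()[:16]  # stable per-site pseudonym
    return clean
```

Because the secret key never leaves the institution, the same patient maps to the same pseudonym within a site's dataset, while someone without the key cannot recompute pseudonyms from guessed patient IDs.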
In a robust AI ecosystem, there should be many opportunities for radiological practices to participate in validation of AI algorithms for the radiological sciences. Structured AI use cases should contain the data elements and statistical metrics necessary to ensure algorithms will be safe and effective in clinical practice, and developers should be aware in advance how the algorithms will be evaluated. Furthermore, to facilitate comparison, similar algorithms should be evaluated using similar statistical metrics. Since the algorithm performance against validation datasets may be used to obtain premarket regulatory approval for many AI applications, the validation process must be robust, standardized, and statistically valid. This means that standards for ground truth must be higher than those for creating training datasets and should even exceed the standards for routine clinical practice.
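One way to make reported performance statistically comparable across algorithms is to report each metric with a confidence interval. The sketch below uses the Wilson score interval for a proportion such as sensitivity; the choice of Wilson over other interval methods, and the 95% level (z = 1.96), are assumptions for illustration rather than requirements stated in the text.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Sensitivity of 45/50 detected positives on a hypothetical validation set.
print(wilson_interval(45, 50))
```

For 45 true positives among 50 positive cases (sensitivity 0.90), the interval is roughly 0.79 to 0.96; with ten times as many cases at the same sensitivity, it narrows considerably. The Wilson interval is used here because it stays within [0, 1] and behaves better than the simple normal approximation when proportions are near 0 or 1 or samples are small.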
In contrast to the training datasets, the validation datasets should be held centrally. A centralized repository of validation datasets maintained by a “third-party” honest broker preserves confidentiality so that the validation data cannot be used for algorithm training. Additionally, safeguards can be put in place to protect patient information as well as developer intellectual property. Radiological specialty societies would be natural hosts for algorithm validation. For example, the American College of Radiology (ACR) has developed an infrastructure designed to support multicenter clinical trials using imaging data. This infrastructure includes the ability to transmit DICOM, HL7, and other data from clinical sites to a central repository, along with tools for data curation and aggregation that combine the results from individual sites into a combined result [114, 115]. Demonstrating the effectiveness of third-party validation of AI algorithms will be important for convincing regulatory bodies such as the US FDA that these processes could be used in an AI algorithm’s premarket approval process. Once again, professional medical societies have a role to play in working with governmental agencies to establish a review process that facilitates AI development while ensuring the safety of patients and the public. The ACR has a long history of working with the US FDA to promote radiological quality and safety, particularly in implementing the Mammography Quality Standards Act (MQSA) and on radiation safety issues. The US FDA adopted the ACR Accreditation Program as a means of demonstrating MQSA compliance. This precedent suggests that public-private partnerships between governmental regulatory agencies and medical specialty societies could be developed for AI in healthcare as well (Fig. 19.15c).
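The confidentiality property described above, where developers receive scores but never the validation labels, can be sketched as a small interface. The `ValidationBroker` class, its per-developer submission cap, and the metric names are hypothetical; the cap is an added assumption meant to limit repeated probing that could otherwise leak information about the held-out labels.

```python
import math


class ValidationBroker:
    """Hypothetical honest broker: keeps reference labels private and returns
    only aggregate metrics, with a cap on submissions per developer."""

    def __init__(self, labels: dict, max_submissions: int = 3):
        self._labels = dict(labels)  # case_id -> bool; never returned to callers
        self._max_submissions = max_submissions
        self._submissions = {}       # developer -> submissions used so far

    def evaluate(self, developer: str, predictions: dict) -> dict:
        used = self._submissions.get(developer, 0)
        if used >= self._max_submissions:
            raise PermissionError("submission limit reached for " + developer)
        if set(predictions) != set(self._labels):
            raise ValueError("predictions must cover exactly the validation cases")
        self._submissions[developer] = used + 1
        positives = [c for c, y in self._labels.items() if y]
        negatives = [c for c, y in self._labels.items() if not y]
        sensitivity = (sum(bool(predictions[c]) for c in positives) / len(positives)
                       if positives else math.nan)
        specificity = (sum(not predictions[c] for c in negatives) / len(negatives)
                       if negatives else math.nan)
        # Only aggregate figures leave the broker, never per-case labels.
        return {"n_cases": len(self._labels),
                "sensitivity": sensitivity, "specificity": specificity}
```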
Once an AI product has received regulatory clearance for marketing, pathways for the clinical implementation of AI models will be necessary. Structured use cases should contain data elements that specify how the algorithm’s output should interact with other electronic resources. Algorithm output must be standardized so that the data can inform the electronic resources physicians use, and more robust standards for communication among the array of electronic healthcare resources should be developed as well. Physicians and professional societies should play a role in this process. The ACR and NEMA created DICOM to move image data between the electronic interfaces used in radiology, and professional organizations should likewise be involved in developing standards that move AI model outputs to the most usable locations in a patient’s medical record.
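As an illustration of standardized output, the sketch below wraps a single AI measurement in a dictionary shaped like an HL7 FHIR R4 `Observation` resource, so it could in principle be routed to other electronic resources. The coding system URI and element code are placeholders, not real common data element identifiers, and the algorithm name is hypothetical; a production mapping would follow the relevant FHIR profile and the use case's data element definitions.

```python
import json


def ai_output_to_observation(element_code: str, element_display: str,
                             value_mm: float, algorithm: str) -> dict:
    """Wrap one AI measurement in a FHIR R4 Observation-shaped dict."""
    return {
        "resourceType": "Observation",
        "status": "preliminary",  # not yet confirmed by the radiologist
        "code": {"coding": [{
            "system": "urn:example:common-data-elements",  # placeholder system
            "code": element_code,
            "display": element_display,
        }]},
        "valueQuantity": {"value": value_mm, "unit": "mm",
                          "system": "http://unitsofmeasure.org", "code": "mm"},
        "device": {"display": algorithm},  # which algorithm/version produced it
    }


obs = ai_output_to_observation("EXAMPLE-0001", "Pulmonary nodule diameter",
                               7.5, "hypothetical-nodule-model v1.2")
print(json.dumps(obs, indent=2))
```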
Finally, to gain wide acceptance in healthcare markets, the importance of assuring end users and the public that AI applications used in medical practice perform as expected cannot be overstated. Physicians and patients will expect nothing less, but collecting real-world performance data will not be trivial. Physicians will not want to be distracted from their clinical workflows to complete and submit forms or other data designed to monitor performance in practice, and data collected in that manner are likely to be of little help in systematically monitoring the real-world performance of AI algorithms. To mitigate these challenges, structured AI use cases can contain data elements specifying how AI algorithms will be monitored in routine clinical practice. For instance, AI algorithms designed to assist radiologists in lesion characterization could display the AI inference in the PACS and prepopulate the radiologist’s report. If the radiologist does not change the report, the algorithm is considered to have worked as expected; if the radiologist changes the report beyond a predefined tolerance, the algorithm is considered to have failed for that examination. To help understand potential reasons for algorithm failure, the transcription or other reporting system can collect, in the background, the examination metadata specified in the use case. For each instance of algorithm use, radiologist agreement and metadata can be transmitted to a data registry for aggregation and collation. Reports on algorithm performance can then be generated for developers to ensure compliance with any post-market regulatory requirements. By correlating algorithm performance with examination data, developers can identify which examination parameters may be associated with poor algorithm performance and expand training and testing to include those circumstances.
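The agree/fail rule described above can be sketched as a per-examination check that produces a registry record. The 1.0 mm tolerance, the field names, and the treatment of a deleted AI finding as a failure are assumptions for illustration; an actual use case would define its own tolerance and data elements.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegistryRecord:
    exam_id: str
    algorithm_worked: bool
    metadata: dict = field(default_factory=dict)  # e.g., vendor, slice thickness


def assess_exam(exam_id: str, ai_value_mm: float, final_value_mm: Optional[float],
                metadata: dict, tolerance_mm: float = 1.0) -> RegistryRecord:
    """Compare the AI-prepopulated value with the value in the signed report.

    final_value_mm is None when the radiologist deleted the AI finding.
    """
    if final_value_mm is None:
        worked = False
    else:
        worked = abs(final_value_mm - ai_value_mm) <= tolerance_mm
    return RegistryRecord(exam_id, worked, dict(metadata))
```

Copying the metadata into each record keeps the registry entry independent of later changes to the source dictionary, so each record reflects the examination as it was reported.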
Many medical specialty societies currently offer collection and benchmarking of practice data. In some instances, data registries housed by specialty societies have dramatically reduced the cost of premarket review for FDA clearance. If these processes can be implemented on a widespread basis, radiologists will be at the center of ensuring that the development and use of AI in clinical practice reach their potential, and feedback from clinical use will be the best way to help developers improve software and extend algorithms to more and more clinical problems (Fig. 19.15d).
The development and implementation of AI algorithms for use in routine clinical practice will benefit from the establishment of an AI ecosystem that leverages the value of radiologists and radiology specialty societies from the development of AI use cases to assessing the use of AI in routine clinical practice. Such an ecosystem includes not only physicians, researchers, and software developers but also regulatory agencies, the HIT industry, and hospital administrators. By developing structured AI use cases based on the needs of the physician community, developers can create the tools that will advance the practice of medicine. If these use cases can specify how datasets for algorithm training, testing, and validation can be developed including statistical metrics for validation, parameters for clinical integration, and pathways for assessing algorithm performance in clinical practice, the likelihood of bringing safe and effective algorithms to clinical practice will increase dramatically. Additional challenges for the community such as respecting patient privacy, technical and geographic diversity, as well as decreasing unintended bias in algorithm development will be best solved with collaboration between all stakeholders. Finding a balance between promoting innovation and respecting and protecting confidential patient information will also require a consensus between the healthcare community and the public, and finally the healthcare community must come together to promote interoperable standards so that data from AI algorithms can be delivered to the electronic resource where it can be most useful to physicians and their patients. The development of an active AI ecosystem will facilitate the development and deployment of AI tools for healthcare that will help physicians solve medicine’s important problems.
Moving artificial intelligence tools in diagnostic imaging to routine clinical practice and avoiding another AI winter will require cooperation and collaboration between developers, physicians, regulators, and health system administrators.
Radiologists can play an important role in promoting this AI ecosystem by delineating AI use cases for diagnostic imaging and standardizing data elements and workflow integration interfaces.
Medical specialty societies can play a leading role in protecting patients from unintended consequences of AI through involvement in algorithm validation.
AI registries will be useful in monitoring the effectiveness and safety of AI tools in clinical practice.
- 2.JASON 2017. Artificial intelligence for health and heath care. JSR-17-Task-002.Google Scholar
- 3.Definition of Ecosystem. [Internet]. Merrian-webster.com. 2018 [cited 10 June 2018]. Available from: https://www.merriam-webster.com/dictionary/ecosystem
- 5.Moore JF. The death of competition: leadership and strategy in the age of business ecosystems. New York: HarperBusiness; 1996 May.Google Scholar
- 8.Barnett, JC, Berchick, ER. Current population reports, P60–260, Health Insurance Coverage in the United States: 2016, U.S. Washington, DC: Government Printing Office; 2017.Google Scholar
- 15.Huffman J. Healthcare Information and Management Systems Society. 2018 March 6.Google Scholar
- 18.McCarthy J. From here to human-level AI. In Proc. of principles of knowledge representation and reasoning (KR 1996).Google Scholar
- 19.Taubes G. The rise and fall of thinking machines. Inc. 1995;17(13):61–5.Google Scholar
- 20.Yang Z, Zhu Y, Pu Y. Parallel image processing based on CUDA. In Computer Science and Software Engineering, 2008 International Conference on 2008 Dec 12 (vol. 3, pp. 198–201). IEEE.Google Scholar
- 21.Ciregan D, Meier U, Schmidhuber J. Multi-column deep neural networks for image classification. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on 2012 Jun 16 (pp. 3642–3649). IEEE.Google Scholar
- 22.Mobile Fact Sheet. Pew Research Center: Internet, Science & Tech. 2018 [cited 10 June 2018]. Available from http://www.pewinternet.org/fact-sheet/mobile/
- 24.Remnick D. Obama reckons with a Trump presidency. The New Yorker. 2016 Nov;28:28.Google Scholar
- 25.Hinton G. Geoff Hinton on Radiology. Machine Learning and Market for Intelligence Conference, Creative Disruption Lab Toronto, Canada. 2016. Viewable at: https://www.youtube.com/watch?v=2HMPRXstSvQ
- 26.Oncology Expert Advisor [Internet]. MD Anderson Cancer Center. 2018 [cited 10 June 2018]. Available from: https://www.mdanderson.org/publications/annual-report/annual-report-2013/the-oncology-expert-advisor.html
- 27.Herper M. MD Anderson benches IBM Watson in setback for artificial intelligence in medicine. Forbes. Zugriff im Juli. 2017 Feb.Google Scholar
- 32.Buolamwini J, Gebru T. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency 2018 Jan 21 (pp. 77–91).Google Scholar
- 33.Health Insurance Portability and Accountability Act of 1996 (HIPAA.)Pub. L. 104–191, 110 Stat. 1936 (1996)Google Scholar
- 34.The HIPAA Privacy Rule. 45 CFR 160, 162, and 164. 28 Dec 2000.Google Scholar
- 35.The Security Rule. 45 CFR Part 160 and Subparts A and C of Part 164. 20 Feb 2003.Google Scholar
- 36.Artificial Intelligence For Health and Health Care. https://www.healthit.gov/sites/default/files/jsr-17-task-002_aiforhealthandhealthcare12122017.pdf
- 37.AI has no place in the NHS If patient privacy isn’t assured. Wired. http://www.wired.co.uk/article/ai-healthcare-gp-deepmind-privacy-problems
- 38.US Food and Drug Administration. What we do. https://www.fda.gov/AboutFDA/WhatWeDo/
- 39.US Food and Drug Administration. Medical Devices.Google Scholar
- 40.The 21st Century Cures Act. Pub. L. 114–255.Google Scholar
- 41.US Food and Drug Administration. Response To 21st Century Cures Act. https://www.fda.gov/ downloads/MedicalDevices/DeviceRegulationand Guidance/GuidanceDocuments/UCM587820.pdf
- 42.US Food and Drug Administration. Software as a medical device. Do. https://www.fda.gov/MedicalDevices/DigitalHealth/SoftwareasaMedical Device/default.htm
- 43.US Food and Drug Administration. International Medical Device Regulators Forum. https://www.fda.gov/MedicalDevices/International Programs/IMDRF/default.htm
- 44.Qualification of Medical Device Development Tools. https://www.fda.gov/downloads/Medical Devices/DeviceRegulationandGuidance/Guidance Documents/UCM374432.pdf
- 45.US Food and Drug Administration. Medical Device Development Tools Program. https://www.fda.gov/MedicalDevices/ScienceandResearch/MedicalDevi ceDevelopmentToolsMDDT
- 46.US Food and Drug Administration. National Evaluation System for Health Technology. https://www.fda.gov/aboutfda/centersoffices/office ofmedicalproductsandtobacco/cdrh/cdrhreports/ucm301912.htm
- 47.US Food and Drug Administration. National evaluation system for health technology demonstration projects. https://nestcc.org/demonstration-projects/
- 48.Lund-RADS Assist: Advanced radiology guidance, reporting and monitoring. https://www.acr.org/Media-Center/ACR-News-Releases/2018/FDA-NEST-Program-Names-ACR-DSI-Use-Case-as-Demo-Project
- 49.Digital Health Software Precertification Program. https://www.fda.gov/MedicalDevices/DigitalHealth/DigitalHealthPreCertProgram/default.ht
- 50.US FDA Software Precertification Program. https://www.fda.gov/downloads/MedicalDevices/DigitalHealth/DigitalHealthPreCertProgram/UCM605685.pdf
- 51.US FDA Classification of Medical Devices. https://www.fda.gov/MedicalDevices/Device RegulationandGuidance/Overview/ClassifyYourDevice/
- 53.US FDA de novo approval clinical decision support software for stroke. https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm596575.htm
- 54.US FDA de novo approval artificial intelligence based device to detect diabetes related eye problems. https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm604357.htm
- 55.US FDA de novo approval of artificial intelligence algorithm for aiding providers in detecting wrist fractures. https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm608833.htm
- 58.USFDA approval QuantX as Class II device. https://www.accessdata.fda.gov/cdrh_docs/pdf17/DEN170022.pdf
- 60.ACR, Imaging 3.0. http://www.acr.org/Advocacy/Economics-Health-Policy/Imaging-3.
- 62.LOINC. Available at: http://loinc.org/about/
- 64.A Brief History of DICOM. In: Digital Imaging and Communications in Medicine (DICOM). Berlin, Heidelberg: Springer; 2008.Google Scholar
- 65.HL7 protocols. http://www.hl7.org
- 66.Fast Healthcare Interoperability Resources Specification. http://www.hl7.org/implement/standards/product_brief.cfm?product_id=449
- 70.ACR National Radiology Data Registry. https://nrdr.acr.org/Portal/Nrdr/Main/page.aspx
- 72.Structured Reporting. http://www.radreport.org
- 75.Rad Elements. http://www.radelement.org
- 76.Miller T, Howe P, Sonenberg L. Explainable AI: Beware of inmates running the asylum. InIJCAI-17 Workshop on Explainable AI (XAI). 2017 (p. 36).Google Scholar
- 77.American Medical Association Policy. https://www.ama-assn.org/ama-passes-first-policy-recommendations-augmented-intelligence
- 80.Data Science Bowl Lung Cancer Detection. http://blog.kaggle.com/2017/06/29/2017-data-science-bowl-predicting-lung-cancer-2nd-place-solution-write-up-daniel-hammack-and-julian-de-wit/
- 81.Iglovikov V, Rakhlin A, Kalinin A, Shvets A. Pediatric Bone Age Assessment Using Deep Convolutional Neural Networks. arXiv preprint arXiv:1712.05053. 2017 Dec 13.
- 83.Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, Ding D, Bagul A, Langlotz C, Shpanskaya K, Lungren MP. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv preprint arXiv:1711.05225. 2017 Nov 14.
- 86.FDA Announcements. https://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/default.htm
- 87.Reclassification of Medical Image Analyzers. https://www.federalregister.gov/documents/2018/06/04/2018-11880/radiology-devices-reclassification-of-medical-image-analyzers
- 89.RSNA Machine Learning Showcase. https://www.rsna.org/Machine-Learning-Showcase/
- 91.Jacobson I. Object-oriented development in an industrial environment. ACM SIGPLAN Not. 1987 Dec 1;22(12):183–191.
- 92.Cockburn A. Writing effective use cases. Michigan: Addison-Wesley; 2001.
- 94.Competitions Kaggle Data Science Bowl. https://www.kaggle.com/c/data-science-bowl-2017
- 95.Competitions Kaggle Lung Cancer Risk. https://www.kaggle.com/c/msk-redefining-cancer-treatment
- 96.Competitions Kaggle Heart Disease. http://www.datasciencebowl.com/competitions/transforming-how-we-diagnose-heart-disease/
- 97.Competitions Kaggle Seizure Prediction. https://www.kaggle.com/c/seizure-prediction
- 98.Andriole K. Personal communication (in press). MGH and BWH Center for Clinical Data Science.
- 99.Lung-RADS American College of Radiology. https://www.acr.org/Clinical-Resources/Reporting-and-Data-Systems/Lung-Rads
- 101.Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow PM, Zietz M, Hoffman MM, Xie W. Opportunities and obstacles for deep learning in biology and medicine. bioRxiv. 2018 Jan;1:142760.
- 103.Berinato S. With big data comes big responsibility. Harv Bus Rev. 2014;92(11):20.
- 104.Merkle RC. A digital signature based on a conventional encryption function. In: Conference on the theory and application of cryptographic techniques; 1987 Aug 16. Berlin, Heidelberg: Springer. p. 369–378.
- 106.Clinical trials. https://clinicaltrials.gov/ct2/show/NCT01189331
- 107.Ekblaw A, Azaria A, Halamka JD, Lippman A. A case study for blockchain in healthcare: “MedRec” prototype for electronic health records and medical research data. In: Proceedings of IEEE Open & Big Data Conference; 2016 Aug 22. vol. 13, p. 13.
- 111.ACR Data Science Institute Data Science Summit. https://www.acrdsi.org/dsisummit2018
- 112.NVIDIA GTC. https://www.nvidia.com/en-us/gtc/
- 114.ACR TRIAD. https://triadhelp.acr.org
- 115.ACR DART. https://dart.acr.org
- 116.MQSA Public Law. PL 102-539.