AI in Healthcare: Bridging Ideas to Implementation
Authors: Zack Allen, MD, MBA, and Jeff Chang, MD, MBA
The journey of machine learning, from its nascent stages in the mid-20th century, when Alan Turing first pondered whether machines could think, has been nothing short of revolutionary. However, technological integration into the healthcare sector has not been without its pitfalls, as the cumbersome electronic health record (EHR) systems attest. This post aims to provide general advice on navigating the complex landscape of developing AI applications in healthcare, offering insights for achieving seamless integration and impactful outcomes. The specifics will vary greatly depending on your project, but following some or all of the advice below will make for much smoother sailing.
Identifying the problem and having a clear end result.
The first step in creating a meaningful application is identifying a concrete problem. This might seem straightforward, but it's where many teams stumble. The importance of this foundational step cannot be overstated. Successful ML applications are not just about leveraging advanced algorithms or having access to vast datasets; they are about applying these tools to solve a specific, well-defined problem that has real-world implications for patients and providers.
A practical example of this concept is Rad AI Omni Impressions, a tool designed to assist radiologists by generating high-precision summaries of their reports known as ‘Impressions’. The creation of these ‘Impressions’ is normally repetitive and time-consuming for radiologists. Omni Impressions specifically aims to alleviate the burden during this phase of the workflow, focusing on tangible goals such as reducing fatigue and enhancing efficiency.
Similarly, while ML has tremendous transformative potential, it is worth cautioning against the allure of technology for technology's sake (Topol, 2020). The key is not just to innovate but to innovate with purpose and, as discussed later, with a strong proposed Return on Investment (ROI) over the status quo. The development of ML applications must be driven by a clear understanding of the challenges they aim to address and the tangible benefits they seek to deliver.
The process of problem identification involves not only understanding the clinical need but also engaging with stakeholders, including healthcare professionals, patients, and policymakers. This engagement ensures that the ML solution is not only technologically advanced but also practical, usable, and aligned with the needs of those it aims to serve. Furthermore, setting clear, achievable goals is essential. These goals should be specific, measurable, attainable, relevant, and time-bound (SMART), providing a roadmap for the project's development and implementation.
A real-world example of this approach can be seen in the development of ML applications for diagnosing diabetic retinopathy. The problem is clear and well-defined: early detection of this condition can prevent severe visual impairment. The goal, therefore, is to develop a tool that can accurately and efficiently screen patients for early signs of diabetic retinopathy, facilitating timely intervention. This example underscores the importance of beginning with a concrete problem and having a clear end result in mind.
Ensuring data represents real-world situations.
Data is undoubtedly the lifeblood of any AI application, but healthcare is a whole different ball game. Here, sheer volume of data isn't enough. It's the diversity and quality of that data that truly make a difference, ensuring it accurately reflects the myriad complexities and nuances of real-world medical scenarios.
For LLMs and other contemporary model architectures to truly excel, they need more than just a large amount of data. They need the correct data. Just like in any educational setting where the quality of materials determines the quality of learning, LLMs require high-quality, well-curated data to thrive. This isn't just about feeding them information; it's about educating them with data that's as close to the real deal as possible.
Drawing from the insights of "Textbooks Are All You Need" (arXiv:2306.11644), it becomes evident that the 'nutrition' we provide to these AI models through data significantly impacts their performance across the board. The paper shows how a nuanced, high-quality dataset enabled a comparatively small model to outperform much larger ones, and the same principle applies with particular force in the healthcare domain. It's akin to the difference between fast food and a balanced diet for humans; the quality of the input directly affects the output.
In practice, this involves using targeted datasets created in real-world workflows, and applying domain knowledge to highlight the most meaningful parts of this data. At Rad AI, we have access to hundreds of millions of radiology reports. However, effectively utilizing them is more complex than simply feeding them into a model training pipeline. To focus the data in a way that reflects what you want your model to learn, you must first understand the data you're starting with.
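To make the idea of "understanding the data you're starting with" concrete, here is a minimal, hypothetical sketch of filtering a raw report corpus down to training candidates. The field names and quality heuristics are illustrative assumptions for this post, not Rad AI's actual pipeline.

```python
# Hypothetical sketch: curating a raw report corpus into a training set.
# Field names ("findings", "impression") and thresholds are illustrative.

def is_training_candidate(report: dict) -> bool:
    """Keep only reports that are complete enough to learn from."""
    body = report.get("findings", "")
    impression = report.get("impression", "")
    # Drop reports missing either section the model must learn from.
    if not body.strip() or not impression.strip():
        return False
    # Drop truncated or boilerplate-only impressions.
    if len(impression.split()) < 3:
        return False
    return True

reports = [
    {"findings": "Lungs are clear. No effusion.",
     "impression": "No acute cardiopulmonary disease."},
    {"findings": "", "impression": "See above."},          # incomplete
    {"findings": "Mild degenerative change.", "impression": "OK"},  # truncated
]
curated = [r for r in reports if is_training_candidate(r)]
print(len(curated))  # 1
```

In practice the heuristics are far richer (deduplication, site and modality balance, label agreement), but the shape is the same: explicit, auditable filters that encode what you want the model to learn.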
Imagine teaching a medical student with outdated textbooks and then expecting them to perform complex surgeries. The outcome likely wouldn't be great. The same goes for models. If we're aiming for AI applications that not only support but enhance healthcare delivery, we must prioritize the impeccable quality and variety of data we provide. This ensures that these models can handle the complexities of healthcare, offering data-efficient approaches that significantly surpass their predecessors.
Have an ROI pitch.
The theoretical promise of AI in healthcare is enormous. However, tangible results are often elusive. This is one of many reasons why ROI is a core component of anything worth building. To be clear, we are not advocating ROI in purely financial terms. In fact, the best clinical-facing offerings tend to start with non-financial ROI and then supply financial benefits as an additional consideration.
It is crucial that you not only have a solid idea of your ROI proposition but also work on how you will communicate this to the stakeholders in the healthcare institutions you will need to work with. It will not matter if your model is 99% accurate if you cannot or do not convey how this moves the needle in practical terms.
For instance, when we converse with medical providers and administrators, we don't discuss vague outcomes or intangible benefits, even though these areas hold considerable potential. Instead, we focus on concrete results. For Omni Impressions, we highlight quantifiable reductions in fatigue and increased accuracy in reporting. For Rad AI Continuity, we emphasize measurable patient follow-up rates and real-world patient outcome statistics where Continuity played a key role.
Developing ML models for healthcare is a multifaceted endeavor that goes beyond technical achievement and requires a highly multidisciplinary team. Your ROI pitch should be grounded in tangible benefits like improved patient outcomes, operational efficiency, and financial savings. Further, the more you are able to tie this to quantifiable outcomes and existing industry KPIs, the more straightforward your conversations will be. A significant caveat to this, though, is that you should not feel limited to these metrics; there are, of course, a multitude of benefits that will be just as, if not more, impactful but difficult to quantify.
Build with innovation partners.
Once you are at the point of having end users engage with some version of your product, you want to start looking for and building with individuals and teams that are not just end users of your product but are willing to innovate alongside your team. The part that makes all the difference here is engagement: partners who buy into the vision of what your project could become, not just what it currently is.
The building of these applications tends to be a very iterative process. You will therefore want partners with the patience to ride out the inevitable pitfalls as biases and edge cases are worked through. The trade-off here is that you are asking more of your end users: people who can not only point out things you have not thought of but also provide ample constructive feedback, helping hammer your product from a dented piece of metal into a smooth, polished offering with far more upside than the original prototype.
This process is exemplified by several radiology groups that have been partners of Rad AI for a long time, some since the company's earliest days. The radiologists we collaborate with not only act as end users but also help us tailor our tools to be as effective and purposeful as possible. We actively engage with our users and value their feedback, striving to make the development process beneficial for both parties.
While this refinement process can occur without such engagement (and you should never offload the entire debugging process onto your end users), a more collaborative approach to iteration can accelerate progress on core issues and result in a superior end product.
Test and monitor thoroughly.
Thorough testing and validation are non-negotiable. This process begins with offline validation and should ideally include human validation to assess the application's real-world efficacy. The advent of generative AI (GenAI) introduces the need for innovative performance monitoring metrics, necessitating creative approaches to assessment.
In our deployments, we utilize a wide range of both offline and online metrics to maintain quality and consistency. Besides traditional text-based metrics like BLEU and ROUGE, we've also implemented our own custom metrics. These focus on the content and meaning of the text generated by our models, rather than just token-level accuracy. Creating custom metrics is a domain-specific and time-consuming process, but it can be a worthwhile investment. This is especially true for generative models, where more traditional metrics may fall short.
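As a toy illustration of what a content-level metric can look like, the sketch below scores whether the key clinical findings in a reference impression survive in the generated text, rather than counting token overlap the way BLEU or ROUGE do. The term list and scoring here are illustrative assumptions, not the metrics we run in production.

```python
# Toy content-level metric: did the generated text preserve the
# clinically important findings? The KEY_FINDINGS vocabulary is a
# placeholder; a real metric would use a domain ontology or an NER model.

KEY_FINDINGS = {"pneumothorax", "effusion", "fracture", "nodule"}

def extract_findings(text: str) -> set:
    """Naive term spotting: lowercase, strip punctuation, match the vocabulary."""
    words = {w.strip(".,").lower() for w in text.split()}
    return words & KEY_FINDINGS

def finding_recall(reference: str, generated: str) -> float:
    """Fraction of reference findings mentioned in the generated text."""
    ref = extract_findings(reference)
    if not ref:
        return 1.0  # nothing clinically salient to recover
    return len(ref & extract_findings(generated)) / len(ref)

ref = "Small right pleural effusion. No pneumothorax."
gen = "Right pleural effusion. No pneumothorax or fracture."
print(finding_recall(ref, gen))  # 1.0
```

Even a crude metric like this catches a failure mode that token-overlap scores miss: a fluent summary that silently drops a finding can still score well on ROUGE.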
Deployment is just the beginning. The real test comes when the application faces production data. Monitoring performance necessitates a blend of qualitative feedback and quantitative metrics, focusing on the solution’s performance and the synergy between the AI and its human users. Adjusting to feedback and data drift is an ongoing process essential for the application's evolution.
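One simple quantitative piece of that monitoring is a distribution check on input statistics. The sketch below flags drift in a single feature (report length) against a baseline window; the statistic and threshold are illustrative assumptions, and real monitoring would track many features at once.

```python
# Minimal drift check on one input statistic. Threshold and feature
# choice are illustrative; production systems monitor many signals.
import statistics

def drift_alert(baseline: list, current: list, z_threshold: float = 3.0) -> bool:
    """Flag when the current mean drifts beyond z_threshold baseline SDs."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.mean(current) != mu
    z = abs(statistics.mean(current) - mu) / sigma
    return z > z_threshold

# Report word counts from a baseline week vs. the current day.
baseline_lengths = [120, 130, 125, 118, 127, 122]
current_lengths = [240, 260, 255, 250]  # e.g. a new site with longer reports
print(drift_alert(baseline_lengths, current_lengths))  # True
```

A triggered alert like this is a prompt for investigation, not an automatic rollback: the qualitative feedback from users tells you whether the shift actually degraded output quality.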
Embrace Human-in-the-Loop systems.
While the allure of full automation is strong, the reality of healthcare demands a much more grounded approach. Human-in-the-loop systems represent a balanced integration of AI and human expertise, capitalizing on the strengths of each, and they have proven to be the most effective in scenarios requiring complex reasoning or nuanced judgment. From both an ethical and a practical standpoint, clinicians are still ultimately responsible for how they choose to use AI-powered tools. As powerful as they are becoming, these new features are still just that: tools. These systems should be viewed as additional inputs to the overall decision-making process, not as a replacement for it.
We've deployed numerous ML-driven applications and features. However, most still incorporate human input within the workflow. Despite their capabilities, modern language models aren't yet ready to operate independently. For instance, in Omni Impressions, a radiologist still reviews and signs off on the impression. In Continuity, a medical professional ensures the validity of patient follow-ups. While some argue that this approach limits the potential of ML-augmented workflows, the success of Rad AI and many others in the industry proves otherwise. These ML-augmented workflows continue to offer significant benefits relative to baseline.
Expanding on this, we also establish systems that interpret human interaction with our model outputs, which can then be used as direct human feedback. Beyond just analytics, this approach enables you to feed production data back into your models. This allows for ongoing improvement, ensures data remains current, and minimizes issues with data drift and models becoming outdated.
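One simple way to turn interaction into a feedback signal, sketched below under stated assumptions, is to measure how heavily a clinician edited the model's draft before signing off. The thresholds and labels are illustrative, not our production logic.

```python
# Hedged sketch: deriving an implicit feedback label from how much a
# clinician edited the model's draft. Thresholds are illustrative.
import difflib

def edit_ratio(model_output: str, final_text: str) -> float:
    """0.0 = signed off unchanged, 1.0 = completely rewritten."""
    sim = difflib.SequenceMatcher(None, model_output, final_text).ratio()
    return 1.0 - sim

def label_for_training(model_output: str, final_text: str) -> str:
    r = edit_ratio(model_output, final_text)
    if r < 0.05:
        return "accept"   # usable as a positive example
    if r > 0.5:
        return "reject"   # strong negative signal, worth manual review
    return "edited"       # the (draft, final) pair can drive fine-tuning

print(label_for_training("No acute findings.", "No acute findings."))  # accept
```

Aggregating these labels over time gives both an analytics signal (how often drafts are accepted as-is) and a supervision source (edited pairs) for keeping models current.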
There are many additional considerations, such as appropriate transparency about limitations and adequate training on proper use, that should also factor into the deployment of AI-powered technologies both inside and outside the healthcare domain.
As with so many other areas of society, the responsibility for the appropriate and responsible implementation of these powerful tools does not rest on any one party; it is shared. The people involved at every stage, from developers and clinicians to regulators and, in some instances, patients, should all be vigilant about how these systems are changing the way we approach healthcare (Herrington, 2023).
Conclusion
Returning to the example of a screening solution for diabetic retinopathy: the solution needs to be introduced at the right steps in the patient journey (at times when retinal images are being acquired, or being made available in retrospect). Deployment and training need to be done with specific clinical stakeholders (potentially optometrists, ophthalmic technicians, nurses, primary care physicians, or physician assistants (PAs), depending on the complexity of product training and use, and the route of introduction into the clinical workflow). Product and data security and privacy need to be ensured, alongside any necessary clearances for that specific geographic market.
The production data used for the screening solution needs to be readily available in a consistent format effectively represented by the ML models’ training data, or acquired specifically for the purpose of screening. Clinical stakeholders and executive leadership at optometry offices, primary care offices and/or health systems need to buy into the necessity and benefit of adoption—and will likely need to see specific measurable ROI beyond the improvements to patient care.
Incorporating artificial intelligence into the healthcare sector comes with its own set of difficulties, but it also offers significant opportunities for meaningful change. While developers must overcome these hurdles, which may involve acquiring new knowledge and adopting novel approaches, successful outcomes generally justify the additional effort involved.