

Welcome back to the AI journal club! In this episode, we bring you a deep dive into a game-changing paper from The Lancet -- the MASAI study. This is the first randomized controlled trial to evaluate the use of artificial intelligence in breast cancer screening, and we're so excited to discuss it.
We'll break down the study's impressive findings on interval cancer rates, sensitivity, and massive workload reductions for radiologists. Beyond the data, we'll tackle the big-picture questions and some sensational recent headlines. Are we deploying AI too fast? Or is it time to go faster?
Hosts:
- Ayman Ali, MD
Ayman Ali is a Behind the Knife fellow and a general surgery PGY-4 at Duke Hospital, currently in his academic development time, where he focuses on data science, artificial intelligence, and surgery.
- Ruchi Thanawala, MD: @Ruchi_TJ
Ruchi Thanawala is an Associate Professor of Informatics and Thoracic Surgery at Oregon Health and Science University (OHSU) and founder of Firefly, an AI-driven platform that is built for competency-based medical education. In addition, she is the Director of the Surgical Data and Decision Sciences Lab for the Department of Surgery at OHSU and Associate Program Director for the Clinical Informatics Sub-specialty Fellowship.
- Phillip Jenkins, MD: @PhilJenkinsMD
Phil Jenkins is a general surgery PGY-4 at Oregon Health and Science University and a National Library of Medicine Post-Doctoral fellow pursuing a master’s in clinical informatics.
***Fellowship Application Link: https://forms.gle/QSUrR2GWHDZ1MmWC6
Please visit https://behindtheknife.org to access other high-yield surgical education podcasts, videos and more.
If you liked this episode, check out our recent episodes here: https://behindtheknife.org/listen
Behind the Knife Premium:
General Surgery Oral Board Review Course: https://behindtheknife.org/premium/general-surgery-oral-board-review
Oral Board Simulator: https://app.behindtheknife.org/oral-board-simulator
Trauma Surgery Video Atlas: https://behindtheknife.org/premium/trauma-surgery-video-atlas
Dominate Surgery: A High-Yield Guide to Your Surgery Clerkship: https://behindtheknife.org/premium/dominate-surgery-a-high-yield-guide-to-your-surgery-clerkship
Dominate Surgery for APPs: A High-Yield Guide to Your Surgery Rotation: https://behindtheknife.org/premium/dominate-surgery-for-apps-a-high-yield-guide-to-your-surgery-rotation
Vascular Surgery Oral Board Review Course: https://behindtheknife.org/premium/vascular-surgery-oral-board-audio-review
Colorectal Surgery Oral Board Review Course: https://behindtheknife.org/premium/colorectal-surgery-oral-board-audio-review
Surgical Oncology Oral Board Review Course: https://behindtheknife.org/premium/surgical-oncology-oral-board-audio-review
Cardiothoracic Oral Board Review Course: https://behindtheknife.org/premium/cardiothoracic-surgery-oral-board-audio-review
Download our App:
Apple App Store: https://apps.apple.com/us/app/behind-the-knife/id1672420049
Android/Google Play: https://play.google.com/store/apps/details?id=com.btk.app&hl=en_US
Welcome back to the Behind the Knife Artificial Intelligence series, and specifically our journal club. I'm Ayman Ali, a general surgery PGY-4 at Duke Hospital, and I'm really excited about this episode because we're going to discuss a landmark randomized controlled trial in AI and breast radiology that was recently published in The Lancet. This is the first artificial intelligence randomized controlled trial within the space of breast radiology, so it's really important to discuss what it means, how it was done, and whether or not it can be implemented. Before we go further into the study, I want to introduce my co-hosts, both from Oregon Health and Science University: Dr. Phil Jenkins, another PGY-4, and Dr. Ruchi Thanawala, a thoracic surgeon at OHSU. So let's jump into the study. It's called the MASAI study. I'm not sure if I'm pronouncing that as the authors intended, but it stands for Mammography Screening with Artificial Intelligence, and it was published in The Lancet a few weeks ago. Again, it's the first randomized controlled trial that looks at the
effect of AI on breast cancer screening, and specifically on interval cancers. They deployed this in Sweden: about a hundred thousand women were randomly assigned to the intervention and control groups, and all of these women had at least two years of clinical follow-up. In the intervention group, the AI was used as a triage: it initially triaged a mammogram to either a single or a double reading by radiologists. In the control group, all mammograms were double read, meaning they were read by two separate radiologists, which is the current gold standard. So first, Dr. Thanawala, what do they mean by interval cancer, their primary outcome, and why didn't they just look at how many cancers they found? So this is a great question. If you just want to find more cancers, you can just lower your threshold and biopsy everyone, but that's really not going to help. You'll find cancer, but you'll also harm healthy people. Interval cancers are the cancers diagnosed between
screenings. These are the ones the screening missed. They're usually aggressive and have worse outcomes. I like to think of them as the sneaky ones. We don't yet have a handle on their growth pattern other than the two time points, and really the question is, when did they pop up, and could we have done something about it earlier? Yeah, and the way this study worked, again, is that the AI was a triage, but a slightly more nuanced one: it gave a score from one to 10. If it gave a low-risk score, which they defined as one to nine, the mammogram went to only one radiologist, and if it was a 10, which they regarded as very high risk, that mammogram did still go to two radiologists. The AI also flagged the suspicious areas on the screen, but a very important part of this study is that the marks the AI flagged were not shown to the radiologists. They did receive them after they did the case reading, but not before and not during. So, Phil, what were the results of this study? Basically, it worked. The interval cancer rate
was 1.55 per 1000 in the AI group versus 1.76 in the control group, so statistically non-inferior. Overall, the AI group found 338 cancers, with 82 interval cancers, and the control group found 262 cancers, with 93 interval cancers. That defined the sensitivity and specificity: statistically higher sensitivity in the AI group, about 76 more cancers and 11 fewer interval cancers, with equal specificity. What's really fascinating, though, is that in the AI group there was a 44.2% reduction in screen readings, which is huge. So the AI group was non-inferior, potentially more sensitive, and required roughly half as many radiologist reads. Yeah, and I really want to emphasize that point, because people are interpreting this study as an alleviation of radiology burnout, which I tend to agree with.
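As an editor's aside for listeners who think in code: the triage rule the hosts describe is simple enough to sketch in a few lines of Python. This is only an illustration of the logic as described in the episode; the function name and error handling are ours, not from the study.

```python
def reads_required(ai_risk_score: int) -> int:
    """Number of radiologist reads a mammogram gets under the MASAI-style
    triage described above: scores 1-9 (lower risk) go to a single reader,
    while a score of 10 (highest risk) keeps the standard double reading.
    In the trial, AI markings were withheld until after the human read."""
    if not 1 <= ai_risk_score <= 10:
        raise ValueError("AI risk score must be an integer from 1 to 10")
    return 2 if ai_risk_score == 10 else 1
```

Under this scheme, only the highest-scoring studies consume two reads, which is consistent with the roughly 44% reduction in total screen readings reported.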
I think what this study highlights is that you're getting radiologists to a point where they're reading at the top of their license, or focusing on biopsies, so they're spending less time looking at all of the normal mammograms, and I hope that really helps alleviate some burnout. I think another key thing about the study is that the AI tool was interjected into the workflow in parallel to the radiologists. The AI tool functioned somewhat independently, rather than being interjected directly into what the radiologist is doing in real time. I think what this does is allow the AI tool, the software, to work at the top of its so-called license, if we want to think about it that way, and have the radiologists do that as well, which I think is the optimal pairing, because as we continue to think about these things, how these tools actually interface with who we are as humans is just as important as the work they're doing and the work that we're doing. Additionally, to build on what you're saying about the radiologist being able to function at the top of their license: the AI software proved to be exceptionally good as a screening tool, as it had a high degree of sensitivity, and the radiologists are able to be incredibly specific, because the mammograms that scored that 10 are what came to them. So they know they're primed to help figure out: is there a cancer? What kind of cancer is this? Without the clear interjection of the markings, I think this paper shows us not only a great outcome in the study, from the stats standpoint, from the results, but also an exceptional model of how to introduce these AI tools into clinical workflow in a way that actually allows everyone to function optimally and have exceptional outcomes. Yeah, it's a fantastic study, and I think something we should comment on is that there may
be a tendency for people to say that the higher sensitivity in the AI group may lead to overdiagnosis and over-biopsy. Do you think that's the case here? Looking at the data, the increase wasn't just in low-grade cancers like DCIS, as we've seen with previous CAD software and the papers published on those results. What they found were more invasive cancers, and these are the cancers we actually want to do something about, specifically more of the aggressive subtypes, what they call the non-luminal A cancers. These are the tumors that really kill people if you miss them. So this looks like good detection, not just finding noise. This is also supported by the fewer interval cancers, although there isn't a statistical difference and the study was not powered for those metrics. I think it's a pretty good result overall. Yeah. Phil, what other thoughts do you have about this study that we should talk about here? You know, we should probably mention lead-time bias, and whether there would be any change in mortality over 10 years. So we'll start with that. I think one thing to think about, and the authors are very forthcoming about this, is the population: they were Swedish women, generally a highly homogeneous group. One thing to look at in the paper is the training data. How does this software tool actually do what it's doing? How did it learn what to look for? The training data came from 10 different countries, a broad population, but its application was in a homogeneous group. And when we think about training data and its application in dermatology and so many of these other computer-vision-based applications, we know there can be challenges when you get into a very heterogeneous population. So I think a future place for this work to go is a highly heterogeneous population, to see how it performs. Yeah, and my impression personally is, you look at how they trained across many different countries, and then you look at their results, and they never had a
radiologist out of the picture. Even the images screened a single time by the AI still got read by a radiologist. So there is a degree of safety here, and looking at this study, it seems like something that we can and should consider deploying, especially in a place like the United States, where we desperately need to work on our screening as a general principle. This could really help us address some of those vulnerable populations and increase the number of people we can save and the number of cancers we can catch. Again, I see no reason why this should not at least be trialed in the United States, and I think this study is a powerful enough one that it should lead to change. And I think one of the things we're going to talk about with the next article as well is the training piece. In this study, it was textbook: no one had to change their training to adapt to a new tool, and I think that's really important. In no other profession that I've encountered, and I was a pilot for a long time, did we get a new tool put upon us that we weren't trained on how to use. I think that is really important, and it shows here that being able to operate at the top of your license also means operating within your same training scope. And I think that'll lead really well into the next article. Yeah. So the next articles we wanted to chat about here are not research studies, necessarily, but articles that have been circulating like wildfire throughout mainstream media. A few weeks ago, around the same time this cancer screening article was published, Reuters published an investigation about errors; specifically, they focused on the TruDi navigation system, a system used in ENT surgery, and the many malfunction reports the FDA has received. And again, I'm not saying the rate is statistically higher than it should be or has caused any statistical harm. But it's important to address this because it's being addressed in the public,
and things that are addressed in the public are things we should be aware of as physicians, because our patients will ask us about them. Now, the conversation it opens up is that this particular system is only 80% accurate, and it's an anatomical overlay. Especially with anatomy, I get very hesitant about AI deployment, because these tools are not regulated in the same way; they don't need to go through the same rigorous clinical trial screening. And you can have these overlays, which a lot of companies are shipping because they're easy to do, but that accuracy component is very important, and an overlay is not benign. The way I see it, anyway, is that it can obscure things like a feeding vessel, or just a small little flap on an artery, things you wouldn't otherwise miss but that you may miss with the technology that's supposed to help you. So I just wanted to get both of your thoughts about that, because I think this does open up a really good conversation.
So when I read this article, I think of multiple levels at which we could probably do better, and we can learn from what's happening. First is how these tools get into the operating room. What is our FDA process, and with the influx of these technologies, are we able to keep up with them? Second, as you bring up, Ayman, is how these tools actually affect us as surgeons as we're doing the work. What is the interplay between what we're thinking and what we're seeing? How are the human-computer interaction studies done on how this actually affects our workflow in real time? Where are these issues coming up, and what is the root cause? I think there is a lot of exploration that's needed, and the question is, can we actually keep up with the studies necessary to show the root cause, given how quickly these tools are coming at us in the operating room in real time.
So those are the things I'm thinking about, and I think they would be really good to talk about. Yeah, I just think it's important that surgeons always ask questions about the tools they're given, especially when those tools sit on top of an interface you're normally accustomed to. And again, coming back to the MASAI trial, this is what it did very well. At no point did the radiologists in this study actually have an interaction with the AI. It was a triage, and even when it generated overlays, it did not provide them until after the read. In the future that may not be necessary, but as a trial design, I think it's beautiful and very important, because it just fits in very naturally, as you were saying before. Phil, what do you think? I think this hits a lot of points we've been making that are not just about the age of AI. We talk about regulation, but who generally regulates these things in this country is the industries themselves. Like it or not, that's the majority of it, and that has shifted even more in favor of industry over the last few years, so more likely than not we're going to be dependent on industry to regulate itself. Like you said, there's very little oversight before these products get rolled out, but there's a lot of variability in their capabilities, and I don't think there's a lot of training going on. I think a lot of it is, "Try this new thing, look at the cool stuff it does." In the article, they talked about an ultrasound that can mislabel anatomy very easily, a known issue, but because it wasn't directly involved with bad patient outcomes, it doesn't get a lot of publicity. These are the types of things. And then it also comes into the piece of conflict of interest, right? We already see this with other devices. And as we start talking about software as a medical device, are people
having economic vested interests in this, and how is that being disclosed? And this is all changing at a rate at which, legislatively, we will not be able to keep up. We can barely keep up technologically. So I think it'll be really interesting to see how that goes. Phil, I'm going to anchor onto the software part, and we've talked about this in some of our other podcasts. I think the distinction between software and hardware implementation in the operating room comes back to fundamentals. This is not only about artificial intelligence: software can be as simple as a statistical tool with an interface we use in the operating room, all the way to something that functions fairly independently. The thing is that software evolves invisibly, whereas hardware doesn't. When we hold a stapler or a clamp in our hands, we know if the shape of it has changed, or the weight of it has changed. But when software changes, we are not trained to detect that. And going back to training, how do we actually train surgeons and healthcare providers in the operating room to know whether these tools are working or not? I think that's where the crux of the safe application of these tools actually is. How does the surgeon know if the anatomy being highlighted is correct or not? We fall back on our own knowledge, but now we're having to push back against this interface that is trying to help us but may not be helping us, because, as we see in this article, sometimes the accuracy is 80%. Really, that's not something we as surgeons would find acceptable in our own hands, but as software evolves in invisible ways, how do we push back on that? We need to be trained. Then there is the question of automation bias. This comes up all the time. We humans are all subject to automation bias, and unless we're aware that we might be led astray, we fall into these traps. And it's not that the surgeons or the developers of these tools are not well-meaning; the intention is good. This
is just a complicated environment and a complicated problem, and I think it requires a lot more training and intentionality in the application, looking at the results and finding the right way to do this instead of just deploying these tools into the wild. Yeah, those are all excellent points, and I will say one more time that, just because Reuters published this article, I don't know whether those complications are statistically any higher, but we should all learn from the conversation that's going on regardless of whether there is truly a statistical difference. I do think the problem exists. Now, I want to get both of your thoughts on one final point, because this article came out last week. On March 25th, Dr. Katz, the president and CEO of New York City Health and Hospitals, spoke at a panel and said that he really wanted to replace radiologists; his exact quote is, "We could replace a great deal of radiologists with AI at this moment if we are ready to do the regulatory challenge." Now, the way I interpreted that is, I thought it was unfortunate that he used the word "replace." If he had stated that we could redistribute or reallocate radiologists, or redirect efforts, I think it would have been a much more palatable, sensible headline. The word "replace" was probably too much, but to his point, after looking at a study like the MASAI study, I see no reason why you shouldn't deploy AI breast screening massively. At the same time, the reason I have an issue with the word "replace" is that there are so many other things radiologists do, and we do not have an algorithm for every single thing yet. We don't have an algorithm to replace a radiologist doing a splenic artery embolization. So my point is just that when I read it, I think he has a point, that AI may actually be safe, or may actually screen more people for particular cases, and I personally found it unfortunate that he used the word "replace." I just think that jobs might change, but we're
not quite to the point where every task is automated. But what are your thoughts about that? Because I think it's also gaining quite a lot of public interest, and interest in the medical community. I agree that the word "replace" is probably a little strong; I would say "support." What remains true is that there is the work at hand when you're looking at imaging (I'm a thoracic surgeon, not a radiologist, but projecting onto what their work is), and then there's the clinical context. The radiologist is able to go to the chart and understand: what is it about this cancer that I'm looking for? What is it about this patient? So I think these tools have an opportunity to support radiologists, to support physicians, and then what is left is what humans do well, which is integrating large tracts of information, pulling out context, and putting it all together. The tools can work together, human and AI. Not to beat the aviation analogies to death, but I'm known to do that: airplanes do a lot of things automated, and yet we still pay two people who are trained to fly the airplane a lot of money to sit up there and watch the plane mostly fly itself. That has been a model for safety for a long time. So although medicine is confronting levels of automation at a new rate, I think there's a lot of industry history that's going to support us. The other thing, too, is that the quote treats getting through the regulatory hurdles as if that were a little bump in the road and not a mountain. So I think part of this is the AI hype cycle. We see it from the titans of the industry talking about their predictions of mass layoffs
and all that sort of stuff, because it's going to improve their shares, right? So I understand it from that standpoint, but I don't think radiologists are going away. There's a reason why they're highly paid for what they do now: they're overworked. The idea that this is going to supplement and help and assist, and take your radiologists from good to great productivity, is much more likely than them just being replaced. Looking through history, that is typically how these sorts of things go. Now, this is a different level, I agree, and I am very poor at looking into the future, but I think picking up on the word "replace" is very buzzy, and I don't think it's realistic. Yeah. And I really like the point Dr. Thanawala made before about clinical context. How often do we, as surgeons and residents, call radiologists and say, look, I understand that we see this, but in this context, I really want you to look at this particular piece, which you may not have commented on? Now, if AI learns to pick up on everything else but doesn't comment on that one thing we're really curious about, then you still need a radiologist, right? I think we do that a lot in practice; at least on my consult shifts, it's every other day that I'm calling radiology and really trying to work through an image. So I do think that part's important. So I guess it's time for final thoughts. I'll start first. I think this is a brilliant study, a randomized controlled trial, and I really hope that going forward we see more of these. I would love to see these types of randomized controlled trials, at this scale, for a lot of different AI applications, because I think we're there. That being said, on the other hand, I would also like to see them before things are deployed. So that's how I feel about it, and I think the more vigilant we are about that process, the more we can avoid articles like that Reuters one, which really
points out when we deploy incorrectly. So those are my final thoughts. Yeah, building on that, what we're also doing is building public trust in the safe application of these AI tools. When we have strong papers like the MASAI paper, which gives us really good results but also a model for safely implementing these tools in clinical workflow, it helps physicians and patients know that these tools are going to be helping us. Trust is a big part of this, and when we have findings published in the Reuters article or otherwise, what we're trying to do is regain trust. Yeah. And I think the big takeaway, and the amazing thing about BTK doing this entire series, is that it goes to show you that clinicians are not going to be able to avoid this technology. It is going to be part of your life one way or another, and the onus will always be on the physician. So I think these two pieces balance each other well: there's what I would call a near gold standard way of doing these types of implementations, and then you have more of a publicity piece that shows what happens when it doesn't go that way. It's a really nice balance, and I think it's just important for us to continue to have these conversations, and for the physicians out there listening who are interested and curious to continue to ask the questions and develop this fund of knowledge, because it's going to be part of your practice. It already is. Alright, well, thank you all again for another episode.