DeepCAT, a RAIL project, was presented by Dr. Paul Yi at SIIM C-MIMI 2019 in Austin, TX this past weekend. The presentation was covered by AuntMinnie; the article is reproduced below. This project was led by Drs. Lisa Mullen and Susan Harvey of Breast Radiology and Dr. Greg Hager of the Malone Center for Engineering in Healthcare, with primary technical work performed by Mr. Dhananjay Singh.
The DeepCAT team gratefully acknowledges the support of the Johns Hopkins Discovery Award for funding this project!
By Erik L. Ridley, AuntMinnie staff writer
September 23, 2019 — AUSTIN, TX – An artificial intelligence (AI) algorithm could cut in half the number of screening mammograms that need to be interpreted by radiologists by directing them only to cases most likely to contain cancer, according to research presented on Sunday at the Society for Imaging Informatics in Medicine’s (SIIM) Conference on Machine Intelligence in Medical Imaging (C-MIMI).
A team of researchers led by Dr. Paul Yi of Johns Hopkins University in Baltimore developed and tested a deep-learning algorithm called DeepCAT (computer-assisted triage). The group found that the DeepCAT algorithm offered the potential to significantly reduce the number of screening mammography exams that needed to be read by radiologists and also enable priority review of studies likely to have malignant masses.
“Our system could potentially reduce mammography workload by approximately one-half and prioritize studies with malignant masses over benign and normal exams,” Yi said.
Augmenting the radiologist
Breast cancer screening with mammography for all women older than 40 years has placed radiologists under increasing pressure. Not only are there more studies to interpret, but each study contains more images to read due to the growing utilization of digital breast tomosynthesis (DBT) exams, according to Yi. Furthermore, a shortage of radiologists has become a growing problem worldwide, particularly in developing countries but also in countries such as the U.S. and the U.K., he said.
Deep learning has shown promise for enhancing the detection of breast cancer, but workflow improvements may be even more relevant for augmenting radiologists, according to Yi.
As a result, the researchers sought to develop DeepCAT to perform triage of mammography exams based on suspicion of cancer. In particular, they wanted to evaluate two approaches for augmenting radiologists: using AI to discard images unlikely to contain cancer and also to prioritize images that are likely to contain cancer.
They trained DeepCAT using the Digital Database for Screening Mammography, a dataset of 1,878 2D mammography images gathered from four U.S. hospitals. Of these images, 1,438 (77%) contained masses and 440 (23%) were normal. All cases were pathology-proven, and the masses were segmented by radiologists. For their AI project, the Johns Hopkins researchers randomly divided the exams into training (55%), validation (13.5%), and testing (31.5%) sets.
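The exact splitting procedure was not described beyond the reported proportions, but a random 55/13.5/31.5 split along these lines can be sketched as follows; the function name and fixed seed are illustrative, not from the study.

```python
import random

def split_dataset(exam_ids, seed=42):
    """Randomly split exams into training (55%), validation (13.5%),
    and testing (31.5%) sets, mirroring the proportions reported for
    DeepCAT. The seed and helper name are illustrative assumptions."""
    ids = list(exam_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n = len(ids)
    n_train = round(0.55 * n)
    n_val = round(0.135 * n)
    # Remaining exams (~31.5%) form the held-out testing set.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_dataset(range(1878))
# With 1,878 exams this yields roughly 1,033 / 254 / 591 exams,
# consistent with the ~595 testing exams described below.
```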
DeepCAT consists of two key elements: a "mammogram classifier cascade" that finds image features — such as architectural distortions — indicative of cancer, and a "mass detector" algorithm tasked with finding discrete masses.
The mammogram classifier cascade is based on two ResNet-34-based classifiers pretrained on the ImageNet database. One classifier is weighted toward finding malignancy in order to maximize malignancy recall and nonmalignant precision, Yi said. The second classifier takes a more balanced approach to maximize overall accuracy. Meanwhile, the mass detector is a RetinaNet-based detector with a ResNet-50 feature pyramid network; its job is to distinguish breast background, benign masses, and malignant masses on the images, according to the researchers.
When DeepCAT receives a stack of mammograms, the malignancy-weighted classifier first analyzes the images and removes the normal studies. The balanced classifier and the mass detector then both review the remaining exams and produce an overall priority score for each exam for the radiologists.
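The two-stage flow described above can be sketched in plain Python. The model callables here are stand-ins (the real system uses ResNet-34 classifiers and a RetinaNet mass detector), and the score-combination rule and the 0.5 threshold are assumptions for illustration, not the published configuration.

```python
def triage(exams, malignancy_classifier, balanced_classifier, mass_detector):
    """Two-stage triage sketch.

    Stage 1: a malignancy-weighted classifier discards exams it
    calls normal. Stage 2: the balanced classifier and the mass
    detector score the survivors, which are returned sorted so the
    most suspicious exams are read first. Each callable returns a
    suspicion score in [0, 1]; simple averaging of the two stage-2
    scores is an illustrative assumption.
    """
    retained = [e for e in exams if malignancy_classifier(e) >= 0.5]
    scored = [
        (0.5 * balanced_classifier(e) + 0.5 * mass_detector(e), e)
        for e in retained
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored]

# Toy demo: dictionaries stand in for exams, and one shared scoring
# function stands in for all three models.
exams = [
    {"id": "A", "score": 0.9},  # suspicious
    {"id": "B", "score": 0.1},  # normal -> discarded in stage 1
    {"id": "C", "score": 0.6},  # borderline, kept but lower priority
]
score_fn = lambda e: e["score"]
worklist = triage(exams, score_fn, score_fn, score_fn)
# worklist presents exam A first, then C; exam B is discarded.
```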
The researchers observed a theoretical decrease in workload due to DeepCAT's discarding of normal studies. Of the 595 testing exams, DeepCAT would have discarded 315 (53%) as normal, and none of the discarded exams would have contained a malignant mass, according to Yi.
The researchers also evaluated the algorithm's performance in prioritizing studies by calculating a priority ordinal distance, which represents the average number of exams a radiologist would have to read before reaching a suspicious study. The best-performing configuration of DeepCAT had an average priority ordinal distance of 25.5 — indicating that a radiologist would have to read an average of approximately 25 mammograms before encountering a suspicious exam, Yi said.
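Under one plausible reading of this metric — the average, over suspicious exams, of how many exams precede each one in the prioritized worklist — it can be computed as follows. This formulation is our interpretation of the description in the talk, not the published definition.

```python
def priority_ordinal_distance(ordered_labels):
    """Average number of exams read before each suspicious study,
    given labels in the order the worklist presents them
    (True = suspicious). This is an assumed formulation of the
    metric described in the presentation."""
    positions = [i for i, suspicious in enumerate(ordered_labels) if suspicious]
    return sum(positions) / len(positions)

# Perfect prioritization: both suspicious exams come first.
print(priority_ordinal_distance([True, True, False, False]))   # 0.5
# Worst case: the suspicious exams are read last.
print(priority_ordinal_distance([False, False, True, True]))   # 2.5
```

A lower value means suspicious exams surface earlier in the reading queue, which is the behavior the reported distance of 25.5 quantifies.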
These types of AI triage systems could free up time for radiologists and enable them to focus the most attentive hours of their day on suspicious images, he said.
In the next phase of their work, the researchers plan to apply DeepCAT to DBT exams and perform prospective validation of the algorithm, Yi said.
In response to an audience question about the medicolegal implications of having a radiologist not read half of the mammograms, Yi acknowledged that this approach wouldn't be advisable in the current U.S. healthcare system. However, he also pointed out that a diabetic retinopathy screening system that doesn't require physician review for most studies has already been approved by the U.S. Food and Drug Administration (FDA).
“So we’re already seeing that this is not just a theoretical future, but it’s actually here,” he said.
Other countries that don’t have enough radiologists may be better candidates to utilize DeepCAT technology initially, according to Yi.