Age, Race Affect Accuracy of Mammogram Reading by AI
Black women and older women were more likely to have false-positive results than white women and younger women when an artificial intelligence (AI) program read their mammograms, according to a study published in the journal Radiology. A false-positive result means the AI program labeled an area as suspicious, but it turned out not to be cancer.
Key takeaways
Mammograms from a diverse group of more than 4,800 women were read by an AI program after doctors had read them and determined that the women didn’t have breast cancer.
The AI program was more likely to return false-positive results for mammograms from Black women than for mammograms from white women. Asian women were less likely to have false-positive results than white women.
Older women – ages 71 to 80 – were more likely to have false-positive results than women ages 51 to 60. Younger women – ages 41 to 50 – were less likely to have false-positive results than women ages 51 to 60.
Women with extremely dense breasts were more likely to have false-positive results than women with mostly fatty breasts.
What the results mean for you
More mammogram facilities are using AI software to help radiologists read mammograms. In these settings, the AI reading is one of several pieces of information a radiologist relies on to interpret a mammogram. A 2023 study showed that AI-supported mammogram reading found 20% more cancers than radiologists reading on their own. The current study highlights potential drawbacks of using AI to read mammograms.
AI mammogram programs are developed using large databases of images showing what is and what isn’t breast cancer. As the researchers who did this study noted, few databases contain information from a diverse group of people, and the U.S. Food and Drug Administration, which approves AI programs for use in cancer detection in the United States, doesn’t require that developers use diverse data sets.
“This study shows us that AI is still not very good at diagnosing breast cancer and that radiologists are still crucial to breast cancer detection,” explained Meredith Broussard, a data journalist and AI researcher who is an associate professor at the Arthur L. Carter Journalism Institute of New York University. Broussard, who has been treated for breast cancer herself, is also the research director at the NYU Alliance for Public Interest Technology.
About the study
The researchers selected 3D mammograms from 4,855 women screened at the Duke University School of Medicine between January 2016 and December 2019. All the mammograms showed no evidence of cancer, and the women had no history of breast cancer.
The researchers made sure the mammograms came from a diverse group of women, including 28% who were Asian, 27% who were white, 26% who were Black, and 19% who were Hispanic. Most of the women had BI-RADS category B breast density (scattered fibroglandular densities), but 9% had category D (extremely dense breasts).
The researchers used the ProFound AI 3.0 program, which is FDA-approved, to read the mammograms. The program gives each mammogram what’s called a case score, ranging from zero to 100, that reflects how likely the program thinks it is that the mammogram shows cancer. Mammograms with a score of 50 or higher usually require additional imaging or other testing, and the closer the score is to 100, the more confident the AI program is that the mammogram shows cancer.
The AI program also calculates a risk score for each mammogram that takes into account a woman’s breast density and age. The risk score estimates the likelihood that a woman will develop breast cancer in the next year. Mammograms with a risk score greater than 0.8 carry the highest risk that the woman’s next mammogram will show cancer.
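To make the two cutoffs concrete, here is a minimal sketch, in Python, of how a single mammogram could be checked against them. The names (triage_mammogram, CASE_SCORE_CUTOFF, RISK_SCORE_CUTOFF) are hypothetical illustrations and are not part of the ProFound AI software.

# Illustrative sketch only; the names and structure are hypothetical
# and do not reflect the actual ProFound AI 3.0 software.

CASE_SCORE_CUTOFF = 50   # case score of 50 or higher usually triggers follow-up testing
RISK_SCORE_CUTOFF = 0.8  # risk score above 0.8 marks the highest-risk group

def triage_mammogram(case_score: float, risk_score: float) -> dict:
    """Apply the two cutoffs described in the study to one mammogram."""
    return {
        "needs_followup_testing": case_score >= CASE_SCORE_CUTOFF,
        "highest_risk_group": risk_score > RISK_SCORE_CUTOFF,
    }

# Example: a case score of 62 with a risk score of 0.3 would be flagged
# for additional testing but not placed in the highest-risk group.
print(triage_mammogram(case_score=62, risk_score=0.3))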
Detailed results
The AI program classified 816 of the 4,855 mammograms (17%) as suspicious, meaning they had a case score of 50 or higher. Because none of the women had cancer, the researchers counted these as false positives.
Compared to white women:
Black women were 50% more likely to have a false-positive result
Asian women were 30% less likely to have a false-positive result
Compared to women ages 51 to 60:
Women ages 41 to 50 were 40% less likely to have a false-positive result
Women ages 71 to 80 were 90% more likely to have a false-positive result
The AI program gave 240 of the 4,855 mammograms (5%) a risk score greater than 0.8. The researchers also counted these as false positives.
There were differences in risk scores by race and ethnicity, age, and breast density:
Black women were 1.5 times more likely to have a false-positive result than white women.
Women ages 61 to 70 were 3.5 times more likely to have a false-positive result than women ages 51 to 60.
Women with extremely dense breasts were 2.8 times more likely to have a false-positive result than women with mostly fatty breasts.
“If radiologists normalize the adoption of AI recommendations based on white patients — the largest demographic group in the United States — then they risk inappropriately higher recall rates for Black patients,” the researchers wrote. “This has the potential to worsen health care disparities and decrease the benefits of AI assistance. The Food and Drug Administration should provide clear guidance on the demographic characteristics of samples used to develop algorithms, and vendors should be transparent about how their algorithms were developed. Continued efforts to train future AI algorithms on diverse data sets are needed to ensure standard performance across all patient populations.”
Learn more
If you want more information about how your mammogram was interpreted, ask your doctor or someone at the center where you had your mammogram.
Listen to the episode of The Breastcancer.org Podcast featuring Broussard discussing bias in healthcare AI.
Nguyen, D., et al. Patient Characteristics Impact Performance of AI Algorithm in Interpreting Negative Screening Digital Breast Tomosynthesis Studies. Radiology. 2024;311(2).