Towards improving the visual explainability of artificial intelligence in the clinical setting
BMC Digital Health volume 1, Article number: 23 (2023)
Improving the visual explainability of medical artificial intelligence (AI) is fundamental to enabling reliable and transparent clinical decision-making. Medical image analysis systems are becoming increasingly prominent in the clinical setting as algorithms are learning to accurately classify diseases in various imaging modalities. Saliency heat-maps are commonly leveraged in the clinical setting and allow clinicians to visually interpret regions of an image that the model is focusing on. However, studies have shown that in certain scenarios, models do not attend to clinically significant regions of an image and perform inference using insignificant visual features. Here, we discuss the importance of focusing on visual explainability and an effective strategy that has the potential to improve a model's ability to focus more on clinically relevant regions of a given medical image using attention mechanisms.
Artificial intelligence (AI) and deep learning have the potential to significantly impact clinical decision-making and healthcare workflows. As more AI systems are translated into the clinical setting, we are uncovering both the strengths and the limitations of human-centered design and of the interactions between algorithms and healthcare providers. A major impact of AI-based advancements is occurring in medical image analysis. Medical image analysis refers to the application of deep learning techniques to classify or detect disease in a wide range of medical imaging modalities (e.g. x-ray, MRI, CT). The main objectives of these systems are to shorten the time to diagnosis, democratize access to imaging specialists, and lower the overall workload of physicians by reducing manual, time-consuming processes. Research in medical image analysis commonly takes state-of-the-art developments from the general field of AI and applies them to specific clinical problems. In recent years, these systems have become increasingly capable, leading to wider adoption of AI-based imaging diagnostic tools at the point of care and a rise in approvals of AI-based medical devices by the US Food and Drug Administration (FDA). When deploying algorithms to provide diagnostic assistance to clinicians, it is important to provide explainable insight into how the algorithm performs inference. Deployed AI-based medical image analysis systems commonly provide the user with a diagnostic prediction, a confidence percentage (or level of certainty), and a heat-map or visualization of the parts of the image the model used to perform inference. Typically, increasing the accuracy of AI is the highest priority during development. However, understanding the visual explainability of algorithms from a clinical perspective is important for ensuring usability and reliability.
Here, we discuss the importance of visual explainability in healthcare, current limitations, and potential strategies that can be employed to improve the reliability of AI when deployed in the clinical setting.
Visual explainability of medical AI
Saliency methods that generate heat-maps are commonly leveraged in medical image analysis, allowing clinicians to visualize the regions of a given image that a model is focusing on. When these models are deployed in a clinical setting, the ability of a clinician to interpret results fully and transparently is of high importance. Focusing solely on a model's predicted diagnosis offers no visual interpretation and limits confidence in the model for clinical decision-making. When a clinician interprets a medical image, they apply cognitive attention to the specific regions of the image that are relevant to their diagnosis. The ability to assess the importance of features in a medical image is essential, as it allows the clinician to make a final diagnosis based only on relevant information. This cognitive process of attention allows the clinician to use prior knowledge to disregard unnecessary features, ensuring that the diagnosis is based on regions of the image that matter for the specific clinical task and imaging modality. Studies have shown that medical image analysis models tend to evaluate features more generally than clinicians do, which can lead to predictions based on irrelevant regions of an image. In one study, a model was found to be using clinically insignificant pieces of text in an x-ray, rather than clinically significant regions, to make predictions regarding COVID-19. This is a major limitation of medical AI and a significant risk in the clinical setting, as it can lead to false interpretations. Replicating the cognitive capability of attention within AI systems is of high importance, as it has the potential to improve not only performance but also visual explainability in the clinical setting.
Finding, developing, and validating strategies that tune algorithms to perform inference in a manner similar to the cognitive process of clinicians can enhance AI interpretation in the clinical setting and lead to more reliable decision-making.
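To make the concern above concrete, the following is a minimal sketch of one way to check whether a heat-map concentrates on a clinically relevant region: it computes the fraction of total saliency that falls inside a region marked as relevant. The function name, the toy heat-map, and the mask are all hypothetical illustrations, and this metric is an assumption for the sake of example, not a method taken from the studies discussed here.

```python
def saliency_mass_in_region(heatmap, mask):
    """Return the fraction of non-negative saliency that lies inside the mask.

    heatmap: list of rows of floats (saliency per pixel).
    mask:    list of rows of 0/1 flags (1 = clinically relevant pixel).
    Both names and the metric itself are illustrative assumptions.
    """
    total = 0.0
    inside = 0.0
    for heat_row, mask_row in zip(heatmap, mask):
        for value, flag in zip(heat_row, mask_row):
            value = max(value, 0.0)  # ignore negative attributions
            total += value
            if flag:
                inside += value
    # An all-zero heat-map carries no evidence either way.
    return inside / total if total > 0 else 0.0


# Toy 3x4 heat-map: most saliency sits in the right-hand column,
# mimicking a model that attends to a text marker at the image edge.
heatmap = [
    [0.0, 0.1, 0.0, 0.9],
    [0.0, 0.2, 0.1, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
# Hypothetical clinician-marked region of interest (middle columns).
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

score = saliency_mass_in_region(heatmap, mask)
print(f"share of saliency inside the relevant region: {score:.2f}")
```

A low share, as in this toy case, would flag a heat-map dominated by regions outside the marked area. Such a number can only complement, not replace, a clinician's reading of the heat-map.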
Attention for medical AI
In the general field of AI and deep learning, enabling models to focus on more relevant regions of an image has been studied. A notable strategy used to improve the “focus” of models is the attention mechanism. Attention mechanisms aim to replicate the cognitive capability of attention within deep learning models. For general image classification, popular implementations include squeeze-and-excitation (SE), the convolutional block attention module (CBAM), and global-context (GC) blocks, among others. Such mechanisms use various operations to tune a model, in a fully data-driven manner, to focus on the features of an image that are semantically important. Across various general AI benchmarks, these mechanisms have significantly improved the relevance of saliency heat-maps in a wide range of specific contexts. Additionally, these mechanisms can be easily integrated into the most common image classification model, the convolutional neural network (CNN). Studies have applied attention techniques to CNNs for medical image analysis tasks. One study showed that applying attention to skin cancer classification can improve the performance of standard algorithms. Another study showed that, with attention mechanisms, models can learn to suppress irrelevant regions of medical images from saliency maps. Additionally, in medical image segmentation, attention-based approaches are being leveraged to improve the focus of models toward relevant fields of view when segmenting skin lesions and other structures. In our study, we aimed to understand the clinical implications of attention and to answer the question “Does attention improve the visual explainability of medical AI?” from the perspective of practicing clinicians. We trained a common image classification model (ResNet-18) on the x-ray and dermatological image modalities, with and without various types of attention (SE, CBAM, and GC).
Then, we visualized saliency heat-maps from the baseline model and from each attention-augmented variant across medical image samples. We then anonymously consulted clinicians in each respective medical domain and presented the samples, as shown in Fig. 1, in a randomized manner without specifying which samples were derived using attention. The goal of this observational study was to understand whether attention improves the “focus” of models, in terms of using clinically relevant regions to perform inference, and whether this improves visual explainability.
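As an illustration of how one of the attention mechanisms named above operates, here is a minimal sketch of a squeeze-and-excitation (SE) style channel gate in plain Python. It is a simplified stand-in, not the code used in the study: biases are omitted, the weights are hand-picked rather than learned, and the names `squeeze_excite`, `w_reduce`, and `w_expand` are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Apply an SE-style channel gate (biases omitted for brevity).

    feature_maps: list of C channels, each an H x W list of rows.
    w_reduce:     (C/r) x C matrix for the bottleneck layer.
    w_expand:     C x (C/r) matrix that restores the channel count.
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation: bottleneck with ReLU, then a sigmoid gate per channel.
    hidden = [relu(h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two 2x2 channels and hand-picked (not learned) weights, for illustration.
feature_maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
w_reduce = [[1.0, 0.0]]        # 2 channels -> 1 hidden unit
w_expand = [[2.0], [-2.0]]     # 1 hidden unit -> 2 channel gates

scaled, gates = squeeze_excite(feature_maps, w_reduce, w_expand)
print("channel gates:", [round(g, 3) for g in gates])
```

In a trained network the two weight matrices are learned from data, so the gates emphasize or suppress channels according to the input. CBAM and GC blocks follow different designs and are not covered by this sketch.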
Evaluation of attention for medical AI
In our observational study, we concluded that, for the majority of images across these medical image modalities, attention mechanisms improved the visual explainability of the AI in terms of clinical relevance, based on subjective feedback from clinicians. In all but one situation, the clinicians surveyed selected an attention-augmented model over the baseline. The reasoning behind these decisions was the ability of the models with attention to focus on more clinically significant regions of the medical image, compared with the weaker focus of the baseline model. Additionally, all attention-augmented models showed an increase in accuracy over the baseline. It is important to note that this is only a single study, and further experimentation must be done across various institutions, specialists, and imaging modalities to provide more comprehensive results. Attention mechanisms are only one strategy that can potentially improve the “focus” of models towards clinically significant regions of an image. Other approaches have enabled models to focus on more clinically significant regions by using gaze data from clinicians as observational supervision.
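The blinded comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the exact survey protocol is not specified in the text, so the panel labels, the function names `blind_panel` and `tally`, and the reviewer choices are all hypothetical placeholders and not study data.

```python
import random
from collections import Counter

VARIANTS = ["baseline", "SE", "CBAM", "GC"]
LABELS = ["A", "B", "C", "D"]


def blind_panel(rng):
    """Shuffle the model variants and hide them behind neutral labels.

    Returns a key mapping each label shown to the reviewer to the
    variant that produced the heat-map. The key stays hidden until
    all choices have been collected.
    """
    order = VARIANTS[:]
    rng.shuffle(order)
    return dict(zip(LABELS, order))


def tally(choices, keys):
    """Unblind the reviewer's choices and count votes per variant.

    choices: sample id -> label picked by the reviewer.
    keys:    sample id -> hidden label-to-variant key for that sample.
    """
    return Counter(keys[sample][label] for sample, label in choices.items())


rng = random.Random(0)  # fixed seed so the shuffles are reproducible
keys = {sample: blind_panel(rng) for sample in ["img_1", "img_2", "img_3"]}

# Placeholder choices only: a real reviewer would pick per sample.
choices = {sample: "A" for sample in keys}

counts = tally(choices, keys)
attention_share = 1 - counts["baseline"] / sum(counts.values())
print(dict(counts), f"attention-augmented share: {attention_share:.2f}")
```

Shuffling independently for each sample keeps a reviewer from learning that a given position always corresponds to the same model. Storing the key separately allows the choices to be unblinded only at analysis time.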
To increase clinical confidence in medical AI systems, more effort is needed to integrate mechanisms and capabilities that make models more explainable, not just more “accurate”. Attention mechanisms, as described, are one approach that can potentially help models learn to focus on clinically relevant regions in medical images. Various other tools developed in the general field of computer vision can also be used to improve the attention of models. With such tools, AI models can potentially provide higher utility in the clinical setting and lead to more robust and clinically explainable decision-making. Making models more explainable can also provide a means to implement quality assurance systems before deployment in any given population.
Availability of data and materials
The datasets generated and/or analysed during the current study are available here: chest x-ray dataset (https://data.mendeley.com/datasets/rscbjbr9sj/2) and dermatological dataset (https://www.kaggle.com/datasets/fanconic/skin-cancer-malignant-vs-benign).
References
Ting DS, Liu Y, Burlina P, Xu X, Bressler NM, Wong TY. AI for medical imaging goes deep. Nat Med. 2018;24(5):539–40.
Chen H, Gomez C, Huang CM, Unberath M. Explainable medical imaging AI needs human-centered design: guidelines and evidence from a systematic review. NPJ Digit Med. 2022;5(1):1–15.
Suzuki K. Overview of deep learning in medical imaging. Radiol Phys Technol. 2017;10(3):257–73.
Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med. 2020;3(1):1–8.
Vellido A. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput Appl. 2020;32(24):18069–83.
Saporta A, Gui X, Agrawal A, Pareek A, Truong SQ, Nguyen CD, Rajpurkar P, et al. Benchmarking saliency methods for chest X-ray interpretation. medRxiv. 2021.02.28.21252634.
DeGrave AJ, Janizek JD, Lee SI. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat Mach Intell. 2021;3:610–19.
Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. p. 7132–41.
Woo S, Park J, Lee JY, Kweon IS. CBAM: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV). 2018. p. 3–19.
Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. p. 7794–803.
Datta SK, Shaikh MA, Srihari SN, Gao M. Soft attention improves skin cancer classification performance. In: Interpretability of machine intelligence in medical image computing, and topological data analysis and its applications for medical data. Cham: Springer; 2021. p. 13–23.
Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, Rueckert D. Attention gated networks: learning to leverage salient regions in medical images. Med Image Anal. 2019;53:197–207.
Azad R, Asadi-Aghbolaghi M, Fathy M, Escalera S. Attention deeplabv3+: Multi-level context attention mechanism for skin lesion segmentation. In: European conference on computer vision. Cham: Springer; 2020. p. 251–66.
Rao A, Park J, Woo S, Lee JY, Aalami O. Studying the effects of self-attention for medical image analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 3416–25.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770–8.
Saab K, Hooper SM, Sohoni NS, Parmar J, Pogatchnik B, Wu S, Ré C. Observational supervision for medical image classification using gaze data. In: International conference on medical image computing and computer-assisted Intervention. Cham: Springer; 2021. p. 603–14.
No funding received.
The authors declare no competing interests.
Rao, A., Aalami, O. Towards improving the visual explainability of artificial intelligence in the clinical setting. BMC Digit Health 1, 23 (2023). https://doi.org/10.1186/s44247-023-00022-3