INSIGHT: Explainable Weakly-Supervised Medical Image Analysis

A novel approach to interpretable and efficient medical image analysis using weakly supervised learning.

Wenbo Zhang

University of Rochester

Junyu Chen

University of Rochester

Christopher Kanan

University of Rochester

Visualization

Below are examples of heatmaps generated by INSIGHT, which highlight diagnostically relevant regions in whole-slide images (WSIs). Our method achieves this using only WSI-level labels, making it both efficient and interpretable without requiring costly pixel-level annotations.

Motivation

The rapid growth of medical imaging data has presented significant challenges for developing diagnostic systems that are both accurate and interpretable. Traditional methods often rely on fully supervised approaches that require dense annotations, which are labor-intensive and costly to obtain. Moreover, existing aggregators, such as those based on multiple-instance learning (MIL), struggle to achieve a balance between classification accuracy and spatial calibration. While they can identify regions of interest, they typically depend on post-hoc visualization methods like Grad-CAM to generate interpretable outputs. This reliance on external tools introduces additional complexity and fails to integrate interpretability as a core feature of the model.

About INSIGHT

INSIGHT (Integrated Network for Segmentation and Interpretation with Generalized Heatmap Transmission) is a novel framework designed to analyze large-scale medical images, such as whole-slide pathology images (WSIs) and volumetric CT scans, while maintaining interpretability for clinicians. It addresses the limitations of traditional methods by embedding interpretability directly into its architecture, eliminating the need for post-hoc visualization tools like Grad-CAM. INSIGHT combines fine-grained local feature detection with broader contextual awareness through two key modules: the Detection Module, which captures small, diagnostically critical details, and the Context Module, which suppresses irrelevant activations by incorporating global contextual information. This design enables INSIGHT to generate heatmaps that closely align with ground-truth diagnostic regions, offering both accuracy and transparency. By requiring only image-level labels, INSIGHT significantly reduces the annotation burden while delivering state-of-the-art classification and weakly supervised segmentation performance.

INSIGHT Architecture

INSIGHT's inputs and architecture: (a) Images are pre-processed to extract pre-trained features from each CT slice or WSI patch. (b) These features are processed through the Detection and Context modules to generate slice- or patch-level heatmaps by incorporating both fine-grained details and broader contextual information. (c) Heatmaps are aggregated across slices or patches to produce a binary prediction for each category along with interpretable heatmaps.
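
The three stages above can be sketched in a few lines of plain Python. This is a toy illustration, not the INSIGHT implementation: the function names (insight_like_forward, local_mean), the single linear scoring layer, the 1-D neighbourhood averaging, the multiplicative gating of detection by context, and the max-pooling aggregation are all assumptions chosen to keep the example short. The source only states that the two modules combine fine-grained and contextual information and that heatmaps are aggregated into a prediction.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def local_mean(values, radius):
    """Average each position with its neighbours within `radius` (1-D window)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def insight_like_forward(features, weights, bias, detect_radius=0, context_radius=2):
    """Hypothetical sketch of stages (a)-(c); every design choice is an assumption."""
    # (a) `features` holds one pre-extracted feature vector per patch or slice.
    scores = [sum(w * f for w, f in zip(weights, feat)) + bias for feat in features]
    # (b) Detection branch: small receptive field, so fine detail is preserved.
    detection = [sigmoid(s) for s in local_mean(scores, detect_radius)]
    # (b) Context branch: larger receptive field, used here as a gate that
    #     damps activations whose neighbourhood does not support them.
    context = [sigmoid(s) for s in local_mean(scores, context_radius)]
    heatmap = [d * c for d, c in zip(detection, context)]
    # (c) Aggregate the heatmap into one image-level score (max pooling here).
    prediction = max(heatmap)
    return heatmap, prediction

# Toy input: nine patches with 1-D features. Patch 2 is an isolated spike,
# patches 6-8 form a cluster of high responses.
features = [[-4.0], [-4.0], [4.0], [-4.0], [-4.0], [-4.0], [4.0], [4.0], [4.0]]
heatmap, prediction = insight_like_forward(features, weights=[1.0], bias=0.0)
```

In this toy input the isolated spike at patch 2 ends up with a much lower heat value than the clustered patches, which is the kind of suppression the Context Module is described as providing.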

Quantitative Performance

The tables below compare INSIGHT with other models on the CAMELYON16, BRACS, and MosMed datasets. They report AUC for classification and Dice for segmentation.

CAMELYON16 & BRACS

Aggregator        CAMELYON16              BRACS (AUC)
                  AUC     Dice (%)        ADH     FEA     DCIS    Invasive  Macro AUC
ABMIL             0.975   55.8 ± 25.0     0.656   0.744   0.804   0.995     0.800
CLAM-SB           0.966   64.7 ± 24.1     0.611   0.757   0.833   0.999     0.800
CLAM-MB           0.973   67.7 ± 22.6     0.701   0.687   0.828   0.998     0.804
TransMIL          0.982   12.4 ± 22.4     0.644   0.653   0.769   0.989     0.764
INSIGHT (Ours)    0.990   74.6 ± 19.1     0.734   0.790   0.837   0.999     0.840
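
As a quick sanity check on the BRACS columns, the Macro AUC values appear to be the unweighted mean of the four per-class AUCs (ADH, FEA, DCIS, Invasive). The page does not state this definition, so treat it as an assumption; the arithmetic below, using only numbers copied from the table, matches the reported values to within rounding.

```python
# Per-class BRACS AUCs copied from the table, in the order ADH, FEA, DCIS, Invasive.
per_class_auc = {
    "ABMIL":    [0.656, 0.744, 0.804, 0.995],
    "CLAM-SB":  [0.611, 0.757, 0.833, 0.999],
    "CLAM-MB":  [0.701, 0.687, 0.828, 0.998],
    "TransMIL": [0.644, 0.653, 0.769, 0.989],
    "INSIGHT":  [0.734, 0.790, 0.837, 0.999],
}

# Macro AUC values as reported in the table.
reported_macro = {
    "ABMIL": 0.800, "CLAM-SB": 0.800, "CLAM-MB": 0.804,
    "TransMIL": 0.764, "INSIGHT": 0.840,
}

# Assumed definition: macro AUC = unweighted mean over classes.
macro_auc = {name: sum(aucs) / len(aucs) for name, aucs in per_class_auc.items()}

for name, value in macro_auc.items():
    print(f"{name}: computed {value:.4f}, reported {reported_macro[name]:.3f}")
```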

MosMed

Task             Model       Setting              AUC / Dice
Classification   ViT-COVID   0.6/0.2/0.2 split    0.870
Classification   PR-3D-CNN   Five-fold CV         0.914 ± 0.049
Classification   INSIGHT     Five-fold CV         0.962 ± 0.012
Segmentation     3D U-Net    Voxel-level          40.5 ± 21.3
Segmentation     3D GAN      Volume-level         41.2 ± 14.7
Segmentation     INSIGHT     Volume-level         42.7 ± 15.3
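
For readers unfamiliar with the segmentation metric, the sketch below shows how a Dice score can be computed from a heatmap and a binary ground-truth mask. The function name dice_score, the 0.5 threshold, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the page does not describe the exact evaluation protocol, and the tabulated Dice values appear to be on a 0-100 scale rather than 0-1.

```python
def dice_score(heatmap, mask, threshold=0.5):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for a thresholded heatmap P and mask G."""
    pred = [h >= threshold for h in heatmap]   # binarize the heatmap (assumed threshold)
    truth = [bool(m) for m in mask]            # ground-truth foreground
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0                             # both empty: treated as perfect agreement
    return 2.0 * intersection / total

# Four flattened pixels: two predicted positive, two truly positive, one in common.
example = dice_score([0.9, 0.8, 0.2, 0.1], [1, 0, 0, 1])
```

Here one of the two predicted pixels overlaps one of the two ground-truth pixels, so the score is 2 * 1 / (2 + 2) = 0.5, or 50 on the percentage scale.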

Acknowledgments

This work was supported in part by NSF award #2326491. The views and conclusions contained herein are those of the authors and should not be interpreted as the official policies or endorsements of any sponsor. We thank Jhair Gallardo and Shikhar Srivastava for their comments on early drafts.
