Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Advancements in artificial intelligence have driven the development of numerous pathology foundation models capable of extracting clinically relevant information. However, there is currently limited literature independently evaluating these foundation models on truly external cohorts and clinically relevant tasks, which could uncover adjustments for future improvements. In this study, we benchmarked 19 histopathology foundation models on 13 patient cohorts comprising 6,818 patients and 9,528 slides from lung, colorectal, gastric, and breast cancers. The models were evaluated on weakly-supervised tasks related to biomarkers, morphological properties, and prognostic outcomes. We show that a vision-language foundation model, CONCH, yielded the highest performance compared to vision-only foundation models, with Virchow2 a close second. The experiments reveal that foundation models trained on distinct cohorts learn complementary features to predict the same label, and can be fused to outperform the current state of the art. An ensemble combining CONCH and Virchow2 predictions outperformed individual models in 55
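The abstract does not specify how CONCH and Virchow2 predictions are combined. A minimal sketch of prediction-level fusion by averaging class probabilities, a common ensembling choice (the arrays and the averaging rule here are illustrative assumptions, not the paper's stated method):

```python
import numpy as np

# Hypothetical per-slide probabilities from two downstream classifiers,
# one trained on CONCH features and one on Virchow2 features.
conch_probs = np.array([0.80, 0.30, 0.55])
virchow2_probs = np.array([0.60, 0.40, 0.55])

# Prediction-level fusion: average the two models' probabilities,
# then threshold at 0.5 for a binary label per slide.
ensemble_probs = (conch_probs + virchow2_probs) / 2
labels = (ensemble_probs >= 0.5).astype(int)

print(ensemble_probs)  # [0.7  0.35 0.55]
print(labels)          # [1 0 1]
```

Averaging probabilities lets complementary errors cancel: a slide that one model scores marginally below threshold can still be classified correctly when the other model is confident.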