Transfer learning and choice intuition

While a key component of the success of deep learning is the availability of massive amounts of training data, medical image datasets are often limited in size and diversity. Transfer learning has the potential to bridge the gap between related yet different domains [GC17]. For medical applications, however, it remains unclear whether pre-training on natural or medical images is more beneficial. Our study contributes to this discussion by comparing initialization from two large-scale datasets, ImageNet (natural) and RadImageNet (medical), across seven medical classification tasks [JD23].

To further understand model robustness, we introduced the Medical Imaging Contextualized Confounder Taxonomy (MICCAT) [JD24], a framework for conceptualizing confounders in medical imaging. We investigated a range of confounders, both synthetic and sampled from the data, using two public chest X-ray and CT datasets. Based on our findings, we recommend that researchers assess model robustness with careful consideration of pre-training sources and confounding factors.

Our latest study [LY26] takes a complementary human-computer interaction (HCI) approach, using a task-based survey to explore how machine learning practitioners choose their source datasets for transfer learning. We find that these decisions depend on task context, community norms, dataset properties, and perceived similarity. Ambiguous terminology further underscores the need for clearer definitions and HCI tools. By clarifying these heuristics, our work offers practical insights for more systematic source dataset selection in transfer learning.

Publications

[LY26] Intuitions of machine learning researchers about transfer learning for medical image classification
Yucheng Lu, Hubert Dariusz Zając, Veronika Cheplygina, Amelia Jiménez-Sánchez
Preprint, under review
PDF  

[JD24] Source matters: source dataset impact on model robustness in medical imaging
Dovile Juodelyte, Yucheng Lu, Amelia Jiménez-Sánchez, Sabrina Bottazzi, Enzo Ferrante, Veronika Cheplygina
AMAI Workshop @ Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024 [oral]
PDF   Code

[JD23] Revisiting hidden representations in transfer learning for medical imaging
Dovile Juodelyte, Amelia Jiménez-Sánchez, Veronika Cheplygina
Transactions on Machine Learning Research -- TMLR 2023
PDF   Code

Funding

  • DFF (Independent Research Fund Denmark) Inge Lehmann 1134-00017B.
  • Novo Nordisk Foundation NNF21OC0068816.