Copycats: the many lives of a publicly available medical imaging dataset

Date:

Andaluz.IA aims to showcase research on Artificial Intelligence carried out by scientists in or from Andalusia, that is, scientists who currently work in Andalusia or who spent part of their studies or career there. This meeting brings together multidisciplinary researchers working in AI and highlights the community's potential to become the AI hub of Southern Europe.

In this talk, I will offer recommendations for improving the accessibility of machine learning datasets and their evaluation, particularly in medical image analysis. These include adding rich metadata and testing on real-world data, and keeping documentation complete and up to date. I will also discuss how community-contributed platforms such as Kaggle or HuggingFace could benefit from commons-based governance. Further findings and a more in-depth discussion are presented in our paper.