Use case: 20k Challenge – A challenge to use data from 20,000 lung cancer patients in 5 countries to predict the outcomes of therapy choices for patients

The aim of the project is to show that the Personal Health Train (PHT) distributed learning infrastructure can be scaled to many thousands of patients from at least five healthcare providers in more than five countries. This amount of data is of the same order of magnitude as national healthcare registries. Using treatment outcome data from thousands of cancer patients, a machine learning prediction model for two-year post-treatment survival will be developed.

Medical relevance 

The prediction model can support shared decision making for patients with non-small cell lung cancer (NSCLC).

Within 4 months, we used the PHT to connect databases holding 23,203 patient cases across 8 healthcare institutes in 5 countries, located in Amsterdam, Cardiff, Maastricht, Manchester, Nijmegen, Rome, Rotterdam and Shanghai. A distributed logistic regression model predicting two-year post-treatment survival was trained on 14,810 patients treated between 1978 and 2011 and validated on 8,393 patients treated between 2012 and 2015.
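The core PHT idea, sending the analysis to the data instead of the data to the analysis, can be illustrated with a minimal sketch of distributed logistic regression. It assumes plain full-batch gradient descent in which each institute returns only a summed gradient and a record count to a central coordinator. This is an illustrative technique chosen here, not necessarily the algorithm used in the project; the function names (`local_gradient`, `train_distributed`, `predict`) and the toy data are hypothetical, and the "sites" are simulated in a single process.

```python
import math
import random
from typing import List, Tuple

# One patient record at a site: (feature values without intercept, binary label).
# In this sketch the label would be 1 for two-year survival, 0 otherwise (assumption).
Record = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_gradient(weights: List[float], records: List[Record]) -> Tuple[List[float], int]:
    """Runs at one (simulated) institute: returns the summed log-loss gradient
    and the record count. Patient-level records are not returned."""
    grad = [0.0] * len(weights)
    for x, y in records:
        xi = [1.0] + x  # prepend the intercept term
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)))
        for j, v in enumerate(xi):
            grad[j] += (p - y) * v
    return grad, len(records)


def train_distributed(sites: List[List[Record]], n_features: int,
                      lr: float = 0.5, iterations: int = 500) -> List[float]:
    """Central coordinator: only aggregated gradients and counts are exchanged."""
    weights = [0.0] * (n_features + 1)
    for _ in range(iterations):
        total = [0.0] * len(weights)
        n = 0
        for records in sites:
            g, count = local_gradient(weights, records)
            total = [a + b for a, b in zip(total, g)]
            n += count
        # Average gradient over all patients, then take one descent step.
        weights = [w - lr * g / n for w, g in zip(weights, total)]
    return weights


def predict(weights: List[float], x: List[float]) -> float:
    # Predicted probability of the positive class for one feature vector.
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


if __name__ == "__main__":
    rng = random.Random(0)

    def make_site(n: int) -> List[Record]:
        # Hypothetical synthetic records with one feature; not project data.
        recs = []
        for _ in range(n):
            x = rng.uniform(-2.0, 2.0)
            y = 1 if rng.random() < sigmoid(1.5 * x - 0.5) else 0
            recs.append(([x], y))
        return recs

    sites = [make_site(200) for _ in range(3)]
    w = train_distributed(sites, n_features=1)
    print("weights:", w)
    print("P(y=1 | x=1.0):", predict(w, [1.0]))
```

Because the sum of the site gradients equals the gradient on the pooled data, this scheme reaches the same model as centralized gradient descent while only aggregate numbers leave each site. A temporal validation like the one described above would fit the weights on patients from earlier treatment years and evaluate `predict` on patients from later years.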

Main results

This project did not introduce a novel analysis technique or infrastructure. However, it showed that the Personal Health Train can be scaled to multiple countries and hospitals within 4 months, which is a short timespan for sharing insights from data. This was especially true for hospitals that had already been involved in previous PHT projects.

Lessons learned

  • The Personal Health Train is a scalable model
  • Research questions can be investigated relatively quickly across multiple continents without data leaving the individual institutes

Follow-up

  • Extending the set of variables used in the current model (e.g. image-derived information)
  • Including institutes from additional countries (e.g. American ones) to broaden the mix of represented countries and continents

Available material

In this project, we used the PHT infrastructure implementation developed by Varian Medical Systems and developed the algorithms ourselves. These algorithms are publicly available on GitHub. The infrastructure is commercially available for oncology purposes (Technology Readiness Level, TRL 7-8). The algorithms can be executed on the commercial infrastructure (TRL 6).

Project details

Project leader

André Dekker

Funders

Varian Medical Systems, NWO, Province of Limburg, Dutch Cancer Society

Cooperating partners
  • Department of Radiation Oncology (MAASTRO), GROW – School for Oncology and Developmental Biology, Maastricht University Medical Centre+
  • The D-Lab: Department of Precision Medicine, GROW – School for Oncology and Developmental Biology, Maastricht University Medical Centre+
  • Department of Radiation Oncology, Radboud University Medical Center, Nijmegen
  • Department of Radiation Oncology, The Netherlands Cancer Institute – Antoni van Leeuwenhoek, Amsterdam, The Netherlands
  • The University of Manchester, Manchester Academic Health Science Centre, The Christie NHS Foundation Trust, United Kingdom
  • Università Cattolica del Sacro Cuore, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome, Italy
  • Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  • School of Engineering, Cardiff University
  • Velindre Cancer Centre, Cardiff, United Kingdom
  • Department of Radiation Oncology, Erasmus MC Cancer Institute, Rotterdam, The Netherlands

T.M. Deist, F.J.W.M. Dankers, P. Ojha, M. Scott Marshall, T. Janssen, C. Faivre-Finn, C. Masciocchi, V. Valentini, J. Wang, J. Chen, Z. Zhang, E. Spezi, M. Button, J. Jan Nuyttens, R. Vernhout, J. van Soest, A. Jochems, R. Monshouwer, J. Bussink, G. Price, P. Lambin, A. Dekker, Distributed learning on 20 000+ lung cancer patients – The Personal Health Train, Radiotherapy and Oncology. 144 (2020) 189–200.