In cancer research, survival analysis helps predict the chance of survival a given number of years after a cancer diagnosis. It can give insight into the factors that influence that chance, e.g. the patient’s fitness, the method of treatment, and the hospital of diagnosis. Enriching the analysis with additional relevant information may help us understand how to increase the chance of survival and improve treatment.
Various organisations record information relevant for survival analysis, which is why data were traditionally collected centrally prior to analysis. However, this data pooling is undesirable because of the associated privacy risks. To study the factors influencing cancer survival, we need new approaches that allow us to analyse data in a distributed manner. We adopt the Personal Health Train (PHT) philosophy by building algorithms that enable us to conduct survival analysis without sharing original patient records, which minimizes legal risks. This way, our analysis can be enriched with new relevant information that may give us a better understanding of the factors that influence the chance of survival.
The situation in which various organisations each hold different pieces of information about the same group of people is called vertical partitioning of data. Distributed analysis on vertically partitioned data requires more complex techniques than analysis on horizontally partitioned data, where each organisation holds the same kind of information about different people. We therefore apply recent developments in cryptography, namely secure multi-party computation (MPC).
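To make vertical partitioning concrete, the following minimal Python sketch uses invented toy data: a hypothetical registry holds follow-up time and outcome, while a hypothetical hospital holds the treatment for the same patients. All names and values are assumptions for illustration. The `pooled_join` step is the centralised pooling that a distributed approach is meant to avoid; it is shown only as the reference result that a privacy-preserving protocol would have to reproduce.

```python
# Hypothetical toy data: two organisations hold different columns
# about the same patients (vertical partitioning).
registry = {  # patient_id -> (follow-up in months, death observed)
    "p1": (12, True),
    "p2": (30, False),
    "p3": (7, True),
    "p4": (24, False),
}
hospital = {  # patient_id -> treatment
    "p1": "surgery",
    "p2": "chemo",
    "p3": "chemo",
    "p4": "surgery",
}


def pooled_join(left, right):
    """Centralised join on patient id: the step distributed analysis avoids."""
    shared_ids = sorted(left.keys() & right.keys())
    return {pid: (*left[pid], right[pid]) for pid in shared_ids}


def deaths_per_treatment(joined):
    """Count observed deaths per treatment; needs columns from both parties."""
    counts = {}
    for _months, died, treatment in joined.values():
        counts[treatment] = counts.get(treatment, 0) + int(died)
    return counts


joined = pooled_join(registry, hospital)
result = deaths_per_treatment(joined)
print(result)
```

Neither party can compute the per-treatment counts from its own columns alone, which is why the analysis needs either pooling or a protocol that combines the columns without revealing them.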
MPC is an umbrella term for techniques that allow different entities to jointly perform analysis on data without sharing their data. IKNL and TNO are collaborating to develop solutions using these technologies to enable privacy-preserving survival analysis with methods such as the Kaplan-Meier estimator, the log-rank test, and Cox regression.
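One basic building block found in many MPC protocols is additive secret sharing. The sketch below illustrates the general idea only and is not the protocol used in this project: the modulus, the function names, and the input values are all assumptions. Each party splits its private number into random shares, and only sums of shares are ever published, so the total becomes known while individual inputs stay hidden.

```python
import secrets

# Assumed modulus for this sketch; it only needs to exceed the true total.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares; any strict subset of them looks uniformly random."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs):
    """Simulate all parties locally: each publishes only a sum of shares."""
    n = len(private_inputs)
    # Row i holds the shares party i hands out, one per party.
    distributed = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received (column j) and publishes that.
    partial_sums = [
        sum(row[j] for row in distributed) % MODULUS for j in range(n)
    ]
    return reconstruct(partial_sums)


death_counts = [17, 5, 42]  # hypothetical private inputs, one per party
total = secure_sum(death_counts)
print(total)
```

A real deployment would run each party on its own machine and exchange the shares over secure channels; the single-process simulation here only shows the arithmetic.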
We have built a library of MPC algorithms for vertically partitioned data: the log-rank test, the Kaplan-Meier estimator, and the Cox proportional hazards model.
One still has to be careful that the output of an algorithm does not reveal sensitive information. Publishing a Kaplan-Meier curve is practically equivalent to releasing the sensitive data, as the underlying data points can easily be recovered from the curve. Hence, we advise against publishing such curves without some form of distortion when they are computed on multiple sensitive data sources.
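The leakage described above can be sketched in a few lines of Python. The example computes a Kaplan-Meier curve from invented data and then inverts it, under the simplifying assumptions that there is no censoring and that the number of patients is known; with those, the number of deaths at step i follows from d_i = n_i * (1 - S(t_i) / S(t_{i-1})). The function names and data are hypothetical, and exact fractions are used so the inversion needs no rounding.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    at_risk = len(times)
    survival = Fraction(1)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


def recover_deaths(curve, n_patients):
    """Invert the curve, assuming no censoring and a known cohort size."""
    at_risk = n_patients
    previous = Fraction(1)
    recovered = []
    for t, s in curve:
        deaths = int(at_risk * (1 - s / previous))
        recovered.append((t, deaths))
        at_risk -= deaths
        previous = s
    return recovered


times = [3, 5, 5, 8, 12]  # hypothetical survival times in months
events = [True, True, True, True, True]  # every death observed (no censoring)
curve = kaplan_meier(times, events)
recovered = recover_deaths(curve, n_patients=len(times))
print(recovered)
```

The recovered list gives each event time together with the number of deaths at that time, which is the original data set in this uncensored toy case.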
While MPC is somewhat more complex than other techniques for distributed analysis, it is also more secure: it hides intermediate computations and reveals only the final result. Other techniques still require aggregate information to be shared, which could reveal privacy-sensitive information.
We will continue extending our library with new survival models for vertically partitioned data using MPC techniques. Once these are developed, we intend to collaborate with other organisations on a joint analysis and, hopefully, gain a better understanding of the factors that influence the chance of surviving cancer.
Funders: Netherlands Comprehensive Cancer Organisation (IKNL), Netherlands Organisation for Applied Scientific Research (TNO)