Risk Group Detection and Survival Function Estimation for Interval Coded Survival Methods

V. Van Belle, P. Neven, V. Harvey, S. Van Huffel, J. Suykens, and S. Boyd

Neurocomputing, 113:200-210, 2013.

The highly flexible structure of data mining and machine learning methods results in models that are often difficult to interpret, which hampers their use in domains where interpretability matters. To bridge the gap between advanced modeling techniques and domains that demand interpretable results, interpretability should be built into the design of the technique. The Interval Coded Score index (ICS) is a recently proposed model that satisfies this condition and automatically detects thresholds on variables to generate score systems. The method was extended to censored data (ICSc), but two problems remain: (i) given a prognostic index, how can observations be grouped into risk groups; (ii) given the risk groups, how can survival curves be estimated for survival models based on support vector machines or ICS models. This work offers solutions to both problems. For the first, the ICSc model is applied to the prognostic index to detect thresholds on this index. The result is a grouped index that can be interpreted as a risk group indicator. The method is then modified to ensure that observations with a lower prognostic index are allocated to higher risk groups. The second problem is tackled by simultaneously estimating multiple Kaplan–Meier curves, taking into account that the estimated survival curve for a higher risk group should always lie below the curve for a lower risk group. The proposed approach is illustrated on the prognosis of breast cancer patients and compared with the proportional hazards model. Both models are comparable with respect to discrimination, but calibration is better for the ICSc risk groups.
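
The ordering requirement on the survival curves can be illustrated with a minimal Python sketch. It is not the simultaneous constrained estimator proposed in the paper: it fits an ordinary Kaplan–Meier curve to each risk group separately and then only checks whether the curves happen to be ordered. The function names, the toy data, and the convention that a larger group label means higher risk are all assumptions made for illustration.

```python
import numpy as np

def kaplan_meier(times, events):
    """Plain Kaplan-Meier estimate for one group.

    Returns the distinct event times and the survival estimate just after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)             # still under observation at t
        deaths = np.sum((times == t) & events)   # observed events at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def survival_at(grid, event_times, surv):
    """Evaluate the right-continuous KM step function on a time grid."""
    idx = np.searchsorted(event_times, grid, side="right")
    return np.concatenate(([1.0], surv))[idx]

def curves_are_ordered(times, events, groups, grid):
    """Check that higher-risk groups never have a higher survival estimate.

    Assumption: a larger group label means higher risk.
    """
    times, events, groups = map(np.asarray, (times, events, groups))
    curves = []
    for g in np.unique(groups):                  # labels in increasing order
        mask = groups == g
        et, s = kaplan_meier(times[mask], events[mask])
        curves.append(survival_at(grid, et, s))
    curves = np.vstack(curves)
    ordered = bool(np.all(np.diff(curves, axis=0) <= 1e-12))
    return ordered, curves

# Toy data (hypothetical): group 0 is low risk, group 1 is high risk.
times = [5, 8, 10, 12, 1, 2, 3, 4]
events = [1, 0, 1, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
grid = np.array([0.0, 1.0, 3.0, 5.0, 10.0])
ordered, curves = curves_are_ordered(times, events, groups, grid)
```

When curves estimated independently in this way cross, the check fails. That is the situation the simultaneous estimation described in the abstract is meant to rule out by construction.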