Bioinformatics

NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data

Zou, M., Liu, Z., Zhang, X.-S., Wang, Y.

Motivation: In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by genomic profiles or clinical data that are censored, high-dimensional, and drawn from small samples. Sophisticated models and algorithms are therefore urgently needed.

Results: In this study, we propose a novel Area Under the Curve (AUC) optimization method for multi-biomarker panel identification, named NCC-AUC (Nearest Centroid Classifier for AUC optimization). Our method is motivated by the connection between the AUC score used to evaluate classification accuracy and Harrell’s concordance index in survival analysis. This connection allows us to convert the survival time regression problem into a binary classification problem. An optimization model is then formulated to directly maximize AUC while minimizing the number of selected features, yielding a predictor in the nearest centroid classifier framework. We validate NCC-AUC on genomic data of breast cancer and on clinical data of stage IB NSCLC (Non-Small-Cell Lung Cancer). On the genomic data, NCC-AUC outperforms SVM (Support Vector Machine) and SVM-RFE (Support Vector Machine-based Recursive Feature Elimination) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meaning. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1-penalized Cox model). These performance gains are robust across 5 subtypes of breast cancer. Furthermore, on an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox and L1-Cox models at grouping patients into high- and low-risk categories.
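The relationship between AUC and Harrell’s concordance index can be illustrated with a minimal Python sketch. It is not the NCC-AUC implementation: the function names, the toy data, and the 5-time-unit threshold used to binarize survival are all hypothetical, and the paper's actual conversion may differ. The sketch only shows that both scores are fractions of correctly ordered patient pairs, with AUC restricted to pairs that straddle the two classes.

```python
def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs in which the patient with the shorter
    observed survival has the higher risk score (ties count as 0.5).
    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) and times[i] < times[j]."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def pairwise_auc(labels, risk):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score (ties count as 0.5)."""
    pos = [r for y, r in zip(labels, risk) if y == 1]
    neg = [r for y, r in zip(labels, risk) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: survival times, event indicators
# (1 = event observed, 0 = censored), and predicted risk scores.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.1]

# Assumed binarization for illustration only: label 1 = event before
# time 5 (poor prognosis), label 0 = survival known to exceed time 5.
labels = [1 if (t < 5 and e == 1) else 0 for t, e in zip(times, events)]

c_index = harrell_c_index(times, events, risk)
auc = pairwise_auc(labels, risk)
print("C-index:", c_index)
print("AUC after binarization:", auc)
```

In this toy cohort the AUC counts only the four pairs that cross the threshold, while the concordance index also counts the comparable pair inside the poor-prognosis group, so the two values are related but not identical.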

Conclusion: In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis.

Availability: NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC.