Hierarchical Similarity Network Fusion for Discovering Cancer Subtypes
Recent breakthroughs in biological sequencing technologies have made diverse types of observations available at low cost. Integrative analysis of multi-platform cancer data, which can reveal intrinsic characteristics of a biological process, has become an attractive approach to cancer subtype discovery. Most machine-learning-based methods need to represent all input data in a unified space, which loses important features or introduces noise in some data types. Furthermore, many network-based data integration methods treat each data type independently, which leads to inconsistent conclusions. Similarity network fusion (SNF) was developed to address these problems. However, the Euclidean distance metric employed in SNF suffers from the curse of dimensionality and can therefore give poor results.
To this end, we propose a new integrative method, dubbed hierarchical similarity network fusion (HSNF), to learn a fused, discriminative patient similarity network. HSNF randomly samples feature subsets from each input data type and constructs a similarity matrix from each sample; repeating the random sampling yields multiple diverse similarity matrices that serve as the basis for fusion. We then design a hierarchical fusion framework to make full use of the complementarity of the diverse similarity networks from different feature modalities. Finally, spectral clustering is applied to the final fused similarity matrix to discover cancer subtypes. Experimental results on five public cancer datasets show that HSNF discovers significantly different subtypes and consistently outperforms state-of-the-art methods in terms of silhouette score and the p-value of survival analysis.
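The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the update rules, so the fusion step here is a simplified SNF-style cross-diffusion, and the two-level hierarchy (first fuse across data types within each sampling round, then fuse the per-round networks) is an assumption. All function names (`affinity`, `knn_kernel`, `fuse`, `hsnf`, `spectral_clusters`) and parameters (`n_rounds`, `frac`, `k`, `sigma_scale`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def affinity(X, sigma_scale=0.5):
    """Gaussian-kernel patient similarity from pairwise Euclidean distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma = sigma_scale * np.sqrt(d2).mean() + 1e-12  # bandwidth heuristic (assumption)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def knn_kernel(W, k):
    """Keep each patient's k strongest similarities, then row-normalize."""
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(W.shape[0])[:, None]
    S = np.zeros_like(W)
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)

def fuse(Ws, k=10, n_iter=10):
    """Simplified SNF-style cross-diffusion over two or more similarity matrices."""
    if len(Ws) < 2:
        raise ValueError("need at least two networks to fuse")
    P = [row_normalize(W) for W in Ws]
    S = [knn_kernel(W, k) for W in Ws]
    for _ in range(n_iter):
        P_next = []
        for v in range(len(Ws)):
            others = np.mean([P[u] for u in range(len(Ws)) if u != v], axis=0)
            Pv = S[v] @ others @ S[v].T  # diffuse the other networks through the local kernel
            P_next.append(row_normalize((Pv + Pv.T) / 2.0))
        P = P_next
    fused = np.mean(P, axis=0)
    return (fused + fused.T) / 2.0  # symmetric fused network

def hsnf(views, n_rounds=5, frac=0.5, k=10, seed=0):
    """Two-level fusion over randomly sampled feature subsets (assumed hierarchy)."""
    rng = np.random.default_rng(seed)
    round_nets = []
    for _ in range(n_rounds):
        Ws = []
        for X in views:  # one (patients x features) matrix per data type
            m = max(1, int(frac * X.shape[1]))
            cols = rng.choice(X.shape[1], size=m, replace=False)
            Ws.append(affinity(X[:, cols]))
        round_nets.append(fuse(Ws, k=k))  # level 1: fuse across data types
    return fuse(round_nets, k=k)          # level 2: fuse across sampling rounds

def spectral_clusters(W, n_clusters, n_init=10, n_steps=50, seed=0):
    """Normalized spectral clustering with a small NumPy k-means."""
    d = W.sum(axis=1)
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    U = vecs[:, :n_clusters]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for _ in range(n_init):
        centers = U[rng.choice(len(U), size=n_clusters, replace=False)]
        for _ in range(n_steps):
            dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            centers = np.array([U[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(n_clusters)])
        cost = dist.min(axis=1).sum()
        if cost < best_cost:
            best, best_cost = labels, cost
    return best
```

Given a list `views` holding one patients-by-features array per data type (for example expression and methylation, with the same patient order), `spectral_clusters(hsnf(views), n_clusters)` would return one subtype label per patient. Sampling feature subsets before computing distances is what keeps each Euclidean distance in a lower-dimensional space; how the subsets are sized and how many rounds are used are left as free parameters in this sketch.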
Keywords: Hierarchical similarity network fusion; Multi-platform cancer data; Cancer subtypes discovery; Data integration
The authors would like to thank the anonymous reviewers. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61332014 and 61772426).