Hierarchical Multimodal Fusion of Deep-Learned Lesion and Tissue Integrity Features in Brain MRIs for Distinguishing Neuromyelitis Optica from Multiple Sclerosis
Neuromyelitis optica spectrum disorder (NMOSD) is a disease of the central nervous system that is often misdiagnosed as multiple sclerosis (MS) because the two diseases share similar clinical and radiological characteristics. Two key pathological signs of NMOSD and MS that are detectable on magnetic resonance imaging (MRI) are white matter lesions and alterations in tissue integrity, as measured by fractional anisotropy (FA) values on diffusion tensor images (DTIs). This paper proposes a multimodal deep learning model that discovers latent features in brain lesion masks and DTIs for distinguishing NMOSD from MS. The main technical challenge is to optimally extract and integrate features from two highly heterogeneous image types (lesion masks and FA maps). Our solution is to first build two modality-specific pathways, each designed to accommodate the expected feature density and scale, and then integrate them into a hierarchical multimodal fusion (HMF) model. The HMF model contains two multimodal fusion layers operating at two different scales, which in turn are joined by a multi-scale fusion layer. We hypothesize that the HMF approach allows joint features of heterogeneous image types to be extracted automatically and optimized more efficiently and accurately than the traditional multimodal approach, which combines only the top-layer modality-specific features in a single fusion layer. The proposed model achieves an average diagnostic accuracy of 81.3% (85.3% sensitivity and 75.0% specificity) on 82 NMOSD patients and 52 MS patients in a seven-fold cross-validation, significantly outperforming both the user-defined MRI features previously used in clinical studies and deep-learned features obtained with the conventional fusion approach.
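To make the fusion topology described above concrete, the following is a minimal sketch of the data flow only, not the authors' implementation. It assumes flat feature vectors and small fully connected layers in place of the real modality-specific pathways, leaky rectifier activations, random untrained weights, and a single sigmoid output. The class name `HMFSketch`, the layer names, and all dimensions are hypothetical.

```python
import math
import random


def dense(n_in, n_out, rng):
    """Create a randomly initialised fully connected layer: (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, x, activate=True, leak=0.01):
    """Apply a dense layer; optionally follow it with a leaky rectifier."""
    weights, biases = layer
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * v for w, v in zip(row, x)) + b
        out.append(z if (z > 0 or not activate) else leak * z)
    return out


class HMFSketch:
    """Hypothetical stand-in for the hierarchical multimodal fusion topology."""

    def __init__(self, lesion_dim, fa_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        # Two modality-specific pathways, each with a lower and an upper layer.
        self.lesion_low = dense(lesion_dim, hidden, rng)
        self.lesion_high = dense(hidden, hidden, rng)
        self.fa_low = dense(fa_dim, hidden, rng)
        self.fa_high = dense(hidden, hidden, rng)
        # Two multimodal fusion layers, one per scale.
        self.fuse_low = dense(2 * hidden, hidden, rng)
        self.fuse_high = dense(2 * hidden, hidden, rng)
        # Multi-scale fusion layer joining the two fusion outputs.
        self.fuse_scales = dense(2 * hidden, hidden, rng)
        self.out = dense(hidden, 1, rng)

    def predict(self, lesion, fa):
        """Return an untrained score in (0, 1) for one subject."""
        l_low = forward(self.lesion_low, lesion)
        l_high = forward(self.lesion_high, l_low)
        f_low = forward(self.fa_low, fa)
        f_high = forward(self.fa_high, f_low)
        # Fuse the modalities separately at each scale (list concatenation).
        joint_low = forward(self.fuse_low, l_low + f_low)
        joint_high = forward(self.fuse_high, l_high + f_high)
        # Join the two scales, then map to a single logit.
        joint = forward(self.fuse_scales, joint_low + joint_high)
        logit = forward(self.out, joint, activate=False)[0]
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = HMFSketch(lesion_dim=6, fa_dim=10)
    lesion_features = [0, 1, 0, 0, 1, 1]  # made-up binary lesion values
    fa_features = [0.2, 0.4, 0.35, 0.5, 0.1, 0.6, 0.3, 0.45, 0.25, 0.55]
    print(model.predict(lesion_features, fa_features))
```

In this sketch, the conventional single-fusion baseline mentioned in the abstract would correspond to keeping only `fuse_high` and feeding its output directly to the output layer. Because the weights are untrained, the printed score carries no diagnostic meaning; the example only illustrates how features from both scales reach the final layer.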
This work was supported by the Natural Sciences and Engineering Research Council of Canada, the MS Society of Canada, the Milan and Maureen Ilich Foundation, and the National Research Foundation of Korea.