Abstract
Constrained non-negative matrix factorization (CNMF) is an effective machine learning technique for clustering documents in the presence of class label constraints. In this work, we provide a novel application of this technique to research on neurodegenerative diseases. Specifically, we consider a dataset of documents from the Netherlands Brain Bank containing free text that describes clinical and pathological information about donors affected by Multiple Sclerosis. The goal is to use CNMF to identify clinical profiles, with pathological information acting as constraints. After the documents are pre-processed with standard filtering techniques, a feature representation of the documents in terms of bi-grams is constructed. The high-dimensional feature space is then reduced by applying a trimming procedure. The resulting datasets of clinical and pathological bi-grams are each clustered using non-negative matrix factorization (NMF); next, the clinical data are clustered using CNMF with constraints induced by the clustering of the pathological data. Results indicate the presence of interesting clinical profiles, for instance profiles related to vision or movement problems. In particular, the use of CNMF leads to the identification of a clinical profile related to diabetes mellitus. The pathological characteristics and the duration of disease of the identified profiles are analysed. Although highly promising, the results of this investigation should be interpreted with care because of the relatively small size of the considered datasets.
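The pipeline summarised above (bi-gram features, trimming, NMF, then CNMF with label constraints) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the toy documents are invented, trimming is approximated by a document-frequency threshold `min_df`, the helper names (`bigram_matrix`, `constraint_matrix`, `cnmf`) are hypothetical, and the constrained factorization uses one common formulation, a label constraint matrix `A` with multiplicative updates for the Frobenius loss, which may differ from the variant used by the authors. Here the labels stand in for cluster assignments obtained from the pathological data.

```python
import numpy as np

def bigrams(text):
    """Return the adjacent word pairs of a lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def bigram_matrix(docs, min_df=2):
    """Build a (bi-gram x document) count matrix, keeping only bi-grams that
    occur in at least `min_df` documents (an assumed stand-in for trimming)."""
    per_doc = [bigrams(d) for d in docs]
    df = {}
    for grams in per_doc:
        for g in set(grams):
            df[g] = df.get(g, 0) + 1
    vocab = sorted(g for g, n in df.items() if n >= min_df)
    index = {g: i for i, g in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, grams in enumerate(per_doc):
        for g in grams:
            if g in index:
                X[index[g], j] += 1.0
    return X, vocab

def constraint_matrix(labels, n_classes):
    """Label constraint matrix A: documents with the same label (>= 0) share one
    indicator column; each unlabelled document (label -1) gets its own column."""
    labels = np.asarray(labels)
    n_unlabelled = int(np.sum(labels < 0))
    A = np.zeros((len(labels), n_classes + n_unlabelled))
    free = n_classes
    for i, lab in enumerate(labels):
        if lab >= 0:
            A[i, lab] = 1.0
        else:
            A[i, free] = 1.0
            free += 1
    return A

def cnmf(X, k, A=None, n_iter=200, seed=0, eps=1e-9):
    """Factorise X ~ U (A Z)^T with multiplicative updates (Frobenius loss).
    With A equal to the identity this reduces to plain NMF."""
    if A is None:
        A = np.eye(X.shape[1])
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k)) + eps
    Z = rng.random((A.shape[1], k)) + eps
    AtA = A.T @ A
    XA = X @ A
    for _ in range(n_iter):
        U *= (XA @ Z) / (U @ (Z.T @ AtA @ Z) + eps)
        Z *= (XA.T @ U) / (AtA @ Z @ (U.T @ U) + eps)
    return U, A @ Z  # V = A Z holds one row of cluster weights per document

# Invented toy documents; the first two share an assumed pathology-derived label.
docs = [
    "blurred vision and optic neuritis",
    "optic neuritis with blurred vision",
    "walking difficulty and muscle weakness",
    "muscle weakness and walking difficulty",
]
X, vocab = bigram_matrix(docs, min_df=2)
A = constraint_matrix([0, 0, -1, -1], n_classes=2)
U, V = cnmf(X, k=2, A=A)
clusters = V.argmax(axis=1)  # hard cluster assignment per document
```

In this formulation, documents that share a label are forced to have identical rows in `V`, which is how the constraints steer the clustering; calling `cnmf(X, k)` without `A` gives the unconstrained NMF baseline.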
Acknowledgments
This work has been partially funded by the Netherlands Organization for Scientific Research (NWO) within the NWO project 612.001.119.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Acquarelli, J., The Netherlands Brain Bank, Bianchini, M., Marchiori, E. (2016). Discovering Potential Clinical Profiles of Multiple Sclerosis from Clinical and Pathological Free Text Data with Constrained Non-negative Matrix Factorization. In: Squillero, G., Burelli, P. (eds) Applications of Evolutionary Computation. EvoApplications 2016. Lecture Notes in Computer Science, vol 9597. Springer, Cham. https://doi.org/10.1007/978-3-319-31204-0_12
DOI: https://doi.org/10.1007/978-3-319-31204-0_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31203-3
Online ISBN: 978-3-319-31204-0
eBook Packages: Computer Science, Computer Science (R0)