Abstract
Data mining is a broad research area concerned with pattern discovery and feature extraction, and it is applied in many critical domains. In clinical settings, data mining has emerged to assist clinicians in the early detection, diagnosis, and prevention of diseases. Advances in computational methods have led to the use of machine learning in multi-modal clinical image analysis. One recent method is online learning, in which data become available sequentially and the best predictor for future data is updated at each step, as opposed to batch learning techniques, which generate the best predictor by learning from the entire data set at once.
In this chapter, we examine and analyse multi-modal medical images by developing an unsupervised machine learning algorithm based on online variational inference for finite inverted Dirichlet mixture models. Our prime focus is to validate the developed approach on medical images. We do so by applying the algorithm to both synthetic and real data sets. We test the algorithm’s ability to detect challenging real-world diseases, namely brain tumour, lung tuberculosis, and melanoma skin lesions.
References
Agrawal, J.P., Erickson, B.J., Kahn, C.E.: Imaging informatics: 25 years of progress. Yearb. Med. Inform. Suppl 1, 23–31 (2016)
Sohail, M.N., Jiadong, R., Uba, M.M., Irshad, M.: A comprehensive looks at data mining techniques contributing to medical data growth: A survey of researcher reviews. In: Patnaik, S., Jain, V. (eds.) Recent Developments in Intelligent Computing, Communication and Devices. Springer, Singapore, pp. 21–26 (2019)
Ganguly, D., Chakraborty, S., Balitanas, M., Kim, Th.: Medical imaging: A review. In: Kim, Th., Stoica, A., Chang, R.S. (eds.) Security-Enriched Urban Computing and Smart Grid. Springer, Heidelberg, pp. 504–516 (2010)
Perera, C.M., Chakrabarti, R.: A review of m-health in medical imaging. Telemed. e-Health 21(2), 132–137 (2015)
Lester, D.S., Olds, J.L.: Biomedical imaging: 2001 and beyond. Anat. Rec. An Offi. Publ. Am. Assoc. Anatomists 265(2), 35–36 (2001)
Van Beek, E.J., Hoffman, E.A.: Functional imaging: CT and MRI. Clin. Chest Med. 29(1), 195–216 (2008)
Doi, K.: Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput. Med. Imaging Graph. 31(4–5), 198–211 (2007)
Petrick, N., Sahiner, B., Armato III, S.G., Bert, A., Correale, L., Delsanto, S., Freedman, M.T., Fryd, D., Gur, D., Hadjiiski, L., Huo, Z., Jiang, Y., Morra, L., Paquerault, S., Raykar, V., Samuelson, F., Summers, R.M., Tourassi, G., Yoshida, H., Zheng, B., Zhou, C., Chan, H.P.: Evaluation of computer-aided detection and diagnosis systems. Med. Phys. 40(8), 087001 (2013)
Erickson, B.J., Korfiatis, P., Akkus, Z., Kline, T.L.: Machine learning for medical imaging. Radiographics 37(2), 505–515 (2017)
Guadalupe Sánchez, M., Vidal, V., Verdú, G., Mayo, P., Rodenas, F.: Medical image restoration with different types of noise. In: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4382–4385 (2012)
Sittig, D.F., Wright, A., Osheroff, J.A., Middleton, B., Teich, J.M., Ash, J.S., Campbell, E., Bates, D.W.: Grand challenges in clinical decision support. J. Biomed. Inform. 41(2), 387–392 (2008)
Chen, T.J., Chuang, K.S., Chang, J.H., Shiao, Y.H., Chuang, C.C.: A blurring index for medical images. J. Digit. Imaging 19(2), 118–125 (2005)
Fan, W., Bouguila, N., Ziou, D.: Variational learning for finite Dirichlet mixture models and applications. IEEE Trans. Neural Netw. Learn. Syst. 23(5), 762–774 (2012)
Tirdad, P., Bouguila, N., Ziou, D.: Variational learning of finite inverted Dirichlet mixture models and applications. In: Laalaoui, Y., Bouguila, N. (eds.) Artificial Intelligence Applications in Information and Communication Technologies, vol. 607, pp. 119–145. Springer, Cham (2015)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods (Springer Texts in Statistics). Springer, Heidelberg (2005)
Gultepe, E., Makrehchi, M.: Improving clustering performance using independent component analysis and unsupervised feature learning. Hum-centric Comput. Inf. Sci. 8(1), 148:1–148:19 (2018)
Fan, W., Bouguila, N., Ziou, D.: Variational learning of finite Dirichlet mixture models using component splitting. Neurocomputing 129, 3–16 (2014)
Bouguila, N., Ziou, D.: Online clustering via finite mixtures of Dirichlet and minimum message length. Eng. Appl. Artif. Intell. 19(4), 371–379 (2006)
Zakariya, S.M., Ali, R., Ahmad, N.: Combining visual features of an image at different precision value of unsupervised content based image retrieval. In: 2010 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–4 (2010)
Constantinopoulos, C., Likas, A.: Unsupervised learning of Gaussian mixtures based on variational component splitting. IEEE Trans. Neural Netw. 18(3), 745–755 (2007)
Williams, G.: Descriptive and predictive analytics. In: Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, pp. 171–177. Springer, New York (2011)
Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Disc. 15(1), 55–86 (2007)
Bellazzi, R., Zupan, B.: Predictive data mining in clinical medicine: current issues and guidelines. Int. J. Med. Inform. 77(2), 81–97 (2008)
Swan, M.: Emerging patient-driven health care models: an examination of health social networks, consumer personalized medicine and quantified self-tracking. Int. J. Environ. Res. Public Health 6(2), 492–525 (2009)
Iavindrasana, J., Cohen, G., Depeursinge, A., Müller, H., Meyer, R., Geissbuhler, A.: Clinical data mining: a review. Yearb. Med. Inform. 121–133 (2018)
Chechulin, Y., Nazerian, A., Rais, S., Malikov, K.: Predicting patients with high risk of becoming high-cost healthcare users in Ontario (Canada). Healthc. Policy 9, 68–79 (2014)
Ramezankhani, A., Kabir, A., Pournik, O., Azizi, F., Hadaegh, F.: Classification-based data mining for identification of risk patterns associated with hypertension in middle eastern population: A 12-year longitudinal study. Medicine (Baltimore) 95(35), e4143 (2016)
Parva, E., Boostani, R., Ghahramani, Z., Paydar, S.: The necessity of data mining in clinical emergency medicine; a narrative review of the current literature. Bull. Emerg. Trauma. 5(2), 90–95 (2017)
Kuo, I.T., Chang, K.Y., Juan, D.F., Hsu, S.J., Chan, C.T., Tsou, M.Y.: Time-dependent analysis of dosage delivery information for patient-controlled analgesia services. PLoS One 13(3), 1–13 (2018)
Lee, M.J., Chen, C.J., Lee, K.T., Shi, H.Y.: Trend analysis and outcome prediction in mechanically ventilated patients: A nationwide population-based study in Taiwan. PLoS One 10(4), 1–13 (2015)
Baek, H., Cho, M., Kim, S., Hwang, H., Song, M., Yoo, S.: Analysis of length of hospital stay using electronic health records: A statistical and data mining approach. PLoS One 13(4), 1–16 (2018)
Tiao, G.G., Guttman, I.: The inverted Dirichlet distribution with applications. J. Am. Stat. Assoc. 60(311), 793–805 (1965)
Xu, R., Wunsch, D.C.: Clustering algorithms in biomedical research: A review. IEEE Rev. Biomed. Eng. 3, 120–154 (2010)
Wang, H.X., Luo, B., Zhang, Q.B., Wei, S.: Estimation for the number of components in a mixture model using stepwise split-and-merge EM algorithm. Pattern Recogn. Lett. 25(16), 1799–1809 (2004)
Schneider, A., Hommel, G., Blettner, M.: Linear regression analysis: part 14 of a series on evaluation of scientific publications. Dtsch. Arztebl. Int. 44, 776–82 (2010)
Kovalchuk, S.V., Funkner, A.A., Metsker, O.G., Yakovlev, A.N.: Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification. J. Biomed. Inform. 82, 128–142 (2018)
Jensen, P.B., Jensen, L.J., Brunak, S.: Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13(6), 395–405 (2012)
Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)
Corduneanu, A., Bishop, C.: Variational Bayesian model selection for mixture distributions. In: Proceedings Eighth International Conference on Artificial Intelligence and Statistics, pp. 27–34. Morgan Kaufmann, San Francisco (2001)
Lawrence, N.D., Bishop, C.M., Jordan, M.I.: Mixture Representations for Inference and Learning in Boltzmann Machines (2013). CoRR abs/1301.7393
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37(2), 183–233 (1999)
Bishop, C.M., Lawrence, N., Jaakkola, T., Jordan, M.I.: Approximating posterior distributions in belief networks using mixtures. In: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems 10, pp. 416–422. MIT Press, Cambridge (1998)
Amari, S.I.: Natural gradient works efficiently in learning. Neural. Comput. 10(2), 251–276 (1998)
Fan, W., Bouguila, N.: Online variational learning of finite Dirichlet mixture models. Evol. Syst. 3(3), 153–165 (2012)
Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 856–864. Curran Associates, Inc. (2010)
Bakas, S., Kuijf, H.J., Keyvan, F., Reyes, M., van Walsum, T.: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Springer International Publishing, Berlin (2018)
Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., Lanczi, L., Gerstner, E., Weber, M., Arbel, T., Avants, B.B., Ayache, N., Buendia, P., Collins, D.L., Cordier, N., Corso, J.J., Criminisi, A., Das, T., Delingette, H., Demiralp, Durst, C.R., Dojat, M., Doyle, S., Festa, J., Forbes, F., Geremia, E., Glocker, B., Golland, P., Guo, X., Hamamci, A., Iftekharuddin, K.M., Jena, R., John, N.M., Konukoglu, E., Lashkari, D., Mariz, J.A., Meier, R., Pereira, S., Precup, D., Price, S.J., Raviv, T.R., Reza, S.M.S., Ryan, M., Sarikaya, D., Schwartz, L., Shin, H., Shotton, J., Silva, C.A., Sousa, N., Subbanna, N.K., Szekely, G., Taylor, T.J., Thomas, O.M., Tustison, N.J., Unal, G., Vasseur, F., Wintermark, M., Ye, D.H., Zhao, L., Zhao, B., Zikic, D., Prastawa, M., Reyes, M., Van Leemput, K.: The multimodal brain tumor image segmentation benchmark (brats). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
Kistler, M., Bonaretti, S., Pfahrer, M., Niklaus, R., Büchler, P.: The virtual skeleton database: An open access repository for biomedical research and collaboration. J. Med. Internet Res. 15(11), e245 (2013)
Barkhof, F., Scheltens, P.: Imaging of white matter lesions. Cerebrovasc. Dis. 13(Suppl 2), 21–30 (2002)
Arroyo-Camarena, S., Domínguez-Cherit, J., Lammoglia-Ordiales, L., Fabila-Bustos, D.A., Escobar-Pio, A., Stolik, S., Valor-Reed, A., de la Rosa-Vázquez, J.: Spectroscopic and imaging characteristics of pigmented non-melanoma skin cancer and melanoma in patients with skin phototypes III and IV. Oncol. Ther. 4(2), 315–331 (2016)
Codella, N.C.F., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza, S.W., Kalloo, A., Liopyris, K., Mishra, N.K., Kittler, H., Halpern, A.: Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium On Biomedical Imaging (ISBI), Hosted By The International Skin Imaging Collaboration (ISIC) (2017). CoRR abs/1710.05006
Asaid, R., Boyce, G., Padmasekara, G.: Use of a smartphone for monitoring dermatological lesions compared to clinical photography. J. Mob. Technol. Med. 1, 16–18 (2012)
Wu, X., Marchetti, M.A., Marghoob, A.A.: Dermoscopy: not just for dermatologists. Melanoma Manag 2(1), 63–73 (2015)
Sakamoto, K.: The pathology of mycobacterium tuberculosis infection. Vet. Pathol. 49(3), 423–39 (2012)
Huda, W., Abrahams, R.B.: Radiographic techniques, contrast, and noise in x-ray imaging. AJR Am. J. Roentgenol. 204(2), W126–131 (2015)
Brady, A., Laoide, R., McCarthy, P., McDermott, R.: Discrepancy and error in radiology: concepts, causes and consequences. Ulster Med. J. 81(1), 3–9 (2012)
Candemir, S., Jaeger, S., Palaniappan, K., Musco, J.P., Singh, R.K., Xue, Z., Karargyris, A., Antani, S., Thoma, G., McDonald, C.: Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE Trans. Med. Imaging 33, 577–590 (2014)
Jaeger, S., Karargyris, A., Candemir, S., Folio, L., Siegelman, J., Callaghan, F., Xue, Z., Palaniappan, K., Singh, R.K., Antani, S., Thoma, G., Wang, Y., Lu, P., McDonald, C.J.: Automatic tuberculosis screening using chest radiographs. IEEE Trans. Med. Imaging 33(2), 233–245 (2014)
Kohli, M.D., Summers, R.M., Geis, J.R.: Medical image data and datasets in the era of machine learning-whitepaper from the 2016 c-MIMI meeting dataset session. J. Digit. Imaging 30, 392–399 (2017)
Valindria, V.V., Lavdas, I., Bai, W., Kamnitsas, K., Aboagye, E.O., Rockall, A.G., Rueckert, D., Glocker, B.: Reverse classification accuracy: predicting segmentation performance in the absence of ground truth. IEEE Trans. Med. Imaging 36, 1597–1606 (2017)
Kouanou, A.T., Tchiotsop, D., Kengne, R., Zephirin, D.T., Armele, N.M.A., Tchinda, R.: An optimal big data workflow for biomedical image analysis. Inform. Med. Unlocked 11, 68–74 (2018)
Appendix
1.1 Proof of Eq. (11.17): Variational Solution of \(Q\big ( \mathcal {Z}\big )\)
For the variational solution \(Q_s(\Theta_s)\), the general expression is:
where const is an additive term representing every term that is independent of \(Q_s(\Theta_s)\). Considering the joint distribution in Eq. (11.10), the variational solution for \(Q\big (\mathcal {Z}\big )\) can be derived as follows:
where
and
Since \(\mathcal {R}_{j}\) has no closed-form solution, variational inference cannot be applied directly. To obtain tractable approximations, the second-order Taylor expansion is used to approximate the expected values of the parameters \(\alpha_j\) [14]. Hence, considering the logarithmic form of (11.6), Eq. (11.47) can be written as
where
Since all the terms without \(\mathcal {Z}_{ij}\) can be absorbed into the constant, it is possible to show that
To find the exact formula for \(Q(\mathcal {Z})\), Eq. (11.53) should be normalized, and the calculation can be expressed as
where
It is noteworthy that \(\sum ^M_{j=1} r_{ij} = 1\); thus, the result for \(Q(\mathcal {Z})\) is
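The displayed equations of this proof did not survive in this copy, but the normalization step can still be illustrated. The following is a minimal sketch, assuming the unnormalized weights are available as logarithms \(\ln \rho_{ij}\). The symbol \(\rho_{ij}\), the function name, and the numbers are assumptions made for illustration and are not taken from the chapter. Each row is exponentiated and divided by its sum so that \(\sum ^M_{j=1} r_{ij} = 1\); the row maximum is subtracted first, a common numerical safeguard that leaves the result unchanged.

```python
import math


def normalize_responsibilities(log_rho):
    """Convert rows of unnormalized log-weights ln(rho_ij) into r_ij with sum_j r_ij = 1."""
    responsibilities = []
    for row in log_rho:
        row_max = max(row)  # subtracting the max avoids underflow/overflow in exp
        weights = [math.exp(value - row_max) for value in row]
        total = sum(weights)
        responsibilities.append([w / total for w in weights])
    return responsibilities


# Hypothetical ln(rho_ij) for N = 2 observations and M = 3 components.
log_rho = [
    [-1200.0, -1201.0, -1205.0],              # very negative values still work
    [math.log(1.0), math.log(3.0), math.log(4.0)],
]
r = normalize_responsibilities(log_rho)
```

In this example the second row normalizes to approximately 0.125, 0.375, and 0.5, and the first row remains well defined even though exponentiating its entries directly would underflow to zero.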
1.2 Proof of Eqs. (11.18), (11.22) and (11.23)
Assuming the parameters \(\alpha_{jl}\) are independent in a mixture model with M components, we can factorize \(Q(\alpha)\) as
We compute the variational solution for \(Q(\alpha_{jl})\) using Eq. (11.16) rather than the gradient method. The logarithm of the variational solution \(Q(\alpha_{jl})\) is given by
where,
Similar to what we encountered in the case of \(\mathcal {R}_{j}\), the equation for \(\mathcal {J}\big (\alpha _{jl}\big )\) is also intractable. We solve this problem by finding a lower bound for the equation using the first-order Taylor expansion with respect to \(\overline {\alpha }_{jl}\). The calculated lower bound is given by [44]
Substituting this lower bound into Eq. (11.57) gives
This equation can be rewritten as
where,
Equation (11.61) is the logarithmic form of a gamma distribution. Exponentiating both sides, we get
This leaves us with the optimal solutions for the hyper-parameters \(u_{jl}\) and \(\nu_{jl}\), given by
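The step from a logarithmic form to a gamma distribution can be checked numerically. The sketch below assumes the standard shape and rate parameterization, in which \(\ln Q(\alpha) = (u - 1)\ln \alpha - \nu \alpha + \text{const}\) corresponds to a gamma density with shape \(u\) and rate \(\nu\). The numeric values, the helper names, and the integration grid are hypothetical and are not the chapter's updated hyper-parameters, whose equations are missing from this copy.

```python
import math


def gamma_log_density(alpha, shape, rate):
    """Log of the gamma density: the (shape-1)*ln(alpha) - rate*alpha part plus its constant."""
    log_const = shape * math.log(rate) - math.lgamma(shape)
    return log_const + (shape - 1.0) * math.log(alpha) - rate * alpha


def trapezoid(values, step):
    """Trapezoidal rule on an evenly spaced grid."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


shape, rate = 6.0, 2.0  # hypothetical stand-ins for the hyper-parameters u and nu
step = 0.001
grid = [step * k for k in range(1, 40001)]  # alpha from 0.001 to 40
density = [math.exp(gamma_log_density(a, shape, rate)) for a in grid]

total_mass = trapezoid(density, step)                              # should be close to 1
mean = trapezoid([a * d for a, d in zip(grid, density)], step)     # should be close to shape / rate
```

Under these assumptions the exponentiated form integrates to approximately one and its mean is approximately `shape / rate`, which is the kind of expected value, such as \(\overline {\alpha }_{jl}\), that the Taylor expansions above are taken around.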
1.3 Proof of Eq. (11.27)
We calculate the mixing coefficients π by maximizing the lower bound with respect to π. A Lagrangian term must be included in the lower bound because of the constraint \(\sum ^M_{j=1} \pi _j = 1\). Then, taking the derivative with respect to \(\pi_j\) and setting the result to zero, we have [44]
By taking the sum of both sides of Eq. (11.67) over j, we obtain λ = −N. Then, substituting the value of λ into Eq. (11.66), we obtain
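The final displayed equation is missing from this copy. With λ = −N, the usual result of this Lagrangian argument is \(\pi_j = \frac{1}{N}\sum^N_{i=1} r_{ij}\), which is the form assumed in the sketch below. The function name and the responsibility values are hypothetical, chosen only so that each row sums to one.

```python
def update_mixing_coefficients(responsibilities):
    """Average the responsibilities over observations: pi_j = (1/N) * sum_i r_ij."""
    n_observations = len(responsibilities)
    n_components = len(responsibilities[0])
    return [
        sum(row[j] for row in responsibilities) / n_observations
        for j in range(n_components)
    ]


# Hypothetical responsibilities for N = 4 observations and M = 3 components.
r = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
pi = update_mixing_coefficients(r)
```

Because every row of `r` sums to one, the resulting coefficients also sum to one, so the constraint \(\sum ^M_{j=1} \pi _j = 1\) holds without any further renormalization.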
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kalra, M., Osadebey, M., Bouguila, N., Pedersen, M., Fan, W. (2020). Online Variational Learning for Medical Image Data Clustering. In: Bouguila, N., Fan, W. (eds) Mixture Models and Applications. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-23876-6_11
DOI: https://doi.org/10.1007/978-3-030-23876-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23875-9
Online ISBN: 978-3-030-23876-6
eBook Packages: Engineering, Engineering (R0)