Abstract
Abnormality detection, also known as outlier detection or novelty detection, seeks to identify data that do not match an expected distribution. In medical imaging, this can be used to find samples with possible pathology or, more generally, to exclude samples that are normal. This may be done by learning a model of normality against which new samples are evaluated. In this paper four methods, each representing a different family of techniques, are compared: the one-class support vector machine, isolation forest, local outlier factor, and fast minimum covariance determinant estimator. Each method is evaluated on patches from CT scans of interstitial lung disease, where the patches are encoded with one of four embedding methods: principal component analysis, kernel principal component analysis, a flat autoencoder, and a convolutional autoencoder. The data consist of 5500 healthy patches from one patient cohort, defining normality, and 2970 patches from a second patient cohort with emphysema, fibrosis, ground glass opacity, and micronodule pathology, representing abnormality. From this second cohort, 1030 healthy patches are used as an evaluation dataset. Evaluation covers both accuracy (area under the ROC curve) and runtime efficiency. The fast minimum covariance determinant estimator is demonstrated to scale only moderately well with dataset dimensionality, while the isolation forest and one-class support vector machine scale well. The one-class support vector machine is the most accurate, closely followed by the isolation forest and fast minimum covariance determinant estimator. The embeddings from kernel principal component analysis are the most generally useful.
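The pipeline the abstract describes (embed patches, fit a model of normality on healthy data only, score held-out normal and abnormal samples by ROC AUC) can be sketched with scikit-learn. This is not the paper's code: synthetic Gaussian data stands in for the CT patches, PCA stands in for the four embedding methods, and all detector hyperparameters are assumed defaults rather than the paper's tuned values. `EllipticEnvelope` is scikit-learn's FastMCD-based estimator.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope  # FastMCD-based estimator
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
d = 64                                        # stand-in for flattened patch pixels
X_normal = rng.normal(0.0, 1.0, (500, d))     # "healthy" training patches
X_test_norm = rng.normal(0.0, 1.0, (100, d))  # held-out healthy patches
X_test_abn = rng.normal(3.0, 1.0, (100, d))   # shifted distribution = pathology

# Embed: the encoder is fit on normal data only, as in the paper.
pca = PCA(n_components=8).fit(X_normal)
Z_train = pca.transform(X_normal)
Z_test = pca.transform(np.vstack([X_test_norm, X_test_abn]))
y_test = np.r_[np.ones(100), np.zeros(100)]   # 1 = normal, 0 = abnormal

detectors = {
    "one-class SVM": OneClassSVM(nu=0.1, gamma="scale"),
    "isolation forest": IsolationForest(random_state=0),
    "local outlier factor": LocalOutlierFactor(novelty=True),
    "FastMCD": EllipticEnvelope(random_state=0),
}

aucs = {}
for name, det in detectors.items():
    det.fit(Z_train)                    # learn a model of normality only
    scores = det.score_samples(Z_test)  # higher score = more normal
    aucs[name] = roc_auc_score(y_test, scores)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

With this large a distribution shift all four detectors should separate the classes easily; on real CT patches the differences between methods and embeddings are what the paper measures.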
Notes
- 1. The predict times do not include the time taken to run the embedding method on the data being predicted on.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Daykin, M., Sellathurai, M., Poole, I. (2018). A Comparison of Unsupervised Abnormality Detection Methods for Interstitial Lung Disease. In: Nixon, M., Mahmoodi, S., Zwiggelaar, R. (eds) Medical Image Understanding and Analysis. MIUA 2018. Communications in Computer and Information Science, vol 894. Springer, Cham. https://doi.org/10.1007/978-3-319-95921-4_27
Print ISBN: 978-3-319-95920-7
Online ISBN: 978-3-319-95921-4