Pap-smear Image Classification Using Randomized Neural Network Based Signature

de Mesquita Sá Junior, Jarbas Joaci; Backes, André R.; Bruno, Odemir Martinez

doi:10.1007/978-3-319-75193-1_81

Jarbas Joaci de Mesquita Sá Junior,
André R. Backes &
Odemir Martinez Bruno

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10657)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition

Abstract

This paper presents a state-of-the-art texture analysis method called “randomized neural network based signature” applied to the classification of pap-smear cell images for the Papanicolaou test. For this purpose, we used a well-known benchmark dataset composed of 917 images and compared the aforementioned image signature to other texture analysis methods. The obtained results were promising, presenting accuracy of 87.57% and AUC of 0.8983 using LDA and SVM, respectively. These performance values confirm that the randomized neural network based signature can be applied successfully to this important medical problem.

1 Introduction

Texture is among the most important attributes in computer vision and has been the focus of intensive research throughout the years. Concisely, we can define texture as an arrangement of sub-patterns, which can be pixels, regions or other visual attributes [1]. Obviously, such a definition is quite restrictive and does not encompass a great variety of images (for instance, smoke, mammograms, fire, water etc.), which present a persistent stochastic pattern with a cloud-like appearance [2].

Even though texture lacks a formal definition, it is a feature easily understood by the human visual system. Such importance has motivated the development of many techniques for the analysis and recognition of texture patterns, making this a field of intense research [3]. Among the many techniques available, there are those that describe the image texture using second-order statistics [4, 5], spectral analysis (e.g., Fourier and Gabor filters) [6,7,8], local binary patterns [9], gravitational systems [10] and agents walking over the texture pattern [1].

Medical image analysis is a field of intense research, with many approaches being developed over the years. For instance, LBP variants were proposed in [11] as texture descriptors for medical image analysis and evaluated on different medical datasets, covering cell phenotype image classification, classification of pain states from neonatal facial images and detection of abnormal smear cells. In [12], a fuzzy clustering algorithm was proposed for brain tumor segmentation. The authors stated that fuzzy clustering enables many cases of uncertainty to be considered during the segmentation process. Breast cancer risk assessment has been the focus of many studies on texture analysis. In [13], two types of texture features are proposed to assess breast cancer risk: textons based on local pixel intensities and features based on oriented tissue structures. In [14], background intensity independent texture features were proposed for mammogram classification. Another topic of intense research is the identification and segmentation of melanocytic skin lesions. Machine learning techniques were used in [15] to select the parameters of a classification framework for melanocytic lesions. The paper [16] presented an approach using a feature learning scheme and normalized graph cuts for skin lesion image segmentation.

This paper proposes to apply a recent and very discriminative texture analysis method to a relevant medical problem, which consists of classifying pap-smear cells to discover pre-cancerous or cancerous stages in the cervix. Section 2 briefly describes the randomized neural network and how to use it to obtain an image signature from its neuron weights. Section 3 presents the pap-smear database, the other texture analysis methods used for comparison and the classification procedure. Section 4 discusses the obtained results, and, finally, Sect. 5 presents some remarks about this work.

2 Randomized Neural Network and its Texture Signature

A randomized neural network [17,18,19,20] is a recent proposal of neural network that has only two neuron layers and a very fast training procedure. In the hidden layer, the weights of the neurons are randomly determined according to a uniform or Gaussian distribution. These weights can be arranged in a matrix

$$\begin{aligned} W=\left( \begin{array}{cccc} w_{10} & w_{11} & \ldots & w_{1p} \\ w_{20} & w_{21} & \ldots & w_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ w_{Q0} & w_{Q1} & \ldots & w_{Qp} \end{array} \right) , \end{aligned}$$

(1)

where each row represents the weights of a given hidden neuron q (the column of index 0 holds its bias weight), p is the number of attributes of an input vector ${{\varvec{x}}}$, and Q is the total number of hidden neurons.

Let $X=\left[ {{{\varvec{x}}}_\mathbf{1}}, {{{\varvec{x}}}_\mathbf{2}}, \dots , {{{\varvec{x}}}_{{\varvec{N}}}}\right] $ and $D=\left[ {{{\varvec{d}}}_\mathbf{1}}, {{{\varvec{d}}}_\mathbf{2}}, \dots , {{{\varvec{d}}}_{{\varvec{N}}}}\right] $ be matrices whose columns are the input vectors ${{{\varvec{x}}}_{{\varvec{i}}}}$ and their respective labels ${{{\varvec{d}}}_{{\varvec{i}}}}$ (N is the number of feature vectors). Then, after inserting a new first row composed of $-1$ into X (for bias), we obtain the output of the hidden neurons according to the equation $Z=\phi (WX)$, where $\phi (.)$ is a transfer function (in general, the logistic or hyperbolic tangent function).

The resulting matrix $Z=\left[ {{{\varvec{z}}}_\mathbf{1}}, {{{\varvec{z}}}_\mathbf{2}}, \dots , {{{\varvec{z}}}_{{\varvec{N}}}}\right] $ holds, in each column, the output of the hidden neurons for the input feature vector ${{{\varvec{x}}}_{{\varvec{i}}}}$. Again, we insert a new first row composed of $-1$ into Z (for bias), and the objective is to solve $D=MZ$ for M, the matrix of weights of the output neurons. The matrix M is obtained in closed form, as the least-squares solution of this system, according to the following equation

$$\begin{aligned} M=DZ^{T}(ZZ^{T})^{-1}. \end{aligned}$$

(2)
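The training procedure of Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function names `hidden_output` and `output_weights`, the sizes, the seed and the Gaussian draw of W are all illustrative choices, and the linear system is solved directly instead of forming the inverse explicitly.

```python
import numpy as np


def logistic(a):
    """Logistic transfer function phi(.)."""
    return 1.0 / (1.0 + np.exp(-a))


def hidden_output(W, X):
    """Z = phi(W X), with a bias row of -1 prepended to X and then to Z."""
    n = X.shape[1]
    Xb = np.vstack([-np.ones((1, n)), X])        # (p+1) x N
    Z = logistic(W @ Xb)                         # Q x N
    return np.vstack([-np.ones((1, n)), Z])      # (Q+1) x N


def output_weights(Z, D):
    """Eq. 2: M = D Z^T (Z Z^T)^{-1}, computed without an explicit inverse."""
    # Z Z^T is symmetric, so M^T is the solution of (Z Z^T) M^T = Z D^T.
    return np.linalg.solve(Z @ Z.T, Z @ D.T).T


rng = np.random.default_rng(0)                   # illustrative seed
p, Q, N = 8, 5, 200                              # illustrative sizes
X = rng.standard_normal((p, N))                  # input vectors as columns
D = rng.standard_normal((1, N))                  # one scalar label per input
W = rng.standard_normal((Q, p + 1))              # random hidden weights (Eq. 1)

Z = hidden_output(W, X)
M = output_weights(Z, D)
print(M.shape)                                   # (1, 6): Q weights plus one bias
```

Only M is fitted; W stays fixed after it is drawn, which is why training reduces to a single linear solve.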

2.1 Randomized Neural Network Texture Signature

The randomized neural network texture signature was proposed in [21] and consists of using image pixels as both input and label data in order to train a randomized neural network. The weights of the output neuron layer of this trained network are then used as the image signature. For this purpose, the image is divided into overlapping windows of size $K \times K$ ($K=\{3,5,7\}$). For each window, its border pixels are used as the input feature vector ${{{\varvec{x}}}_{{\varvec{i}}}}$ and its central pixel is used as the respective scalar label $d_i$. Thus, we have 8-, 16- and 24-dimensional feature vectors ${{{\varvec{x}}}_{{\varvec{i}}}}$ for the aforementioned window sizes, respectively.
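The window decomposition can be illustrated as follows. This is a sketch, not the implementation of [21]: the function name `border_and_center` is hypothetical, the border pixels are read in row-major order (the ordering is an assumption, since the text does not fix one), and windows are taken at every valid position with stride 1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def border_and_center(image, K):
    """Return X (p x N border-pixel vectors) and d (N central pixels)."""
    windows = sliding_window_view(image, (K, K))     # (H-K+1, W-K+1, K, K)
    mask = np.ones((K, K), dtype=bool)
    mask[1:-1, 1:-1] = False                         # True only on the outer ring
    X = windows[:, :, mask].reshape(-1, mask.sum()).T
    d = windows[:, :, K // 2, K // 2].reshape(-1)
    return X.astype(float), d.astype(float)


image = np.arange(25).reshape(5, 5)                  # toy 5 x 5 "image"
X, d = border_and_center(image, 3)
print(X.shape, d.shape)                              # (8, 9) (9,)
```

The outer ring of a $K \times K$ window has $4(K-1)$ pixels, which gives the 8, 16 and 24 dimensions quoted above for $K=3,5,7$.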

The next step is to determine the values of the matrix W. For this, the paper [21] adopted the Linear Congruential Generator (LCG) [22, 23] to produce pseudorandom values following a uniform distribution. The “seed” and the other adjustment parameters of the generator are set from the values Q (number of hidden neurons) and p (dimensionality of the input feature vector), so that the same W is obtained for every image. All the values of W and each row of the matrix X are normalized to have zero mean and unit variance. Finally, the logistic transfer function is used in all the neurons.
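A linear congruential generator is simple enough to write out. The sketch below is hedged: it uses the "minimal standard" multiplier and modulus of Park and Miller [23] (a = 16807, m = 2^31 - 1) and a fixed seed, not the rule of [21] that derives the seed and parameters from Q and p, which is not reproduced here. The function names are hypothetical.

```python
import numpy as np


def lcg_uniform(n, seed, a=16807, c=0, m=2**31 - 1):
    """n pseudorandom values in (0, 1) from x_{k+1} = (a * x_k + c) mod m."""
    # With c = 0 the seed must lie in 1..m-1, otherwise the sequence is stuck at 0.
    values = np.empty(n)
    x = seed
    for k in range(n):
        x = (a * x + c) % m
        values[k] = x / m
    return values


def standardize(v):
    """Shift and scale to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


def hidden_weights(Q, p, seed=1):
    """Q x (p+1) matrix W (Eq. 1), standardized over all of its entries."""
    # Assumption: a fixed seed stands in for the Q- and p-dependent rule of [21].
    w = lcg_uniform(Q * (p + 1), seed)
    return standardize(w).reshape(Q, p + 1)


W = hidden_weights(Q=19, p=8)
print(W.shape)                                   # (19, 9)
```

Because the generator is deterministic, every image is processed with exactly the same W, which is what makes the resulting output weights comparable across images.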

Once these fundamental procedures are determined, it is possible to construct two signatures based on Eq. 2, which becomes a vector ${{\varvec{f}}}=DZ^{T}(ZZ^{T})^{-1}$ because D is also a vector. The first signature considers a single value of Q and multiple values of K, as follows

$$\begin{aligned} \varvec{\varTheta }(Q)_{K_1, K_2, \ldots , K_n}=\left[ {{\varvec{f}}}_{K_1}, {{\varvec{f}}}_{K_2}, \ldots , {{\varvec{f}}}_{K_n}\right] . \end{aligned}$$

(3)

The second signature, which consists of the concatenation of the previous signature for different values Q, is determined according to the following equation

$$\begin{aligned} \varvec{\varPsi }_{Q_1,\ldots ,Q_m}=\left[ \varvec{\varTheta }(Q_1)_{K_1,\ldots , K_n}, \ldots , \varvec{\varTheta }(Q_m)_{K_1, \ldots , K_n} \right] . \end{aligned}$$

(4)
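Putting the pieces together, Eqs. 3 and 4 amount to concatenating the vectors f obtained for each K and then for each Q. The following sketch makes several assumptions that are not taken from [21]: NumPy's `default_rng` with a fixed seed replaces the LCG, border pixels are read in row-major order, and `numpy.linalg.lstsq` is used for the least-squares solve, which coincides with Eq. 2 whenever $ZZ^{T}$ is invertible. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def border_and_center(image, K):
    """Border pixels (p x N) and central pixels (1 x N) of all K x K windows."""
    windows = sliding_window_view(image, (K, K))
    mask = np.ones((K, K), dtype=bool)
    mask[1:-1, 1:-1] = False
    X = windows[:, :, mask].reshape(-1, mask.sum()).T.astype(float)
    d = windows[:, :, K // 2, K // 2].reshape(1, -1).astype(float)
    return X, d


def signature_f(image, K, Q, rng):
    """Output weights f (length Q+1) of the network trained on one window size."""
    X, d = border_and_center(image, K)
    # Each row of X is normalized to zero mean and unit variance.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    W = rng.uniform(size=(Q, X.shape[0] + 1))    # stand-in for the LCG of [21]
    W = (W - W.mean()) / W.std()
    n = X.shape[1]
    Z = 1.0 / (1.0 + np.exp(-(W @ np.vstack([-np.ones((1, n)), X]))))
    Zb = np.vstack([-np.ones((1, n)), Z])        # bias row for the output layer
    # Least-squares solution of d = f Zb, i.e. Eq. 2 when Zb Zb^T is invertible.
    return np.linalg.lstsq(Zb.T, d.T, rcond=None)[0].ravel()


def theta(image, Q, Ks, seed=0):
    """Eq. 3: concatenation of f over the window sizes, for one value of Q."""
    rng = np.random.default_rng(seed)            # same weights for every image
    return np.concatenate([signature_f(image, K, Q, rng) for K in Ks])


def psi(image, Qs, Ks, seed=0):
    """Eq. 4: concatenation of Theta(Q) over several values of Q."""
    return np.concatenate([theta(image, Q, Ks, seed) for Q in Qs])


image = np.random.default_rng(1).integers(0, 256, size=(32, 32)).astype(float)
signature = psi(image, Qs=(19, 39), Ks=(3, 5, 7))
print(signature.shape)                           # (180,)
```

In this sketch each f has Q + 1 entries, so the signature for $Q=\{19,39\}$ and $K=\{3,5,7\}$ has $3 \times 20 + 3 \times 40 = 180$ descriptors.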

A detailed description of the randomized neural network based signature can be found in the paper [21].

3 Experiments

3.1 Pap-smear Database

The pap-smear database [24] is a collection of 917 cell images extracted from cervices. The images were obtained at the Herlev University Hospital and were classified into 7 groups, which are: normal superficial squamous epithelial (74 cells); normal intermediate squamous epithelial (70 cells); normal columnar epithelial (98 cells); mild squamous non-keratinizing dysplasia (182 cells); abnormal moderate squamous non-keratinizing dysplasia (146 cells); abnormal severe squamous non-keratinizing dysplasia (197 cells); and abnormal squamous cell carcinoma in situ intermediate (150 cells). These cell images can also be classified into two groups: normal cells (242 images) and abnormal cells (675 images). In our experiments, all the images were converted into grayscale. Moreover, we addressed only the 2-class problem, since the 7-class problem is still a challenge for texture analysis methods. Figure 1 shows one sample of each class.

3.2 Classification Procedure

For the randomized neural network texture signature, we used the same parameter values adopted in the paper [21], that is, $Q=\{19,39\}$ and $K=\{3,5,7\}$ for the second signature (Eq. 4), in order to establish a fair comparison with the other texture methods, for which we used parameter values according to either their respective papers or common practice. At this point, it is important to mention that, even though the paper [21] proposes a strategy to make the method more robust to rotation, we did not use it for two reasons: first, there is no orientation in the pap-smear cells; second, the method is faster without this strategy.

In order to assess the performance of the method, we compare it to other classical and recent texture analysis methods. They are: Co-occurrence matrices [5], Wavelet descriptors [25, 26], Tourist Walk [27], Discrete Cosine Transform (DCT) [28], Lacunarity 3D [29], Local binary patterns (LBP) [9], Gray Level Difference Matrix (GLDM) [30, 31] and Complex Network Texture Descriptor (CNTD) [32].

For classification, we used Linear Discriminant Analysis (LDA) [33], which is a classical statistical classifier that creates hyperplanes among the groups based on their centroid vectors and the covariance matrix of all the samples. As the validation strategy, we adopted leave-one-out cross-validation, which uses one sample for testing and the remainder for training. This process is repeated N times (N is the number of samples), each time with a different sample for testing. The performance measure is the average of the N accuracies.
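The leave-one-out protocol can be sketched with a minimal two-class LDA written directly in NumPy. This is an illustration under assumptions, not the implementation used in the experiments: the function names are hypothetical, the classifier handles only two classes labeled 0 and 1, and class priors are estimated from the training fold.

```python
import numpy as np


def lda_fit(X, y):
    """Two-class LDA from the class centroids and a pooled covariance matrix."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    Xc = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    cov = Xc.T @ Xc / (len(X) - 2)               # pooled within-class covariance
    w = np.linalg.pinv(cov) @ (m1 - m0)          # normal of the separating hyperplane
    prior = np.log((y == 1).mean() / (y == 0).mean())
    b = -0.5 * (m0 + m1) @ w + prior
    return w, b


def lda_predict(model, x):
    """Class 1 if x lies on the positive side of the hyperplane, else class 0."""
    w, b = model
    return int(x @ w + b > 0)


def leave_one_out_accuracy(X, y):
    """Train on N-1 samples, test on the held-out one, repeat for all N samples."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        model = lda_fit(X[train], y[train])
        hits += int(lda_predict(model, X[i]) == y[i])
    return hits / len(X)


# Synthetic stand-in for the signatures: rows are samples, columns are descriptors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(4, 1, (60, 4))])
y = np.array([0] * 40 + [1] * 60)
print(leave_one_out_accuracy(X, y))
```

Each of the N test folds has accuracy 0 or 1, so the average of the N accuracies is simply the fraction of samples classified correctly.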

We also obtained the AUC (Area Under the ROC Curve) [34] to compare it to the highest AUC values obtained in two recent papers [11, 35], which compared several LBP variants applied to the pap-smear database. To assess the randomized neural network signature, we used the same procedure adopted in these two works: a linear Support Vector Machine (SVM) as classifier and 5-fold cross-validation. The paper [11] does not mention the parameter values used, but the paper [35] uses the default parameter values of LIBSVM [36], which is a public library for SVM. Thus, for a fair comparison, we also used the default parameter values of this library. Moreover, because 5-fold cross-validation is not a deterministic strategy (its result depends on the random partition into folds), we performed 101 validation runs and adopted the median AUC value as the performance measure of the randomized neural network signature.
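The repeated 5-fold protocol with a median AUC can be sketched as below. Several points are assumptions rather than details from the text: the classifier is a hypothetical stand-in (`centroid_scorer`, a projection onto the difference of class centroids, not an SVM and not LIBSVM), folds are drawn by a plain random permutation without stratification, and the AUC of a run is computed on the pooled out-of-fold scores.

```python
import numpy as np


def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))


def centroid_scorer(X_train, y_train, X_test):
    """Stand-in linear scorer (NOT an SVM): projection on the centroid difference."""
    w = X_train[y_train == 1].mean(axis=0) - X_train[y_train == 0].mean(axis=0)
    return X_test @ w


def median_auc(X, y, scorer, runs=101, k=5, seed=0):
    """Median AUC over repeated k-fold cross-validation with random partitions."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(runs):
        folds = rng.permutation(len(y)) % k      # random, balanced fold labels
        scores = np.empty(len(y))
        for f in range(k):
            test = folds == f
            scores[test] = scorer(X[~test], y[~test], X[test])
        aucs.append(auc(scores, y))              # pooled out-of-fold scores
    return float(np.median(aucs))


# Synthetic stand-in data: rows are samples, columns are descriptors.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(2, 1, (60, 3))])
y = np.array([0] * 40 + [1] * 60)
print(median_auc(X, y, centroid_scorer, runs=11))
```

With an odd number of runs, such as the 101 used here, the median is the AUC of one actual run rather than an average of two.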

4 Results and Discussion

Table 1 shows the comparison of the randomized neural network signature with other grayscale texture analysis methods. As one can see, the neural network approach surpasses all the compared methods in terms of accuracy. One disadvantage of the method is its large number of descriptors. However, it is important to notice that its accuracy is $1.20$ percentage points higher than that of the second best method (wavelet descriptors). This difference represents 11 more images correctly classified by the method, thus corroborating its efficiency and ability to discriminate pap-smear samples, a challenging database in which any improvement is desirable.
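As a quick check of these figures, assuming both percentages are taken over the 917 images evaluated in the leave-one-out procedure and using the LDA accuracy of 87.57% reported in the abstract:

$$\begin{aligned} 0.0120 \times 917 \approx 11.0, \qquad 0.8757 \times 917 \approx 803 . \end{aligned}$$

That is, about 803 of the 917 images are correctly classified by the signature, 11 more than by the second best method.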

Table 1. Comparison of different texture analysis methods applied to the pap-smear database.

Table 2 shows the median AUC obtained by the neural network signature and all the compared approaches, as well as the highest AUC values reported in two recent papers. As can be seen, the randomized neural network signature obtained the second best result among all the methods. Although this performance is already impressive, it is important to emphasize that ENS and MAG1 provide the highest AUC values of the papers [11] and [35], respectively. Thus, considering that the paper [11] applied nine LBP variants to the pap-smear database, and the paper [35] performed more than 50 tests on this same dataset, our result becomes all the more significant and demonstrates that the randomized neural network descriptors are very discriminative in pap-smear cell images.

Table 2. Comparison of AUC (Area Under the ROC Curve) of different texture analysis methods applied to the pap-smear database

5 Conclusion

This paper presented the application of a very discriminative texture analysis method to the highly relevant medical problem of classifying pap-smear cells. The randomized neural network texture signature obtained a high performance in this problem, surpassing all the compared methods (LDA experiment) and presenting the second best AUC value, which is comparable to the highest results of two recent papers that address the same problem. Thus, it is possible to affirm that the randomized neural network signature is suitable for the pap-smear problem, and, therefore, adds a new tool to the computer vision research focused on the Papanicolaou test.

References

1. Backes, A.R., Martinez, A.S., Bruno, O.M.: Texture analysis based on maximum contrast walker. Pattern Recogn. Lett. 31(12), 1701–1707 (2010)
2. Kaplan, L.M.: Extended fractal analysis for texture classification and segmentation. IEEE Trans. Image Process. 8(11), 1572–1585 (1999)
3. Bhosle, V.V., Pawar, V.P.: Texture segmentation: different methods. Int. J. Soft Comput. Eng. 3, 69–74 (2013)
4. Zwiggelaar, R.: Texture based segmentation: automatic selection of co-occurrence matrices. In: ICPR, vol. I, pp. 588–591 (2004)
5. Haralick, R.M.: Statistical and structural approaches to texture. Proc. IEEE 67(5), 786–804 (1979)
6. Manjunath, B.S., Ma, W.Y.: Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 837–842 (1996)
7. Dawood, H., Dawood, H., Guo, P.: Efficient texture classification using short-time Fourier transform with spatial pyramid matching. In: SMC, pp. 2275–2279. IEEE (2013)
8. Li, C., Huang, Y., Zhu, L.: Color texture image retrieval based on Gaussian copula models of Gabor wavelets. Pattern Recogn. 64, 118–129 (2017)
9. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
10. Sá Junior, J.J.M., Backes, A.R.: A simplified gravitational model to analyze texture roughness. Pattern Recogn. 45(2), 732–741 (2012)
11. Nanni, L., Lumini, A., Brahnam, S.: Local binary patterns variants as texture descriptors for medical image analysis. Artif. Intell. Med. 49(2), 117–125 (2010)
12. Ananthi, V.P., Balasubramaniam, P., Kalaiselvi, T.: A new fuzzy clustering algorithm for the segmentation of brain tumor. Soft. Comput. 20(12), 4859–4879 (2016)
13. Li, X.Z., Williams, S., Bottema, M.J.: Texture and region dependent breast cancer risk assessment from screening mammograms. Pattern Recogn. Lett. 36, 117–124 (2014)
14. Li, X.Z., Williams, S., Bottema, M.J.: Background intensity independent texture features for assessing breast cancer risk in screening mammograms. Pattern Recogn. Lett. 34(9), 1053–1062 (2013)
15. Capdehourat, G., Corez, A., Bazzano, A., Alonso, R., Musé, P.: Toward a combined tool to assist dermatologists in melanoma detection from dermoscopic images of pigmented skin lesions. Pattern Recogn. Lett. 32(16), 2187–2196 (2011)
16. Flores, E.S., Scharcanski, J.: Segmentation of melanocytic skin lesions using feature learning and dictionaries. Expert Syst. Appl. 56, 300–309 (2016)
17. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feedforward neural networks with random weights. In: Proceedings of the 11th IAPR International Conference on Pattern Recognition, Conference B: Pattern Recognition Methodology and Systems, vol. II, pp. 1–4 (1992)
18. Pao, Y.H., Takefuji, Y.: Functional-link net computing: theory, system architecture, and functionalities. IEEE Comput. J. 25(5), 76–79 (1992)
19. Pao, Y.H., Park, G.H., Sobajic, D.J.: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2), 163–180 (1994)
20. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)
21. Sá Junior, J.J.M., Backes, A.R.: ELM based signature for texture classification. Pattern Recogn. 51, 395–401 (2016)
22. Lehmer, D.H.: Mathematical methods in large scale computing units. Ann. Comput. Lab. Harvard Univ. 26, 141–146 (1951)
23. Park, S.K., Miller, K.W.: Random number generators: good ones are hard to find. Commun. ACM 31(10), 1192–1201 (1988)
24. Jantzen, J., Norup, J., Dounias, G., Bjerregaard, B.: Pap-smear benchmark data for pattern classification. In: Proceedings of the NiSIS 2005, NiSIS, pp. 1–9 (2005)
25. Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)
26. Chang, T., Kuo, C.J.: Texture analysis and classification with tree-structured wavelet transform. IEEE Trans. Image Process. 2(4), 429–441 (1993)
27. Backes, A.R., Gonçalves, W.N., Martinez, A.S., Bruno, O.M.: Texture analysis and classification using deterministic tourist walk. Pattern Recogn. 43(3), 685–694 (2010)
28. Ng, I., Tan, T., Kittler, J.: On local linear transform and Gabor filter representation of texture. In: International Conference on Pattern Recognition, pp. 627–631 (1992)
29. Backes, A.R.: A new approach to estimate lacunarity of texture images. Pattern Recogn. Lett. 34(13), 1455–1461 (2013)
30. Weszka, J.S., Dyer, C.R., Rosenfeld, A.: A comparative study of texture measures for terrain classification. IEEE Trans. Syst. Man. Cybern. 6(4), 269–285 (1976)
31. Kim, J.K., Park, H.W.: Statistical textural features for detection of microcalcifications in digitized mammograms. IEEE Trans. Med. Imaging 18(3), 231–238 (1999)
32. Backes, A.R., Casanova, D., Bruno, O.M.: Texture analysis and classification: a complex network-based approach. Inf. Sci. 219, 168–180 (2013)
33. Webb, A.R.: Statistical Pattern Recognition, 2nd edn. Wiley, Chichester (2002)
34. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
35. Nanni, L., Lumini, A., Brahnam, S.: Survey on LBP based texture descriptors for image classification. Expert Syst. Appl. 39(3), 3634–3641 (2012)
36. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)

Acknowledgments

Jarbas Joaci de Mesquita Sá Junior thanks CNPq (National Council for Scientific and Technological Development, Brazil) (Grant: 152054/2016-2 and 453835/2017-1) for the financial support of this work. André R. Backes gratefully acknowledges the financial support of CNPq (Grant #302416/2015-3) and FAPEMIG (Foundation to the Support of Research in Minas Gerais) (Grant #APQ-03437-15). Odemir M. Bruno gratefully acknowledges the financial support of CNPq (307797/2014-7 and 484312/2013-8) and FAPESP (14/08026-1).

Author information

Authors and Affiliations

São Carlos Institute of Physics, University of São Paulo, PO Box 369, São Carlos, SP, 13560-970, Brazil
Jarbas Joaci de Mesquita Sá Junior & Odemir Martinez Bruno
Department of Computer Engineering, Campus de Sobral - Universidade Federal do Ceará, Rua Estanislau Frota, S/N, Centro, Sobral, Ceará, 62010-560, Brazil
Jarbas Joaci de Mesquita Sá Junior
School of Computer Science, Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2121, Uberlândia, MG, 38408-100, Brazil
André R. Backes

Corresponding author

Correspondence to Jarbas Joaci de Mesquita Sá Junior.

Editor information

Editors and Affiliations

Universidad Federico Santa María, Santiago, Chile
Marcelo Mendoza
Carlos III University of Madrid, Madrid, Spain
Sergio Velastín

About this paper

Cite this paper

de Mesquita Sá Junior, J.J., Backes, A.R., Bruno, O.M. (2018). Pap-smear Image Classification Using Randomized Neural Network Based Signature. In: Mendoza, M., Velastín, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2017. Lecture Notes in Computer Science, vol 10657. Springer, Cham. https://doi.org/10.1007/978-3-319-75193-1_81

DOI: https://doi.org/10.1007/978-3-319-75193-1_81
Published: 04 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75192-4
Online ISBN: 978-3-319-75193-1
eBook Packages: Computer Science, Computer Science (R0)
