An Image Texture Analysis Method for Minority Language Identification

Brodić, Darko; Amelio, Alessia; Milivojević, Zoran N.

doi:10.1007/978-3-319-59108-7_22

Conference paper
First Online: 17 May 2017

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10256)

Abstract

This paper introduces an image texture analysis method for minority language identification. In the first stage, each letter is assigned to a script type according to its energy status in the text-line area. The mapping is carried out by extracting the Unicode text and transforming it into coded text. There are four script types, which correspond to four grey levels of an image. The resulting image is then subjected to feature extraction by texture analysis: the grey level co-occurrence matrix and its derived features are calculated. The extracted features are compared and classified using the K-Nearest Neighbors and Naive Bayes methods to establish a difference that can identify a minority language, such as Serbian, among other world languages in the text. The accuracy obtained indicates that the proposed approach is effective compared with other state-of-the-art methods.
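The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the letter-to-script-type mapping, so the sets `ASCENDER`, `DESCENDER`, and `FULL` below are hypothetical choices for Latin letters, and every other letter is treated as a base letter. The sketch also assumes the coded text is read as a one-row image with a co-occurrence offset of one position to the right; the function names and the three features shown (energy, contrast, homogeneity) are illustrative and not necessarily the feature set used in the paper.

```python
# Hypothetical letter classes (assumption: not taken from the paper).
ASCENDER = set("bdfhklt") | set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
DESCENDER = set("gpqy")
FULL = set("j")


def code_text(text):
    """Map each letter to one of four grey levels (0..3); skip non-letters."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch in FULL:
            codes.append(3)
        elif ch in DESCENDER:
            codes.append(2)
        elif ch in ASCENDER:
            codes.append(1)
        else:
            codes.append(0)  # base letter (default for anything else)
    return codes


def glcm(codes, levels=4):
    """Normalized grey level co-occurrence matrix for horizontally
    adjacent pairs in a one-row image (offset of one position)."""
    counts = [[0] * levels for _ in range(levels)]
    for a, b in zip(codes, codes[1:]):
        counts[a][b] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Three common co-occurrence features computed from a normalized matrix."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


if __name__ == "__main__":
    sample = "Image texture analysis for language identification"
    features = texture_features(glcm(code_text(sample)))
    for name, value in features.items():
        print(f"{name}: {value:.4f}")
```

The resulting feature vectors, one per text sample, are what a K-Nearest Neighbors or Naive Bayes classifier would then be trained on.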

Acknowledgments

This work was partially supported by a grant from the Ministry of Education, Science and Technological Development of the Republic of Serbia within the project TR33037.

Author information

Authors and Affiliations

Technical Faculty in Bor, University of Belgrade, V.J. 12, 19210, Bor, Serbia
Darko Brodić
DIMES, University of Calabria, Via Pietro Bucci Cube 44, 87036, Rende, CS, Italy
Alessia Amelio
College of Applied Technical Sciences, Aleksandra Medvedeva 20, 18000, Niš, Serbia
Zoran N. Milivojević

Corresponding author

Correspondence to Darko Brodić.

Editor information

Editors and Affiliations

SUNY Buffalo State, Buffalo, New York, USA
Valentin E. Brimkov
Department of Applied Professional Studies, SUNY Fredonia, Fredonia, New York, USA
Reneta P. Barneva

Cite this paper

Brodić, D., Amelio, A., Milivojević, Z.N. (2017). An Image Texture Analysis Method for Minority Language Identification. In: Brimkov, V., Barneva, R. (eds) Combinatorial Image Analysis. IWCIA 2017. Lecture Notes in Computer Science, vol 10256. Springer, Cham. https://doi.org/10.1007/978-3-319-59108-7_22

DOI: https://doi.org/10.1007/978-3-319-59108-7_22
Published: 17 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59107-0
Online ISBN: 978-3-319-59108-7
eBook Packages: Computer Science, Computer Science (R0)