Language Identification as Process Prediction Using WoMan

Ferilli, Stefano; Esposito, Floriana; Redavid, Domenico; Angelastro, Sergio

doi:10.1007/978-3-319-68130-6_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 733)

Included in the following conference series:

Italian Research Conference on Digital Libraries

Abstract

Several high-level tasks in the management of Digital Libraries require Natural Language Processing (NLP) techniques. In turn, most NLP solutions rely on linguistic resources that are costly to produce, which motivates research into automated ways of building them. In particular, Language Identification is a crucial NLP task that precedes almost all the others, since different linguistic resources must be used for different languages. This paper investigates process mining and management approaches as a possible solution to the Language Identification problem. Specifically, it casts language identification as a process prediction task and uses the WoMan framework to carry it out. The experimental results are encouraging and suggest that the approach deserves further exploration.
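The abstract does not detail how WoMan models each language, so the following is only a hypothetical illustration of the general framing "language identification as process prediction", not the authors' method. It assumes each word is a trace whose "activities" are its letters, learns one model per language, and predicts the language whose model best explains the input. In place of WoMan's logic-based process models it uses a plain first-order transition model (essentially a character-bigram model with add-one smoothing). The names `word_traces`, `TransitionModel` and `identify`, the `<s>`/`</s>` markers and the toy training sentences are all assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical markers that open and close every trace (assumption, not WoMan syntax).
START, END = "<s>", "</s>"


def word_traces(text):
    """Yield one trace per word: the word's letters as a sequence of 'activities'."""
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        if letters:
            yield [START] + letters + [END]


class TransitionModel:
    """Per-language model: how often each activity is directly followed by another."""

    def __init__(self):
        self.follows = defaultdict(Counter)  # activity -> Counter of its successors
        self.alphabet = set()

    def train(self, text):
        for trace in word_traces(text):
            self.alphabet.update(trace)
            for a, b in zip(trace, trace[1:]):
                self.follows[a][b] += 1
        return self

    def log_likelihood(self, text, vocab_size):
        """Add-one smoothed log-probability of every transition observed in text."""
        score = 0.0
        for trace in word_traces(text):
            for a, b in zip(trace, trace[1:]):
                successors = self.follows.get(a, Counter())  # .get avoids inserting keys
                total = sum(successors.values())
                score += math.log((successors[b] + 1) / (total + vocab_size))
        return score


def identify(text, models):
    """Predict the language whose model best explains the traces extracted from text."""
    vocab = set().union(*(m.alphabet for m in models.values()))
    vocab |= {a for trace in word_traces(text) for a in trace}
    return max(models, key=lambda lang: models[lang].log_likelihood(text, len(vocab)))


# Toy training data (assumption): one short sentence per language.
models = {
    "en": TransitionModel().train("the cat is on the table and the dog is in the house with the children"),
    "it": TransitionModel().train("il gatto è sul tavolo e il cane è nella casa con i bambini"),
}
print(identify("the dog and the cat", models))  # en
print(identify("il cane e il gatto", models))   # it
```

All models share one vocabulary size, so their smoothed scores stay comparable. Add-one smoothing keeps a single unseen transition from ruling a language out entirely. With such tiny training texts the sketch only demonstrates the mechanics; realistic use would need substantial corpora per language.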

Acknowledgments

This work was partially funded by the Italian PON 2007–2013 project PON02_00563_3489339 ‘Puglia@Service’.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università di Bari, Bari, Italy
Stefano Ferilli, Floriana Esposito & Sergio Angelastro
Centro Interdipartimentale per la Logica e sue Applicazioni, Università di Bari, Bari, Italy
Stefano Ferilli & Floriana Esposito
Artificial Brain S.r.l., Bari, Italy
Domenico Redavid

Corresponding author

Correspondence to Stefano Ferilli.

Editor information

Editors and Affiliations

University of Modena and Reggio Emilia, Modena, Italy
Costantino Grana
University of Modena and Reggio Emilia, Modena, Italy
Lorenzo Baraldi

About this paper

Cite this paper

Ferilli, S., Esposito, F., Redavid, D., Angelastro, S. (2017). Language Identification as Process Prediction Using WoMan. In: Grana, C., Baraldi, L. (eds) Digital Libraries and Archives. IRCDL 2017. Communications in Computer and Information Science, vol 733. Springer, Cham. https://doi.org/10.1007/978-3-319-68130-6_13

DOI: https://doi.org/10.1007/978-3-319-68130-6_13
Published: 11 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68129-0
Online ISBN: 978-3-319-68130-6
eBook Packages: Computer Science, Computer Science (R0)