Abstract
A very large number of scientific articles is now available, covering topics as varied as medicine, technology, economics, and finance. Scientific papers report results of scientific interest and present the evaluation and interpretation of relevant arguments. Because such papers are produced at a high rate, it is feasible to analyze how people write in a given domain. Natural language processing offers several approaches to analyzing large text corpora. Identifying patterns with semantic elements in a text lets us classify and examine the corpus, which facilitates the interpretation and management of information by computers. At present, semiautomatic or automatic ways of generating natural language patterns are either unavailable or quite complicated. This paper shows how a tool developed for this research was tested in a public health domain. The results, obtained with the tool and supported by graphs, provide the groups of words used (to determine whether they come from a specific vocabulary), the most common grammatical categories, the most repeated words in the domain, the patterns found, and the frequency of those patterns. The selected public health domain contains 800 papers on different topics in genetics, including mutations, genetic deafness, DNA, trinucleotide, and suppressor genes, among others. A public health ontology provided the basis of the study.
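The kinds of output listed in the abstract (most repeated words, most common grammatical categories, patterns found and their frequency) can be illustrated with a minimal sketch. This is not the authors' tool: the toy sentences, the tag set, the function names, and the choice to treat a "pattern" as a sequence of three consecutive part-of-speech tags are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical hand-tagged sentences as (word, part-of-speech tag) pairs.
# Both the sentences and the tag set are invented for this illustration.
TAGGED_SENTENCES = [
    [("the", "DET"), ("mutation", "NOUN"), ("causes", "VERB"), ("deafness", "NOUN")],
    [("the", "DET"), ("gene", "NOUN"), ("suppresses", "VERB"), ("tumors", "NOUN")],
    [("a", "DET"), ("mutation", "NOUN"), ("alters", "VERB"), ("the", "DET"), ("gene", "NOUN")],
]


def tag_ngrams(sentence, n):
    """Return every run of n consecutive tags in one tagged sentence."""
    tags = [tag for _, tag in sentence]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def summarize(sentences, n=3):
    """Count words, grammatical categories, and tag patterns of length n."""
    words, tags, patterns = Counter(), Counter(), Counter()
    for sentence in sentences:
        words.update(word for word, _ in sentence)
        tags.update(tag for _, tag in sentence)
        patterns.update(tag_ngrams(sentence, n))
    return words, tags, patterns


words, tags, patterns = summarize(TAGGED_SENTENCES)
print("Most repeated words:", words.most_common(3))
print("Most common categories:", tags.most_common(3))
print("Most frequent patterns:", patterns.most_common(3))
```

In a real corpus the tags would come from a part-of-speech tagger rather than hand annotation, and the pattern definition would likely also involve semantic information, such as the ontology mentioned in the abstract.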
Acknowledgements
The authors thank the AGO2 project, funded by the Ministry of Education of Spain, for aiding the author in the research and production of this paper.
The research leading to these results has received funding from the European Union's Seventh Framework Program (FP7/2007-2013) for Crystal (Critical System Engineering Acceleration Joint Undertaking) under grant agreement no. 332830 and from specific national programs and/or funding authorities.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Fraga, A., Llorens, J., Parra, E., Moreno, V. (2016). Automatic Pattern Generator of Natural Language Text Applied in Public Health. In: Fred, A., Dietz, J., Aveiro, D., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2015. Communications in Computer and Information Science, vol 631. Springer, Cham. https://doi.org/10.1007/978-3-319-52758-1_21
DOI: https://doi.org/10.1007/978-3-319-52758-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52757-4
Online ISBN: 978-3-319-52758-1
eBook Packages: Computer Science (R0)