Abstract
Analysing the co-occurrence patterns between words gives a better understanding of their use (and meaning); the most straightforward applications of such analysis are lexicography and linguistic description in general. Some tools already produce co-occurrence information about words taken from Portuguese corpora, but few can use lemmata or syntactic dependency information. Syntax Deep Explorer is a new tool that uses several association measures to quantify several co-occurrence types, defined on the syntactic dependencies (e.g. subject, complement, modifier) between a target word lemma and its co-locates. The resulting co-occurrence statistics are represented in lex-grams, that is, synopses of the syntactically-based co-occurrence patterns of a word's distribution within a given corpus. These lex-grams are obtained from a large Portuguese corpus processed by STRING [19] and are presented in a user-friendly way through a graphical interface. Syntax Deep Explorer will allow the development of finer lexical resources and the improvement of STRING processing in general, and it provides public access to co-occurrence information derived from parsed corpora.
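To make the idea of a lex-gram concrete, the following is a minimal sketch of how dependency-based co-occurrence counts could be turned into association scores. It is not the tool's implementation: the abstract does not name the measures used, so pointwise mutual information and the Dice coefficient are chosen here only because the reference list cites them (Church and Hanks; Dice). The triples, the dependency labels `SUBJ` and `MOD`, and the function name `lex_gram` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (dependency, target lemma, co-locate lemma) triples,
# standing in for the output of a dependency parser over a corpus.
triples = [
    ("SUBJ", "ladrar", "cão"),
    ("SUBJ", "ladrar", "cão"),
    ("SUBJ", "ladrar", "cachorro"),
    ("SUBJ", "correr", "cão"),
    ("SUBJ", "correr", "atleta"),
    ("SUBJ", "correr", "atleta"),
    ("MOD", "cão", "grande"),
    ("MOD", "casa", "grande"),
]


def lex_gram(triples, target):
    """Group the co-locates of `target` by dependency type and score them.

    Returns {dependency: [(co-locate, frequency, PMI, Dice), ...]},
    with each list sorted by Dice score, highest first.
    """
    pair_counts = Counter(triples)
    head_counts = Counter((dep, head) for dep, head, _ in triples)
    coloc_counts = Counter((dep, col) for dep, _, col in triples)
    dep_totals = Counter(dep for dep, _, _ in triples)

    rows = {}
    for (dep, head, col), f_xy in pair_counts.items():
        if head != target:
            continue
        f_x = head_counts[(dep, head)]   # target lemma in this dependency
        f_y = coloc_counts[(dep, col)]   # co-locate in this dependency
        n = dep_totals[dep]              # all instances of this dependency
        pmi = math.log2(f_xy * n / (f_x * f_y))
        dice = 2 * f_xy / (f_x + f_y)
        rows.setdefault(dep, []).append((col, f_xy, pmi, dice))

    for dep in rows:
        rows[dep].sort(key=lambda row: row[3], reverse=True)
    return rows


for dep, entries in lex_gram(triples, "ladrar").items():
    for col, freq, pmi, dice in entries:
        print(f"{dep}\t{col}\tfreq={freq}\tPMI={pmi:.3f}\tDice={dice:.3f}")
```

Counting within each dependency type, rather than over raw word windows, is what distinguishes this kind of synopsis from plain window-based collocation statistics. In this toy data the two measures rank the co-locates differently, which is one reason a tool might expose several measures side by side.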
Notes
1. string.l2f.inesc-id.pt/demo/deepExplorer (last visit 29/02/2016).
2. www.l2f.inesc-id.pt (last visit 29/02/2016).
3. http://www.linguateca.pt/ACDC/ (last visit 29/02/2016).
4. http://grammarsoft.com/ (last visit 29/02/2016).
5. http://gramtrans.com/gramtrans (last visit 29/02/2016).
6. http://www.sketchengine.co.uk/ (last visit 29/02/2016).
7. http://www.sqlite.org/about (last visit 29/02/2016).
8. https://bitbucket.org/xerial/sqlite-jdbc (last visit 29/02/2016).
9. https://angularjs.org (last visit 29/02/2016).
References
Aït-Mokhtar, S., Chanod, J.P., Roux, C.: Robustness beyond shallowness: incremental deep parsing. Nat. Lang. Eng. 8, 121–144 (2002)
Bick, E.: The Parsing System PALAVRAS. Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus University Press, Aarhus (2000)
Bick, E.: DeepDict - a graphical corpus-based dictionary of word relations. In: Proceedings of NODALIDA 2009. NEALT Proceedings Series, vol. 4, pp. 268–271. Tartu University Library, Tartu (2009)
Biemann, C., Bordag, S., Heyer, G., Quasthoff, U., Wolff, C.: Language-independent methods for compiling monolingual lexical data. In: Gelbukh, A. (ed.) CICLing 2004. LNCS, vol. 2945, pp. 217–228. Springer, Heidelberg (2004)
Carapinha, F.: Extração Automática de Conteúdos Documentais. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, June 2013
Chen, P.: The entity-relationship model—toward a unified view of data. ACM Trans. Database Syst. 1(1), 9–36 (1976)
Church, K., Hanks, P.: Word association norms, mutual information, and lexicography. Comput. Linguist. 16(1), 22–29 (1990)
Codd, E.: A relational model of data for large shared data banks. Commun. ACM 26(6), 64–69 (1983)
Dice, L.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)
Diniz, C., Mamede, N., Pereira, J.: RuDriCo2 - a faster disambiguator and segmentation modifier. In: INFORUM II, pp. 573–584, September 2010
Diniz, C., Mamede, N., Pereira, J.D.: RuDriCo2 - a faster disambiguator and segmentation modifier. In: Simpósio de Informática - INForum, pp. 573–584. Universidade do Minho, Portugal (2010)
Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)
Hagège, C., Baptista, J., Mamede, N.: Identificação, Classificação e Normalização de Expressões Temporais em Português: a Experiência do Segundo HAREM e o Futuro. In: Mota, C., Santos, D. (eds.) Desafios na Avaliação Conjunta do Reconhecimento de Entidades Mencionadas: o Segundo HAREM, chap. 2, pp. 33–54. Linguateca (2008). http://www.inesc-id.pt/ficheiros/publicacoes/5758.pdf/
Hagège, C., Baptista, J., Mamede, N.: Portuguese temporal expressions recognition: from TE characterization to an effective TER module implementation. In: 7th Brazilian Symposium in Information and Human Language Technology, STIL 2009, pp. 1–5. Sociedade Brasileira de Computação, São Carlos (2009)
Hagège, C., Baptista, J., Mamede, N.J.: Reconhecimento de entidades mencionadas com o xip: Uma colaboração entre o inesc-l2f e a xerox. In: Mota, C., Santos, D. (eds.) Desafios na avaliação conjunta do reconhecimento de entidades mencionadas: Actas do Encontro do Segundo HAREM (Aveiro, 11 de Setembro de 2008). Linguateca (2009)
Hagège, C., Baptista, J., Mamede, N.J.: Caracterização e processamento de expressões temporais em português. Linguamática 2(1), 63–76 (2010)
Kilgarriff, A., et al.: The sketch engine: ten years on. Lexicography 1(1), 7–36 (2014)
Kilgarriff, A., Rychly, P., Tugwell, D., Smrz, P.: The sketch engine. In: Proceedings of Euralex. vol. Demo Session, pp. 105–116. Lorient, France, July 2004
Mamede, N., Baptista, J., Diniz, C., Cabarrão, V.: STRING: an hybrid statistical and rule-based natural language processing chain for Portuguese. In: PROPOR 2012, vol. Demo Session, April 2012
Mamede, N.J., Baptista, J.: Nomenclature of chunks and dependencies in Portuguese XIP Grammar 4.5. Technical report, L2F-Spoken Language Laboratory, INESC-ID Lisboa, Lisboa, January 2016
Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Marques, J.S.: Anaphora Resolution. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa (2013)
Maurício, A.: Identificação, Classificação e Normalização de Expressões Temporais. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, November 2011
Nobre, N.: Resolução de Expressões Anafóricas. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, June 2011
Oliveira, D.: Extraction and Classification of Named Entities. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa (2010)
Pereira, S.: Linguistics Parameters for Zero Anaphora Resolution. Master’s thesis, Universidade do Algarve and University of Wolverhampton (2010)
Quasthoff, U., Richter, M., Biemann, C.: Corpus portal for search in monolingual corpora. In: Proceedings of the 5th LREC, pp. 1799–1802 (2006)
Ribeiro, R.: Anotação Morfossintática Desambiguada do Português. Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, March 2003
Rychly, P.: Manatee/Bonito - a modular corpus manager. In: Sojka, P., Horák, A. (eds.) RASLAN 2008, pp. 65–70. Masaryk University, Brno (2007)
Rychly, P.: A lexicographer-friendly association score. In: RASLAN 2008, pp. 6–9. Masarykova Univerzita, Brno (2008)
Santos, D., Rocha, P.: Evaluating CETEMPúblico, a free resource for Portuguese. In: Proceedings of the 39th Annual Meeting of ACL, ACL 2001, pp. 450–457. Association for Computational Linguistics, Stroudsburg (2001)
Silberschatz, A., Korth, H., Sudarshan, S.: Database System Concepts. Connect, learn, succeed. McGraw-Hill Education (2010)
Sinclair, J.: Corpus, Concordance, Collocation. Oxford University Press, Oxford (1991)
Smadja, F., McKeown, K., Hatzivassiloglou, V.: Translating collocations for bilingual lexicons: a statistical approach. Comput. Linguist. 22(1), 1–38 (1996)
Vicente, A.M.F.: LexMan: um Segmentador e Analisador Morfológico com Transdutores. Master’s thesis, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, June 2013
Acknowledgment
This work was supported by national funds through FCT–Fundação para a Ciência e a Tecnologia, ref. UID/CEC/50021/2013. Thanks to Neuza Costa (UAlg) for revising the final version of this paper.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Correia, J., Baptista, J., Mamede, N. (2016). Syntax Deep Explorer. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_19
DOI: https://doi.org/10.1007/978-3-319-41552-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41551-2
Online ISBN: 978-3-319-41552-9
eBook Packages: Computer Science, Computer Science (R0)