Learning from syntax generalizations for automatic semantic annotation

Boella, Guido; Di Caro, Luigi; Ruggeri, Alice; Robaldo, Livio

doi:10.1007/s10844-014-0320-9

Published: 27 May 2014

Journal of Intelligent Information Systems, Volume 43, pages 231–246 (2014)

Guido Boella¹,
Luigi Di Caro¹,
Alice Ruggeri¹ &
Livio Robaldo¹

Abstract

Large amounts of textual data now come from online social communities such as Twitter and from encyclopedic sources such as Wikipedia and similar platforms. This Big Data era poses new challenges: making sense of large data stores and efficiently finding specific information within them. In a more domain-specific scenario such as the management of legal documents, extracting semantic knowledge can help domain engineers find relevant information more quickly and can assist in constructing application-based legal ontologies. In this work, we address the problem of automatically extracting structured knowledge to improve semantic search and ontology creation on textual databases. Our approach first relies on well-known Natural Language Processing techniques such as Part-Of-Speech tagging and syntactic parsing. We then transform this information into generalized features that aim to capture the linguistic variability surrounding the target semantic units. The resulting feature data are finally fed into a Support Vector Machine classifier, which learns a model that automates the semantic annotation. We first test our technique on the problem of automatically extracting semantic entities and involved objects within legal texts. We then focus on the identification of hypernym relations and definitional sentences, demonstrating the validity of the approach on different tasks and domains.
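
The feature-generalization step can be illustrated with a minimal sketch. The abstract does not give the exact feature scheme, so everything below is an assumption made for illustration: the names `Token`, `Dependency`, and `generalize` are hypothetical, the parse is written by hand rather than produced by a parser, and the generalization rule (replace every word except the target with its Part-Of-Speech tag) is only one plausible way to abstract away from surface variability.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # part-of-speech tag, e.g. "NN" or "VBZ"


class Dependency(NamedTuple):
    relation: str   # syntactic relation label, e.g. "nsubj"
    head: int       # index of the head token
    dependent: int  # index of the dependent token


def generalize(tokens, dependencies, target):
    """Return a bag of generalized dependency features for tokens[target].

    Assumed scheme (not taken from the paper): the target word becomes the
    placeholder "TARGET" and every other word is replaced by its POS tag,
    so the feature no longer depends on the specific surrounding words.
    """
    def label(i):
        return "TARGET" if i == target else tokens[i].pos

    features = Counter()
    for dep in dependencies:
        # Keep only the dependencies that directly involve the target word.
        if target in (dep.head, dep.dependent):
            key = f"{dep.relation}({label(dep.head)},{label(dep.dependent)})"
            features[key] += 1
    return features


# Hand-written parse of "The supplier delivers goods" (illustrative only).
tokens = [
    Token("The", "DT"),
    Token("supplier", "NN"),
    Token("delivers", "VBZ"),
    Token("goods", "NNS"),
]
dependencies = [
    Dependency("det", 1, 0),
    Dependency("nsubj", 2, 1),
    Dependency("dobj", 2, 3),
]

features = generalize(tokens, dependencies, target=1)
print(dict(features))
```

In a full pipeline, feature bags like this one would be computed for every candidate word, turned into numeric vectors, and passed with their labels to a Support Vector Machine; that training step is left out here.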

Notes

http://www.statisticbrain.com/twitter-statistics/
http://www.telegraph.co.uk/technology/twitter/9945505/Twitter-in-numbers.html
http://www.wikipedia.org/
http://nlp.stanford.edu/software/index.shtml
We only used the constraint that the hypernym has to be different from the hyponym.
http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip

Author information

Authors and Affiliations

University of Turin, Torino, Italy
Guido Boella, Luigi Di Caro, Alice Ruggeri & Livio Robaldo

Corresponding author

Correspondence to Luigi Di Caro.

Additional information

This work was funded by the project ITxLaw with Compagnia di San Paolo.

About this article

Cite this article

Boella, G., Caro, L.D., Ruggeri, A. et al. Learning from syntax generalizations for automatic semantic annotation. J Intell Inf Syst 43, 231–246 (2014). https://doi.org/10.1007/s10844-014-0320-9

Received: 01 October 2013
Revised: 26 March 2014
Accepted: 30 March 2014
Published: 27 May 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10844-014-0320-9
