Abstract
Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models that, however, require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, and that their accuracy has no relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser, and that parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
Notes
MaltParser often requires feature optimization to obtain acceptable results for the target language.
The results obtained in these corpora are slightly different from the ones reported by Vilares et al. (2017), due to the different tokenization techniques used in this work.
References
Andor D, Alberti C, Weiss D, Severyn A, Presta A, Ganchev K, Petrov S, Collins M (2016) Globally normalized transition-based neural networks. arXiv: 1603.06042 [cs.CL]
Asmi A, Ishaya T (2012) Negation identification and calculation in sentiment analysis. In: The second international conference on advances in information mining and management, pp 1–7
Aue A, Gamon M (2005) Customizing sentiment classifiers to new domains: a case study. In: Proceedings of the 5th international conference on recent advances in natural language processing (RANLP 2005), Borovets, Bulgaria. https://www.microsoft.com/en-us/research/publication/customizing-sentiment-classifiers-to-new-domains-a-case-study/
Ballesteros M, Nivre J (2012) MaltOptimizer: a system for MaltParser optimization. In: Calzolari N, Choukri K, Declerck T, Dogan MU, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul
Bender EM, Flickinger D, Oepen S, Zhang Y (2011) Parser evaluation over local and non-local deep dependencies in a large corpus. In: Proceedings of the 2011 conference on empirical methods in natural language processing, Association for Computational Linguistics, Edinburgh, Scotland, UK, pp 397–408. http://www.aclweb.org/anthology/D11-1037
Berzak Y, Huang Y, Barbu A, Korhonen A, Katz B (2016) Bias and agreement in syntactic annotations. arXiv:1605.04481 [cs.CL]
Branavan SRK, Silver D, Barzilay R (2012) Learning to win by reading manuals in a monte-carlo framework. J Artif Int Res 43(1):661–704. http://dl.acm.org/citation.cfm?id=2387915.2387932
Buyko E, Hahn U (2010) Evaluating the impact of alternative dependency graph encodings on solving event extraction tasks. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Association for Computational Linguistics, Cambridge, MA, pp 982–992. http://www.aclweb.org/anthology/D10-1096
Chen D, Manning C (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 740–750. http://www.aclweb.org/anthology/D14-1082
Choi JD, McCallum A (2013) Transition-based dependency parsing with selectional branching. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers), Sofia, Bulgaria, pp 1052–1062. http://www.aclweb.org/anthology/P13-1104
Clark S, Copestake A, Curran JR, Zhang Y, Herbelot A, Haggerty J, Ahn BG, Wyk CV, Roesner J, Kummerfeld J, Dawborn T (2009) Large-scale syntactic processing: parsing the web. Technical report. Johns Hopkins University
Cohen SB, Gómez-Rodríguez C, Satta G (2011) Exact inference for generative probabilistic non-projective dependency parsing. In: Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, pp 1234–1245. http://www.aclweb.org/anthology/D11-1114
DeNeefe S, Knight K (2009) Synchronous tree adjoining machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Association for Computational Linguistics, Singapore, pp 727–736. http://www.aclweb.org/anthology/D/D09/D09-1076
Dyer C, Ballesteros M, Ling W, Matthews A, Smith NA (2015) Transition-based dependency parsing with stack long short-term memory. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers), Association for Computational Linguistics, Beijing, China, pp 334–343. http://www.aclweb.org/anthology/P15-1033
Eisner J (1996) Three new probabilistic models for dependency parsing: an exploration. In: Proceedings of the 16th international conference on computational linguistics (COLING-96), San Francisco, CA, USA, pp 340–345
Farghaly A, Shaalan K (2009) Arabic natural language processing: challenges and solutions. ACM Trans Asian Lang Inf Process (TALIP) 8(4):14:1–14:22. doi:10.1145/1644879.1644881
Goldberg Y, Nivre J (2012) A dynamic oracle for arc-eager dependency parsing. In: Proceedings of the 24th international conference on computational linguistics (COLING), Association for Computational Linguistics, pp 959–976. http://aclweb.org/anthology/C/C12/C12-1059.pdf
Gómez-Rodríguez C (2016) Restricted non-projectivity: coverage vs efficiency. Comput Linguist 42(4):809–817. doi:10.1162/COLI_a_00267
Gómez-Rodríguez C, Carroll J, Weir D (2008) A deductive approach to dependency parsing. In: Proceedings of the 46th annual meeting of the Association for Computational Linguistics: human language technologies (ACL’08:HLT), Association for Computational Linguistics, pp 968–976. http://www.aclweb.org/anthology/P/P08/P08-1110
Gómez-Rodríguez C, Carroll JA, Weir DJ (2011) Dependency parsing schemata and mildly non-projective dependency parsing. Comput Linguist 37(3):541–586
Goto I, Utiyama M, Onishi T, Sumita E (2011) A comparison study of parsers for patent machine translation. In: Proceedings of the 13th machine translation summit (MT Summit XIII), International Association for Machine Translation, pp 448–455. http://www.mt-archive.info/MTS-2011-Goto.pdf
Huang L, Sagae K (2010) Dynamic programming for linear-time incremental parsing. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, ACL ’10, pp 1077–1086. http://portal.acm.org/citation.cfm?id=1858681.1858791
Jia L, Yu C, Meng W (2009) The effect of negation on sentiment analysis and retrieval effectiveness. CIKM’09 proceeding of the 18th ACM conference on information and knowledge management. ACM Press, Hong Kong, pp 1827–1830
Joshi M, Penstein-Rosé C (2009) Generalizing dependency features for opinion mining. In: Proceedings of the ACL-IJCNLP 2009 conference short papers, Association for Computational Linguistics, Stroudsburg, PA, USA, ACLShort ’09, pp 313–316
Kahane S, Mazziotta N (2015) Syntactic polygraphs. a formalism extending both constituency and dependency. In: Proceedings of the 14th meeting on the mathematics of language (MoL 2015), Association for Computational Linguistics, Chicago, USA, pp 152–164. http://www.aclweb.org/anthology/W15-2313
Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. In: The 52nd annual meeting of the association for computational linguistics. Proceedings of the conference. Volume 1: long papers, ACL, Baltimore, Maryland, USA, pp 655–665
Khan FH, Qamar U, Bashir S (2016a) Esap: a decision support framework for enhanced sentiment analysis and polarity classification. Inf Sci 367:862–873
Khan FH, Qamar U, Bashir S (2016b) Swims: semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowl Based Syst 100:97–111
Kong L, Schneider N, Swayamdipta S, Bhatia A, Dyer C, Smith NA (2014) A dependency parser for tweets. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, pp 1001–1012. http://www.aclweb.org/anthology/D14-1108
Kuhlmann M, Gómez-Rodríguez C, Satta G (2011) Dynamic programming algorithms for transition-based dependency parsers. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies (ACL 2011), Association for Computational Linguistics, Portland, Oregon, USA, pp 673–682. http://www.aclweb.org/anthology/P11-1068
Liu Q, Gao Z, Liu B, Zhang Y (2016) Automated rule selection for opinion target extraction. Knowl Based Syst 104:74–88
Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330
Martins A, Smith N, Xing E, Aguiar P, Figueiredo M (2010) Turbo parsers: dependency parsing by approximate variational inference. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Association for Computational Linguistics, Cambridge, MA, pp 34–44. http://www.aclweb.org/anthology/D10-1004
Martins A, Almeida M, Smith NA (2013) Turning on the turbo: fast third-order non-projective turbo parsers. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers), Sofia, Bulgaria, pp 617–622. http://www.aclweb.org/anthology/P13-2109
McDonald R, Nivre J (2007) Characterizing the errors of data-driven dependency parsing models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 122–131
McDonald R, Satta G (2007) On the complexity of non-projective data-driven dependency parsing. In: IWPT 2007: proceedings of the 10th international conference on parsing technologies, pp 121–132
McDonald R, Pereira F, Ribarov K, Hajič J (2005) Non-projective dependency parsing using spanning tree algorithms. In: HLT/EMNLP 2005: proceedings of the conference on human language technology and empirical methods in natural language processing, pp 523–530
McDonald R, Nivre J, Quirmbach-Brundage Y, Goldberg Y, Das D, Ganchev K, Hall K, Petrov S, Zhang H, Täckström O, Bedini C, Castelló N, Lee J (2013) Universal dependency annotation for multilingual parsing. In: Proceedings of the 51st annual meeting of the association for computational linguistics, Association for Computational Linguistics, pp 92–97
Miceli Barone AV, Attardi G (2015) Non-projective dependency-based pre-reordering with recurrent neural network for machine translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long Papers), Association for Computational Linguistics, Beijing, China, pp 846–856. http://www.aclweb.org/anthology/P15-1082
Miyao Y, Sætre R, Sagae K, Matsuzaki T, Tsujii J (2008) Task-oriented evaluation of syntactic parsers and their representations. In: Proceedings of ACL-08: HLT, association for computational linguistics, Columbus, Ohio, pp 46–54. http://www.aclweb.org/anthology/P/P08/P08-1006
Napoles C, Gormley M, Van Durme B (2012) Annotated gigaword. In: Proceedings of the joint workshop on automatic knowledge base construction and web-scale knowledge extraction, Association for Computational Linguistics, pp 95–100
Nivre J, Hall J, Nilsson J, Chanev A, Eryiǧit G, Kübler S, Marinov S, Marsi E (2007) Maltparser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13:95–135
Nivre J, Rimell L, McDonald R, Gómez Rodríguez C (2010) Evaluation of dependency parsers on unbounded dependencies. In: Proceedings of the 23rd international conference on computational linguistics (COLING 2010), Association for Computational Linguistics, pp 833–841. http://www.aclweb.org/anthology/C10-1094
Padó S, Noh TG, Stern A, Wang R, Zanoli R (2015) Design and realization of a modular architecture for textual entailment. Nat Lang Eng 21(2):167–200
Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 271–278
Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 115–124
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. EMNLP 14:1532–1543
Pitler E, Kannan S, Marcus M (2013) Finding optimal 1-endpoint-crossing trees. Trans Assoc Comput Linguist 1:13–24. http://aclweb.org/anthology/Q13-1002
Popel M, Mareček D, Green N, Žabokrtský Z (2011) Influence of parser choice on dependency-based MT. In: Proceedings of the sixth workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, Scotland, pp 433–439. http://www.aclweb.org/anthology/W11-2153
Poria S, Cambria E, Winterstein G, Huang GB (2014) Sentic patterns: dependency-based rules for concept-level sentiment analysis. Knowl Based Syst 69:45–63
Quirk C, Corston-Oliver S (2006) The impact of parse quality on syntactically-informed statistical machine translation. In: Proceedings of the 2006 conference on empirical methods in natural language processing, Association for Computational Linguistics, Sydney, Australia, pp 62–69. http://www.aclweb.org/anthology/W06-1608
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250
Rasooli MS, Tetreault JR (2015) Yara parser: a fast and accurate dependency parser. CoRR http://arxiv.org/abs/1503.06733
Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP 2013. 2013 Conference on empirical methods in natural language processing. Proceedings of the Conference, ACL, Seattle, Washington, USA, pp 1631–1642
Song M, Kim WC, Lee D, Heo GE, Kang KY (2015) PKDE4J: entity and relation extraction for public knowledge discovery. J Biomed Inform 57:320–332. doi:10.1016/j.jbi.2015.08.008
Taboada M, Grieve J (2004) Analyzing appraisal automatically. In: Proceedings of AAAI spring symposium on exploring attitude and affect in text (AAAI Technical Report SS0407), Stanford University, CA, AAAI Press, pp 158–161
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
Taulé M, Martí MA, Recasens M (2008) AnCora: multilevel annotated corpora for Catalan and Spanish. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Tapias D (eds) Proceedings of the sixth international conference on language resources and evaluation (LREC’08), Marrakech, Morocco, pp 96–101
Vilares D, Alonso MA, Gómez-Rodríguez C (2015a) A linguistic approach for determining the topics of Spanish Twitter messages. J Inf Sci 41(02):127–145
Vilares D, Alonso MA, Gómez-Rodríguez C (2015b) A syntactic approach for opinion mining on Spanish reviews. Nat Lang Eng 21(01):139–163
Vilares D, Alonso MA, Gómez-Rodríguez C (2015c) On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. J Assoc Inf Sci Technol 66(9):1799–1816
Vilares D, Gómez-Rodríguez C, Alonso MA (2017) Universal, unsupervised (rule-based), uncovered sentiment analysis. Knowl Based Syst 118:45–55. doi:10.1016/j.knosys.2016.11.014
Volokh A (2013) Performance-oriented dependency parsing. Doctoral dissertation. Saarland University, Saarbrücken, Germany
Volokh A, Neumann G (2012) Task-oriented dependency parsing evaluation methodology. In: IEEE 13th international conference on information reuse and integration, IRI 2012, Las Vegas, NV, USA, 8–10 Aug 2012, pp 132–137. doi:10.1109/IRI.2012.6303001
Wu Y, Zhang Q, Huang X, Wu L (2009) Phrase dependency parsing for opinion mining. In: Proceedings of the 2009 conference on empirical methods in natural language processing, ACL, Singapore, pp 1533–1541
Xiao T, Zhu J, Zhang C, Liu T (2016) Syntactic skeleton-based translation. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, 12–17 Feb 2016, Phoenix, Arizona, USA, pp 2856–2862. http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11933
Yu M, Gormley MR, Dredze M (2015) Combining word embeddings and feature embeddings for fine-grained relation extraction. In: Proceedings of the 2015 conference of the north american chapter of the Association for Computational Linguistics: human language technologies, Association for Computational Linguistics, Denver, Colorado, pp 1374–1379. http://www.aclweb.org/anthology/N15-1155
Yuret D, Han A, Turgut Z (2010) Semeval-2010 task 12: Parser evaluation using textual entailments. In: Proceedings of the 5th international workshop on semantic evaluation, Association for Computational Linguistics, Uppsala, Sweden, pp 51–56. http://www.aclweb.org/anthology/S10-1009
Zhang Y, Nivre J (2011) Transition-based dependency parsing with rich non-local features. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers-volume 2, pp 188–193. http://dl.acm.org/citation.cfm?id=2002736.2002777
Additional information
Carlos Gómez-Rodríguez has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant Agreement No 714150), Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R), and the Oportunius Program (Xunta de Galicia). Iago Alonso-Alonso was funded by an Oportunius Program Grant (Xunta de Galicia). David Vilares has received funding from the Ministerio de Educación, Cultura y Deporte (FPU13/01180) and Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R).
About this article
Cite this article
Gómez-Rodríguez, C., Alonso-Alonso, I. & Vilares, D. How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis. Artif Intell Rev 52, 2081–2097 (2019). https://doi.org/10.1007/s10462-017-9584-0
DOI: https://doi.org/10.1007/s10462-017-9584-0