How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis

Artificial Intelligence Review

Abstract

Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of an application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and less accurate models that require fewer computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, and that their accuracy has no relevant influence on those results. Since parsing currently has a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser, and that parsing researchers should investigate models that improve speed further, even at some cost to accuracy.
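
To make the idea of determining polarity from a parse tree concrete, the following is a minimal illustrative sketch, not the system evaluated in the article. It assumes a hypothetical toy lexicon (`LEXICON`), a hypothetical negator list (`NEGATORS`), and a single hand-written rule: a negator dependent flips the score of its head's subtree. The `Node` class and the `polarity` function are names introduced here for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon; a real system would use a full sentiment dictionary.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Hypothetical negator list (assumption, not taken from the article).
NEGATORS = {"not", "never", "no"}


@dataclass
class Node:
    """One token in a dependency tree, together with its dependents."""
    word: str
    children: list["Node"] = field(default_factory=list)


def polarity(node: Node) -> float:
    """Score the subtree rooted at `node`, working bottom-up."""
    score = LEXICON.get(node.word.lower(), 0.0)
    negated = False
    for child in node.children:
        if child.word.lower() in NEGATORS:
            negated = True  # a negator dependent flips its head's subtree
        else:
            score += polarity(child)
    return -score if negated else score


# "The movie was not good": 'good' heads 'movie', 'was' and 'not'.
tree = Node("good", [Node("movie", [Node("The")]), Node("was"), Node("not")])
print(polarity(tree))  # prints -1.0
```

In this toy rule set, attaching "not" to "was" instead of "good" would leave the sentence scored as positive. The article's experiments measure how much such parser differences affect a real rule-based system in practice.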

Notes

  1. MaltParser often requires feature optimization to obtain acceptable results for the target language.

  2. http://nlp.stanford.edu/data/glove.6B.zip.

  3. The results obtained on these corpora differ slightly from those reported by Vilares et al. (2017) because this work uses different tokenization techniques.

References

  • Andor D, Alberti C, Weiss D, Severyn A, Presta A, Ganchev K, Petrov S, Collins M (2016) Globally normalized transition-based neural networks. arXiv:1603.06042 [cs.CL]

  • Asmi A, Ishaya T (2012) Negation identification and calculation in sentiment analysis. In: The second international conference on advances in information mining and management, pp 1–7

  • Aue A, Gamon M (2005) Customizing sentiment classifiers to new domains: a case study. In: Proceedings of the 5th international conference on recent advances in natural language processing (RANLP 2005), Borovets, Bulgaria. https://www.microsoft.com/en-us/research/publication/customizing-sentiment-classifiers-to-new-domains-a-case-study/

  • Ballesteros M, Nivre J (2012) MaltOptimizer: a system for MaltParser optimization. In: Calzolari N, Choukri K, Declerck T, Dogan MU, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC’12). European Language Resources Association (ELRA), Istanbul

  • Bender EM, Flickinger D, Oepen S, Zhang Y (2011) Parser evaluation over local and non-local deep dependencies in a large corpus. In: Proceedings of the 2011 conference on empirical methods in natural language processing, Association for Computational Linguistics, Edinburgh, Scotland, UK, pp 397–408. http://www.aclweb.org/anthology/D11-1037

  • Berzak Y, Huang Y, Barbu A, Korhonen A, Katz B (2016) Bias and agreement in syntactic annotations. arXiv:1605.04481 [cs.CL]

  • Branavan SRK, Silver D, Barzilay R (2012) Learning to win by reading manuals in a Monte-Carlo framework. J Artif Int Res 43(1):661–704. http://dl.acm.org/citation.cfm?id=2387915.2387932

  • Buyko E, Hahn U (2010) Evaluating the impact of alternative dependency graph encodings on solving event extraction tasks. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Association for Computational Linguistics, Cambridge, MA, pp 982–992. http://www.aclweb.org/anthology/D10-1096

  • Chen D, Manning C (2014) A fast and accurate dependency parser using neural networks. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Doha, Qatar, pp 740–750. http://www.aclweb.org/anthology/D14-1082

  • Choi JD, McCallum A (2013) Transition-based dependency parsing with selectional branching. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 1: long papers), Sofia, Bulgaria, pp 1052–1062. http://www.aclweb.org/anthology/P13-1104

  • Clark S, Copestake A, Curran JR, Zhang Y, Herbelot A, Haggerty J, Ahn BG, Wyk CV, Roesner J, Kummerfeld J, Dawborn T (2009) Large-scale syntactic processing: parsing the web. Technical report. Johns Hopkins University

  • Cohen SB, Gómez-Rodríguez C, Satta G (2011) Exact inference for generative probabilistic non-projective dependency parsing. In: Proceedings of the 2011 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, pp 1234–1245. http://www.aclweb.org/anthology/D11-1114

  • DeNeefe S, Knight K (2009) Synchronous tree adjoining machine translation. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Association for Computational Linguistics, Singapore, pp 727–736. http://www.aclweb.org/anthology/D/D09/D09-1076

  • Dyer C, Ballesteros M, Ling W, Matthews A, Smith NA (2015) Transition-based dependency parsing with stack long short-term memory. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers), Association for Computational Linguistics, Beijing, China, pp 334–343. http://www.aclweb.org/anthology/P15-1033

  • Eisner J (1996) Three new probabilistic models for dependency parsing: an exploration. In: Proceedings of the 16th international conference on computational linguistics (COLING-96), San Francisco, CA, USA, pp 340–345

  • Farghaly A, Shaalan K (2009) Arabic natural language processing: challenges and solutions. ACM Trans Asian Lang Inf Process (TALIP) 8(4):14:1–14:22. doi:10.1145/1644879.1644881

  • Goldberg Y, Nivre J (2012) A dynamic oracle for arc-eager dependency parsing. In: Proceedings of the 24th international conference on computational linguistics (COLING), Association for Computational Linguistics, pp 959–976. http://aclweb.org/anthology/C/C12/C12-1059.pdf

  • Gómez-Rodríguez C (2016) Restricted non-projectivity: coverage vs efficiency. Comput Linguist 42(4):809–817. doi:10.1162/COLI_a_00267

  • Gómez-Rodríguez C, Carroll J, Weir D (2008) A deductive approach to dependency parsing. In: Proceedings of the 46th annual meeting of the Association for Computational Linguistics: human language technologies (ACL’08:HLT), Association for Computational Linguistics, pp 968–976. http://www.aclweb.org/anthology/P/P08/P08-1110

  • Gómez-Rodríguez C, Carroll JA, Weir DJ (2011) Dependency parsing schemata and mildly non-projective dependency parsing. Comput Linguist 37(3):541–586

  • Goto I, Utiyama M, Onishi T, Sumita E (2011) A comparison study of parsers for patent machine translation. In: Proceedings of the 13th machine translation summit (MT Summit XIII), International Association for Machine Translation, pp 448–455. http://www.mt-archive.info/MTS-2011-Goto.pdf

  • Huang L, Sagae K (2010) Dynamic programming for linear-time incremental parsing. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, ACL ’10, pp 1077–1086. http://portal.acm.org/citation.cfm?id=1858681.1858791

  • Jia L, Yu C, Meng W (2009) The effect of negation on sentiment analysis and retrieval effectiveness. In: CIKM ’09: proceedings of the 18th ACM conference on information and knowledge management. ACM Press, Hong Kong, pp 1827–1830

  • Joshi M, Penstein-Rosé C (2009) Generalizing dependency features for opinion mining. In: Proceedings of the ACL-IJCNLP 2009 conference short papers, Association for Computational Linguistics, Stroudsburg, PA, USA, ACLShort ’09, pp 313–316

  • Kahane S, Mazziotta N (2015) Syntactic polygraphs. a formalism extending both constituency and dependency. In: Proceedings of the 14th meeting on the mathematics of language (MoL 2015), Association for Computational Linguistics, Chicago, USA, pp 152–164. http://www.aclweb.org/anthology/W15-2313

  • Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. In: The 52nd annual meeting of the association for computational linguistics. Proceedings of the conference. Volume 1: long papers, ACL, Baltimore, Maryland, USA, pp 655–665

  • Khan FH, Qamar U, Bashir S (2016a) eSAP: a decision support framework for enhanced sentiment analysis and polarity classification. Inf Sci 367:862–873

  • Khan FH, Qamar U, Bashir S (2016b) SWIMS: semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowl Based Syst 100:97–111

  • Kong L, Schneider N, Swayamdipta S, Bhatia A, Dyer C, Smith NA (2014) A dependency parser for tweets. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), Association for Computational Linguistics, Doha, Qatar, pp 1001–1012. http://www.aclweb.org/anthology/D14-1108

  • Kuhlmann M, Gómez-Rodríguez C, Satta G (2011) Dynamic programming algorithms for transition-based dependency parsers. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies (ACL 2011), Association for Computational Linguistics, Portland, Oregon, USA, pp 673–682. http://www.aclweb.org/anthology/P11-1068

  • Liu Q, Gao Z, Liu B, Zhang Y (2016) Automated rule selection for opinion target extraction. Knowl Based Syst 104:74–88

  • Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330

  • Martins A, Smith N, Xing E, Aguiar P, Figueiredo M (2010) Turbo parsers: dependency parsing by approximate variational inference. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Association for Computational Linguistics, Cambridge, MA, pp 34–44. http://www.aclweb.org/anthology/D10-1004

  • Martins A, Almeida M, Smith NA (2013) Turning on the turbo: fast third-order non-projective turbo parsers. In: Proceedings of the 51st annual meeting of the association for computational linguistics (volume 2: short papers), Sofia, Bulgaria, pp 617–622. http://www.aclweb.org/anthology/P13-2109

  • McDonald R, Nivre J (2007) Characterizing the errors of data-driven dependency parsing models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 122–131

  • McDonald R, Satta G (2007) On the complexity of non-projective data-driven dependency parsing. In: IWPT 2007: proceedings of the 10th international conference on parsing technologies, pp 121–132

  • McDonald R, Pereira F, Ribarov K, Hajič J (2005) Non-projective dependency parsing using spanning tree algorithms. In: HLT/EMNLP 2005: proceedings of the conference on human language technology and empirical methods in natural language processing, pp 523–530

  • McDonald R, Nivre J, Quirmbach-Brundage Y, Goldberg Y, Das D, Ganchev K, Hall K, Petrov S, Zhang H, Täckström O, Bedini C, Castelló N, Lee J (2013) Universal dependency annotation for multilingual parsing. In: Proceedings of the 51st annual meeting of the association for computational linguistics, Association for Computational Linguistics, pp 92–97

  • Miceli Barone AV, Attardi G (2015) Non-projective dependency-based pre-reordering with recurrent neural network for machine translation. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: Long Papers), Association for Computational Linguistics, Beijing, China, pp 846–856. http://www.aclweb.org/anthology/P15-1082

  • Miyao Y, Sætre R, Sagae K, Matsuzaki T, Tsujii J (2008) Task-oriented evaluation of syntactic parsers and their representations. In: Proceedings of ACL-08: HLT, Association for Computational Linguistics, Columbus, Ohio, pp 46–54. http://www.aclweb.org/anthology/P/P08/P08-1006

  • Napoles C, Gormley M, Van Durme B (2012) Annotated gigaword. In: Proceedings of the joint workshop on automatic knowledge base construction and web-scale knowledge extraction, Association for Computational Linguistics, pp 95–100

  • Nivre J, Hall J, Nilsson J, Chanev A, Eryiğit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13:95–135

  • Nivre J, Rimell L, McDonald R, Gómez Rodríguez C (2010) Evaluation of dependency parsers on unbounded dependencies. In: Proceedings of the 23rd international conference on computational linguistics (COLING 2010), Association for Computational Linguistics, pp 833–841. http://www.aclweb.org/anthology/C10-1094

  • Padó S, Noh TG, Stern A, Wang R, Zanoli R (2015) Design and realization of a modular architecture for textual entailment. Nat Lang Eng 21(2):167–200

  • Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 271–278

  • Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 115–124

  • Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Pitler E, Kannan S, Marcus M (2013) Finding optimal 1-endpoint-crossing trees. Trans Assoc Comput Linguist 1:13–24. http://aclweb.org/anthology/Q13-1002

  • Popel M, Mareček D, Green N, Žabokrtský Z (2011) Influence of parser choice on dependency-based MT. In: Proceedings of the sixth workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, Scotland, pp 433–439. http://www.aclweb.org/anthology/W11-2153

  • Poria S, Cambria E, Winterstein G, Huang GB (2014) Sentic patterns: dependency-based rules for concept-level sentiment analysis. Knowl Based Syst 69:45–63

  • Quirk C, Corston-Oliver S (2006) The impact of parse quality on syntactically-informed statistical machine translation. In: Proceedings of the 2006 conference on empirical methods in natural language processing, Association for Computational Linguistics, Sydney, Australia, pp 62–69. http://www.aclweb.org/anthology/W06-1608

  • Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. arXiv:1606.05250

  • Rasooli MS, Tetreault JR (2015) Yara parser: a fast and accurate dependency parser. CoRR http://arxiv.org/abs/1503.06733

  • Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng A, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: EMNLP 2013. 2013 Conference on empirical methods in natural language processing. Proceedings of the Conference, ACL, Seattle, Washington, USA, pp 1631–1642

  • Song M, Kim WC, Lee D, Heo GE, Kang KY (2015) PKDE4J: entity and relation extraction for public knowledge discovery. J Biomed Inform 57:320–332. doi:10.1016/j.jbi.2015.08.008

  • Taboada M, Grieve J (2004) Analyzing appraisal automatically. In: Proceedings of AAAI spring symposium on exploring attitude and affect in text (AAAI Technical Report SS0407), Stanford University, CA, AAAI Press, pp 158–161

  • Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307

  • Taulé M, Martí MA, Recasens M (2008) AnCora: multilevel annotated corpora for Catalan and Spanish. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Tapias D (eds) Proceedings of the sixth international conference on language resources and evaluation (LREC’08), Marrakech, Morocco, pp 96–101

  • Vilares D, Alonso MA, Gómez-Rodríguez C (2015a) A linguistic approach for determining the topics of Spanish Twitter messages. J Inf Sci 41(02):127–145

  • Vilares D, Alonso MA, Gómez-Rodríguez C (2015b) A syntactic approach for opinion mining on Spanish reviews. Nat Lang Eng 21(01):139–163

  • Vilares D, Alonso MA, Gómez-Rodríguez C (2015c) On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages. J Assoc Inf Sci Technol 66(9):1799–1816

  • Vilares D, Gómez-Rodríguez C, Alonso MA (2017) Universal, unsupervised (rule-based), uncovered sentiment analysis. Knowl Based Syst 118:45–55. doi:10.1016/j.knosys.2016.11.014

  • Volokh A (2013) Performance-oriented dependency parsing. Doctoral dissertation. Saarland University, Saarbrücken, Germany

  • Volokh A, Neumann G (2012) Task-oriented dependency parsing evaluation methodology. In: IEEE 13th international conference on information reuse and integration, IRI 2012, Las Vegas, NV, USA, 8–10 Aug 2012, pp 132–137. doi:10.1109/IRI.2012.6303001

  • Wu Y, Zhang Q, Huang X, Wu L (2009) Phrase dependency parsing for opinion mining. In: Proceedings of the 2009 conference on empirical methods in natural language processing, ACL, Singapore, pp 1533–1541

  • Xiao T, Zhu J, Zhang C, Liu T (2016) Syntactic skeleton-based translation. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, 12–17 Feb 2016, Phoenix, Arizona, USA, pp 2856–2862. http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11933

  • Yu M, Gormley MR, Dredze M (2015) Combining word embeddings and feature embeddings for fine-grained relation extraction. In: Proceedings of the 2015 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Association for Computational Linguistics, Denver, Colorado, pp 1374–1379. http://www.aclweb.org/anthology/N15-1155

  • Yuret D, Han A, Turgut Z (2010) SemEval-2010 task 12: parser evaluation using textual entailments. In: Proceedings of the 5th international workshop on semantic evaluation, Association for Computational Linguistics, Uppsala, Sweden, pp 51–56. http://www.aclweb.org/anthology/S10-1009

  • Zhang Y, Nivre J (2011) Transition-based dependency parsing with rich non-local features. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: short papers-volume 2, pp 188–193. http://dl.acm.org/citation.cfm?id=2002736.2002777

Author information

Corresponding author

Correspondence to Carlos Gómez-Rodríguez.

Additional information

Carlos Gómez-Rodríguez has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant Agreement No 714150), Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R), and the Oportunius Program (Xunta de Galicia). Iago Alonso-Alonso was funded by an Oportunius Program Grant (Xunta de Galicia). David Vilares has received funding from the Ministerio de Educación, Cultura y Deporte (FPU13/01180) and Ministerio de Economía y Competitividad (FFI2014-51978-C2-2-R).

About this article

Cite this article

Gómez-Rodríguez, C., Alonso-Alonso, I. & Vilares, D. How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis. Artif Intell Rev 52, 2081–2097 (2019). https://doi.org/10.1007/s10462-017-9584-0

  • DOI: https://doi.org/10.1007/s10462-017-9584-0
