
Pipeline Robustness

  • Chapter in: Text Analysis Pipelines
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9383)

Abstract

The ultimate purpose of text analysis pipelines is to infer new information from unknown input texts. To this end, the algorithms employed in pipelines are usually developed on known training texts from the anticipated domains of application (cf. Sect. 2.1). In many applications, however, the unknown texts significantly differ from the known texts, because a consideration of all possible domains within the development is practically infeasible (Blitzer et al. 2007). As a consequence, algorithms often fail to infer information effectively, especially when they rely on features of texts that are specific to the training domain. Such missing domain robustness constitutes a fundamental problem of text analysis (Turmo et al. 2006; Daumé and Marcu 2006). The missing robustness of an algorithm directly reduces the robustness of a pipeline it is employed in. This in turn limits the benefit of pipelines in all search engines and big data analytics applications where the domains of texts cannot be anticipated. In this chapter, we present first substantial results of an approach that improves robustness by relying on novel structure-based features that are invariant across domains.

Section 5.1 discusses how to achieve ideal domain independence in theory. Since the domain robustness problem is very diverse, we then focus on a specific type of text analysis tasks (unlike in Chaps. 3 and 4). In particular, we consider tasks that deal with the classification of argumentative texts, like sentiment analysis, stance recognition, or automatic essay grading (cf. Sect. 2.1). In Sect. 5.2, we introduce a shallow model of such tasks, which captures the sequential overall structure of argumentative texts on the pragmatic level while abstracting from their content. For instance, we observe that review argumentation can be represented by the flow of local sentiment. Given the model, we demonstrate that common flow patterns exist in argumentative texts (Sect. 5.3). Our hypothesis is that such patterns generalize well across domains. In Sect. 5.4, we learn common flow patterns with a supervised variant of clustering. Then, we use each pattern as a single feature for classifying argumentative texts from different domains. Our results for sentiment analysis indicate the robustness of modeling overall structure (other tasks are left for future work). In addition, we can visually make results more intelligible based on the model (Sect. 5.5). Altogether, this chapter realizes the overall analysis within the approach of this book, highlighted in Fig. 5.1. Both robustness and intelligibility benefit the use of pipelines in ad-hoc large-scale text mining.
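To make the model concrete, the flow representation and pattern-based features can be sketched as follows. This is a minimal illustration, not the book's implementation: the per-sentence polarity values, the nearest-neighbor resampling to a fixed flow length, and the similarity measure are all simplifying assumptions.

```python
# Minimal sketch of flow-pattern features (illustrative only): a text's
# per-sentence polarities are resampled to a fixed-length flow, and each
# learned pattern contributes one similarity feature.

def to_flow(sentence_polarities, length=10):
    """Resample a sequence of polarity scores (-1, 0, 1) to a fixed length."""
    n = len(sentence_polarities)
    return [sentence_polarities[min(int(i * n / length), n - 1)]
            for i in range(length)]

def flow_features(flow, patterns):
    """One feature per flow pattern: similarity of the text's flow to it."""
    def similarity(a, b):
        # 1.0 for identical flows, 0.0 for maximally different ones
        return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))
    return [similarity(flow, p) for p in patterns]

review = [1, 1, 0, -1, -1, 1]                  # local sentiment per sentence
patterns = [[1] * 10, [-1] * 10, [1] * 5 + [-1] * 5]
features = flow_features(to_flow(review), patterns)
```

In the chapter's setting, the patterns would instead come from supervised clustering of training flows, and the resulting feature vector would feed a standard classifier.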

"In making a speech one must study three points: first, the means of producing persuasion; second, the style, or language, to be used; third, the proper arrangement of the various parts of the speech." – Aristotle


Notes

  1.

    Twitter, http://www.twitter.com, accessed on June 15, 2015.

  2.

    In software engineering terms, the latter could be seen as a domain-specific language.

  3.

    According to Blitzer et al. (2008), domain dependence occurs in nearly every application of machine learning. As exemplified, it is not restricted to statistical approaches, though.

  4.

    The terms used here come from the area of machine learning (Hastie et al. 2009). However, our argumentation largely applies to rule-based text analysis approaches as well.

  5.

    On a different level, overall structure also plays a role in sequence labeling tasks like named entity recognition (cf. Sect. 2.3). There, many approaches analyze the syntactic structure of a sentence to decide whether some candidate text span denotes an entity.

  6.

    Notice, though, that all single text classification tasks target the inference of information of only one type \(C\), which remains implicit in the basic scenario from Sect. 1.2.

  7.

    As the examples demonstrate, the scheme of unit classes can, but need not, be related to the scheme of the text class to be inferred.

  8.

    Apart from different naming, the metamodel can be seen as a generalization of the review argumentation model from Wachsmuth et al. (2014a). However, we emphasize here that the semantic concepts contained in a discourse unit do not belong to the structure.

  9.

    The numbers of features vary depending on the processed training set, because only distributional features with some minimum occurrence are considered (cf. Sect. 2.1).

  10.

    The Sentiment Scale dataset is partitioned into four author datasets (cf. Sect. C.4). Here, we use the datasets of authors c and d for training, b for validation, and a for testing.

  11.

    Research on domain adaptation often compares the accuracy of a classifier in its training domain to its accuracy in some other test domain (i.e., A2A vs. A2B), because training data from the test domain is assumed to be scarce. However, this leaves unclear whether an accuracy change is caused by a varying difficulty of the task at hand across domains. For the analysis of domain invariance, we therefore focus on comparing different training domains for the same test domain (i.e., A2A vs. B2A).
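The A2A vs. B2A comparison can be sketched as a small experiment loop. The majority-class stand-in classifier and the toy domain data below are placeholders for real feature extraction and learning, used only to show the evaluation shape.

```python
from collections import Counter

# Sketch of the domain-invariance comparison: fix the test domain and vary
# only the training domain (A2A vs. B2A). The majority-class "classifier"
# and toy data are placeholders, not the chapter's setup.

def fit(texts, labels):
    return Counter(labels).most_common(1)[0][0]   # majority label

def predict(model, text):
    return model                                  # ignores the text

def accuracy(train, test):
    model = fit(*train)
    texts, labels = test
    return sum(predict(model, x) == y for x, y in zip(texts, labels)) / len(labels)

domains = {
    "A": (["a1", "a2", "a3", "a4"], [1, 1, 1, 0]),   # e.g. hotel reviews
    "B": (["b1", "b2", "b3", "b4"], [0, 0, 1, 0]),   # e.g. film reviews
}
test_domain = "A"
scores = {name: accuracy(data, domains[test_domain])
          for name, data in domains.items()}         # {"A": A2A, "B": B2A}
```

Holding the test set fixed makes the A2A and B2A accuracies directly comparable, which is exactly the point of the note above.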

  12.

    For information on the source code of the statistical analysis, see Appendix B.4.

  13.

    Alternatively, Mao and Lebanon (2007) propose to ignore the objective facts. Our corresponding experiments did not yield new insights except for a higher frequency of trivial flows. For lack of relevance, we omit the respective results here, but they can easily be reproduced using the provided source code (cf. Appendix B.4).

  14.

    In Wachsmuth et al. (2014b), we name these sequences argumentation flows. In the more general context given here, we prefer a more task-specific name in order to avoid confusion.

  15.

    In the evaluation at the end of this section, we present results on the extent to which the effectiveness of inferring \(C_\mathbf{f}\) affects the quality of the features based on the flow patterns.

  16.

    For clarity, we have included the computation of flows both in Pseudocode 5.1 and in Pseudocode 5.2. In practice, the flow of each text can be maintained during the whole process of feature determination and vector creation and, thus, needs to be computed only once.

  17.

    If the flow of each text from \(\mathbf{D}_T\) is computed only once during the whole process (see above), Ineq. 5.2 would even reduce to \(\mathscr{O}(|\mathbf{D}_T| \cdot |\mathbf{f}_{max}|)\).
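The optimization in this note can be illustrated with a small sketch in which the flows are computed once and then shared by the pattern-determination and vector-creation steps. `compute_flow` and the pattern step are placeholders, not the book's pseudocode.

```python
calls = 0  # counts how often a flow is actually computed

def compute_flow(text):
    """Placeholder for the expensive unit classification and flow construction."""
    global calls
    calls += 1
    return [1 if tok == "good" else -1 for tok in text.split()]

def build_features(texts):
    flows = [compute_flow(t) for t in texts]   # computed once, reused below
    patterns = flows[:2]                       # stand-in for learned flow patterns
    def sim(a, b):
        m = min(len(a), len(b))
        return sum(x == y for x, y in zip(a, b)) / m if m else 0.0
    return [[sim(f, p) for p in patterns] for f in flows]

texts = ["good good bad", "bad bad", "good bad bad"]
vectors = build_features(texts)                # flows computed len(texts) times
```

Since each flow is built exactly once, the cost of the vector-creation phase is linear in the number of texts times the maximum flow length, matching the reduced bound above.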

  18.

    As in Sect. 5.3, the numbers of features vary depending on the training set, because we take only those features whose frequency in the training texts exceeds some specified threshold (cf. Sect. 2.1). For instance, a word unigram is taken into account only if it occurs in at least 5% of the hotel reviews or 10% of the film reviews, respectively.
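The thresholding in this note can be sketched as follows; the whitespace tokenization and the document-frequency criterion are simplifying assumptions for illustration.

```python
from collections import Counter

# Sketch of frequency thresholding: a unigram becomes a feature only if it
# appears in at least a minimum fraction of the training texts (e.g. 5% for
# hotel reviews in the note above). Tokenization is simplified.

def select_unigrams(train_texts, min_fraction):
    doc_freq = Counter()
    for text in train_texts:
        doc_freq.update(set(text.lower().split()))   # document frequency
    threshold = min_fraction * len(train_texts)
    return sorted(w for w, f in doc_freq.items() if f >= threshold)

reviews = ["Great room", "great view", "noisy room", "tiny room"]
features = select_unigrams(reviews, min_fraction=0.5)
```

This is why the feature count varies with the training set: a different sample shifts the document frequencies around the threshold.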

  19.

    In Wachsmuth et al. (2014a), we also evaluate the local sentiment on specific domain concepts in the given text. For lack of relevance, we leave out the respective experiments here.

  20.

    We evaluate only the classification of scores for a focused discussion. In general, a more or less metric scale like sentiment scores suggests using regression (cf. Sect. 2.1), as we have partly done in Wachsmuth et al. (2014a). Moreover, since our evaluation does not aim at achieving maximum effectiveness in the first place, for simplicity we do not explicitly incorporate knowledge about the neighborhood of classes here, e.g. that score 1 is closer to score 2 than to score 3. A corresponding approach has been proposed by Pang and Lee (2005).
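As a minimal illustration of the regression alternative mentioned in this note (not the chapter's setup), a continuous prediction can be mapped to the nearest class on the ordinal score scale, which is one simple way to exploit the neighborhood of classes:

```python
def to_score_class(prediction, classes=(1, 2, 3)):
    """Map a continuous regression output to the nearest ordinal score class."""
    return min(classes, key=lambda c: abs(c - prediction))

# A prediction of 1.8 is closer to score 2 than to scores 1 or 3.
rounded = to_score_class(1.8)
```

A classifier that ignores the ordering treats a 1-vs-3 confusion the same as a 1-vs-2 confusion; the mapping above is the simplest way regression avoids that.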

  21.

    We suppose that the reason mainly lies in the limited accuracy of 74% of our polarity classifier csp in the film domain (cf. Appendix A.2), which reduces the impact of all features that rely on local sentiment.

  22.

    Besides explanations, a common approach to improve intelligibility in tasks like information extraction, which we do not detail here, is to support verifiability, e.g. by linking back to the source documents from which the returned results have been inferred.

  23.

    For clarity, we omit the text span in the case of discourse relations in Fig. 5.18. Relation types can easily be identified, as only they point to the information they depend on.

  24.

    Although \(\varPi_{sco}\) employs pdu, Fig. 5.18 contains no discourse unit annotations. This is because each discourse unit is classified as being a fact or an opinion by csb afterwards.

  25.

    In Apache UIMA, the algorithms’ interdependencies can be inferred from the descriptor files of the employed primitive analysis engines. For properties like quality estimations, we use a fixed notation in the description field of a descriptor file, just as we do for the expert system from Sect. 3.3.

  26.

    Amazon Mechanical Turk, http://www.mturk.com, accessed on June 15, 2015.

  27.

    Unfortunately, flow patterns were omitted, since they are not visualized in our application.

Author information

Correspondence to Henning Wachsmuth.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wachsmuth, H. (2015). Pipeline Robustness. In: Text Analysis Pipelines. Lecture Notes in Computer Science, vol 9383. Springer, Cham. https://doi.org/10.1007/978-3-319-25741-9_5

  • DOI: https://doi.org/10.1007/978-3-319-25741-9_5
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-319-25740-2
  • Online ISBN: 978-3-319-25741-9
