
Pipeline Design

  • Henning Wachsmuth
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9383)

Abstract

The realization of a text analysis as a sequential execution of the algorithms in a pipeline does not mimic the way humans approach text analysis tasks. Humans simultaneously investigate lexical, syntactic, semantic, and pragmatic clues in and about a text (McCallum 2009) while skimming over the text to quickly focus on the portions of text relevant for a task (Duggan and Payne 2009). From a machine viewpoint, however, the decomposition of a text analysis into single executable steps is a prerequisite for identifying relevant information types and their interdependencies. To date, this decomposition and the subsequent construction of a text analysis pipeline are mostly done manually, which prevents the use of pipelines for tasks in ad-hoc text mining. Moreover, such pipelines do not focus on the task-relevant portions of input texts, making their execution slower than necessary (cf. Sect. 2.2). In this chapter, we show that both parts of pipeline design (i.e., construction and task-specific execution) can be fully automated, once given adequate formalizations of text analysis.
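
A minimal sketch of automatic pipeline construction, assuming each algorithm is described only by the information types it requires and produces. It selects algorithms backward from the goal types and then orders them by their interdependencies with Python's `graphlib.TopologicalSorter`. This is a deliberately simplified stand-in, not the partial order planning approach of Sect. 3.3: it assumes exactly one producer per type and no cyclic dependencies, and all names (`Algorithm`, `construct_pipeline`, the algorithms and the types `"Time"`, `"Money"`, etc.) are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Algorithm:
    """Hypothetical description of an algorithm by its information types."""
    name: str
    inputs: frozenset[str]   # information types the algorithm requires
    outputs: frozenset[str]  # information types the algorithm produces


def construct_pipeline(goal, algorithms):
    """Select the algorithms needed for the goal types and order them.

    Assumptions: every type has exactly one producer (if several exist,
    the last one listed wins) and the dependencies contain no cycles.
    """
    producer = {t: a for a in algorithms for t in a.outputs}
    selected = {}
    open_types = list(goal)
    while open_types:
        t = open_types.pop()
        if t not in producer:
            raise ValueError(f"no algorithm produces {t!r}")
        alg = producer[t]
        if alg.name not in selected:
            selected[alg.name] = alg
            open_types.extend(alg.inputs)  # required inputs become subgoals
    # Each selected algorithm must run after the producers of its inputs.
    graph = {name: {producer[t].name for t in alg.inputs}
             for name, alg in selected.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical algorithm repository, for illustration only.
algorithms = [
    Algorithm("sentence_splitter", frozenset(), frozenset({"Sentence"})),
    Algorithm("tokenizer", frozenset({"Sentence"}), frozenset({"Token"})),
    Algorithm("pos_tagger", frozenset({"Token"}), frozenset({"POS"})),
    Algorithm("time_recognizer", frozenset({"Token"}), frozenset({"Time"})),
    Algorithm("money_recognizer", frozenset({"Token", "POS"}), frozenset({"Money"})),
]

print(construct_pipeline({"Time", "Money"}, algorithms))
```

Because selection starts from the goal, asking only for `{"Time"}` leaves out `pos_tagger` and `money_recognizer`. Any topological order of the selected algorithms is a valid pipeline, so the exact printed order may vary.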

In Sect. 3.1, we discuss the optimality of text analysis pipelines and introduce paradigms of an ideal pipeline construction and execution. For automatic construction, we formally model the expert knowledge underlying text analysis (Sect. 3.2). On this basis, we operationalize the cognitive skills of constructing pipelines through partial order planning (Sect. 3.3). In our evaluation, the construction always takes near-zero time, thus enabling ad-hoc text mining. In Sect. 3.4, we then reinterpret text mining as the task to filter the portions of a text that contain relevant information, i.e., to consistently imitate skimming. We realize this information-oriented view by equipping a pipeline with an input control. Based on the dependencies between relevant information types, the input control determines for each employed algorithm in advance what portions of text its output is relevant for (Sect. 3.5). Such an automatic truth maintenance of the relevant portions of text results in an optimal pipeline execution, since all unnecessary analyses of input texts are avoided. This not only improves pipeline efficiency significantly in all our experiments, but it also creates the potential for the pipeline scheduling that we target in Chap. 4. In addition, it implies different ways of trading efficiency for effectiveness, which we examine beforehand (Sect. 3.6).
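
A minimal sketch of the filtering idea behind the input control, assuming a conjunctive query at sentence level (here: sentences that mention both a year and a dollar amount). After each step, sentences without the required information are dropped, so later algorithms analyze fewer portions of text. This is a simplified stand-in, not the truth maintenance mechanism of Sect. 3.5; the regex-based extractors `find_years` and `find_money` and the function `run_with_input_control` are hypothetical.

```python
import re


def find_years(sentence):
    """Hypothetical time recognizer: four-digit years only."""
    return re.findall(r"\b(?:19|20)\d{2}\b", sentence)


def find_money(sentence):
    """Hypothetical money recognizer: dollar amounts, optional scale word."""
    return re.findall(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?", sentence)


def run_with_input_control(sentences, steps):
    """Run extractors in order, passing on only still-relevant sentences.

    Assumption: a sentence is relevant only if every step finds at least
    one annotation in it (a conjunctive query at sentence level).
    Returns the relevant sentences with their annotations and the number
    of sentences each step actually analyzed.
    """
    relevant = list(range(len(sentences)))  # indices of relevant sentences
    found = {i: [] for i in relevant}
    analyzed = {}
    for name, extract in steps:
        analyzed[name] = len(relevant)
        kept = []
        for i in relevant:
            annotations = extract(sentences[i])
            if annotations:
                found[i].extend(annotations)
                kept.append(i)
        relevant = kept  # later steps never see the filtered-out sentences
    return [(sentences[i], found[i]) for i in relevant], analyzed


sentences = [
    "Revenues rose to $2.5 billion in 2014.",
    "The company was founded in 1998.",
    "Analysts expect further growth.",
    "Costs reached $300 million.",
]
result, analyzed = run_with_input_control(
    sentences, [("time", find_years), ("money", find_money)])
print(result)    # [('Revenues rose to $2.5 billion in 2014.', ['2014', '$2.5 billion'])]
print(analyzed)  # {'time': 4, 'money': 2}
```

The money recognizer analyzes only 2 of the 4 sentences, yet the result equals what running both extractors on every sentence and keeping the sentences where both succeed would give. Which algorithm to run first is exactly the kind of scheduling question the filtering makes relevant.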


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Bauhaus-Universität Weimar, Weimar, Germany
