The ability to perform text mining ad hoc and at large scale has the potential to substantially improve the speed and quality with which people find information today, both in everyday web search and in big data analytics. More complex information needs can be fulfilled immediately, and previously hidden information becomes accessible. At the heart of every text mining application, relevant information is inferred from natural language texts by a text analysis process. Such a process is mostly realized as a pipeline that sequentially executes a number of information extraction, text classification, and other natural language processing algorithms. Text mining is studied in the field of computational linguistics, which we consider from a computer science perspective in this book.
Besides the fundamental challenge of inferring relevant information effectively, we have identified the automatic design of a text analysis pipeline and the optimization of a pipeline's run-time efficiency and domain robustness as major requirements for enabling ad-hoc large-scale text mining. We have then investigated the research question of how to exploit knowledge about a text analysis process, as well as information obtained within the process, to meet these requirements. To this end, we have developed different models and algorithms that can be employed to address information needs ad hoc on large numbers of texts. The algorithms rely on classical and statistical techniques from artificial intelligence, namely planning, truth maintenance, and informed search, as well as supervised and self-supervised learning. All algorithms have been analyzed formally, implemented as software, and evaluated experimentally.
In Sect. 6.1, we summarize our main findings and their contributions to different areas of computational linguistics, and we outline their scientific and practical impact on the state of the art in text mining. However, far from every problem of ad-hoc large-scale text mining has been solved, or even approached, in this book. In the words of Alan Turing, we can therefore already see plenty there that needs to be done, both in the directions taken here and in new directions of future research (Sect. 6.2). Finally, we sketch how some of our main ideas may benefit other problems in computer science or even other fields of application.