Natural Language Processing for Search

Ceri, Stefano; Bozzon, Alessandro; Brambilla, Marco; Della Valle, Emanuele; Fraternali, Piero; Quarteroni, Silvia

doi:10.1007/978-3-642-39314-3_5

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Unstructured data, i.e., data that have not been created for computer usage, make up about 80% of all digital documents. Most of the time, unstructured data are textual documents written in natural language; clearly, this kind of data is a powerful information source that needs to be handled well. Compared with traditional information retrieval methods, access to unstructured data may be greatly improved by using deep language understanding methods. In this chapter, we provide a brief overview of the relationship between natural language processing and search applications. We describe some machine learning methods that are used for formalizing natural language problems in probabilistic terms. We then discuss the main challenges behind automatic text processing, focusing on question answering as a representative example of the application of various deep text processing techniques.

Notes

1. Available at: alias-i.com/lingpipe.
2. Available at: www.cis.upenn.edu/~ace.

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milan, Italy
Stefano Ceri, Alessandro Bozzon, Marco Brambilla, Emanuele Della Valle, Piero Fraternali & Silvia Quarteroni

About this chapter

Cite this chapter

Ceri, S., Bozzon, A., Brambilla, M., Della Valle, E., Fraternali, P., Quarteroni, S. (2013). Natural Language Processing for Search. In: Web Information Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39314-3_5

DOI: https://doi.org/10.1007/978-3-642-39314-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39313-6
Online ISBN: 978-3-642-39314-3
eBook Packages: Computer Science, Computer Science (R0)
