Breaking Through the Syntax Barrier: Searching with Entities and Relations

Chakrabarti, Soumen

doi:10.1007/978-3-540-30115-8_3

Soumen Chakrabarti

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3201)

Abstract

The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search systems will either let users express information needs naturally and analyze them more intelligently, or allow simple enhancements that give users more control over the search process. The data model will exploit graph structure where available, but will not impose structure by fiat. First-generation Web search, which uses graph information at the macroscopic level of inter-page hyperlinks, will be enhanced to use fine-grained graph models involving page regions, tables, sentences, phrases, and real-world entities. New algorithms will combine probabilistic evidence from diverse features to produce responses that are not URLs or pages, but entities and their relationships, or explanations of how multiple entities are related.

Author information

Authors and Affiliations

IIT Bombay
Soumen Chakrabarti

Editor information

Editors and Affiliations

INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut
Dipartimento di Informatica, Università degli Studi di Bari
Floriana Esposito
Pisa KDD Laboratory, ISTI - CNR, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Fosca Giannotti
Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
Dino Pedreschi

About this paper

Cite this paper

Chakrabarti, S. (2004). Breaking Through the Syntax Barrier: Searching with Entities and Relations. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science, vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_3

DOI: https://doi.org/10.1007/978-3-540-30115-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8
eBook Packages: Springer Book Archive
