Abstract
The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search systems will either let users express information needs naturally and analyze them more intelligently, or allow simple enhancements that give users more control over the search process. The data model will exploit graph structure where available, but not impose structure by fiat. First-generation Web search, which uses graph information at the macroscopic level of inter-page hyperlinks, will be enhanced to use fine-grained graph models involving page regions, tables, sentences, phrases, and real-world entities. New algorithms will combine probabilistic evidence from diverse features to produce responses that are not URLs or pages, but entities and their relationships, or explanations of how multiple entities are related.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakrabarti, S. (2004). Breaking Through the Syntax Barrier: Searching with Entities and Relations. In: Boulicaut, JF., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Machine Learning: ECML 2004. ECML 2004. Lecture Notes in Computer Science, vol 3201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30115-8_3
DOI: https://doi.org/10.1007/978-3-540-30115-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23105-9
Online ISBN: 978-3-540-30115-8
eBook Packages: Springer Book Archive