Abstract
While the major search engines are solving the problem of finding needed information on the Web, access to information in Big text, i.e. large-scale text datasets and documents (biomedical literature, e-books, conference proceedings, etc.), is still very rudimentary (Lin and Cohen (2010) A very fast method for clustering big text datasets. In: ECAI, Lisbon). Keyword search is therefore often the only way to find the needle in the haystack. The Semantic Web research community offers an abundance of relevant research results on access interfaces that are more robust than keyword search. Here we describe a new information retrieval engine that offers an advanced user experience by combining keyword search with navigation over an automatically inferred hierarchical document index. Representing the browsing index internally as a collection of unified famous objects (UFOs) (Gubanov et al. (2009) Ibm ufo repository. In: VLDB, Lyon; Gubanov et al. (2011) Learning unified famous objects (ufo) to bootstrap information integration. In: IEEE IRI, Las Vegas) yields more relevant search results and improves the user experience.
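As a rough illustration of combining keyword search with navigation over a hierarchical document index, the following minimal Python sketch narrows keyword hits to the subtree a user has browsed to. It is not the engine described in the chapter: the `IndexNode` class, the function names, the toy corpus, and the hand-built hierarchy are all assumptions made for this example, whereas the chapter infers its index automatically and represents it as UFOs.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """One node of a hypothetical browsing hierarchy (not the chapter's UFO model)."""
    label: str
    doc_ids: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def subtree_docs(self):
        """Collect the ids of all documents filed at or below this node."""
        docs = set(self.doc_ids)
        for child in self.children:
            docs |= child.subtree_docs()
        return docs


def keyword_search(docs, query):
    """Return ids of documents that contain every query term (case-insensitive)."""
    terms = query.lower().split()
    hits = set()
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if all(term in words for term in terms):
            hits.add(doc_id)
    return hits


def search_within(node, docs, query):
    """Keyword search restricted to the part of the hierarchy the user navigated to."""
    return keyword_search(docs, query) & node.subtree_docs()


# Toy corpus and a hand-built hierarchy, both invented for illustration.
DOCS = {
    1: "amyloid plaques in alzheimer disease",
    2: "alzheimer disease diagnosis from imaging",
    3: "imaging methods for bone fractures",
}
neurology = IndexNode("Neurology", children=[IndexNode("Alzheimer's disease", {1, 2})])
orthopedics = IndexNode("Orthopedics", {3})
root = IndexNode("Biomedical literature", children=[neurology, orthopedics])
```

In this toy setup, a plain keyword search for `imaging` matches documents in both branches, while `search_within(neurology, DOCS, "imaging")` keeps only the hit filed under the node the user selected.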
References
Adelberg B (1998) NoDoSE – a tool for semi-automatically extracting structured and semistructured data from text documents. In: SIGMOD record, Seattle
Agichtein E, Gravano L (2000) Snowball: extracting relations from large plain-text collections. In: ACM DL, San Antonio
Agichtein E, Ipeirotis P, Gravano L (2003) Modeling query-based access to text databases. In: WebDB, San Diego
Agichtein E, Brill E, Dumais S (2006) Improving web search ranking by incorporating user behavior information. In: SIGIR, Seattle
Agrawal S, Chaudhuri S, Das G (2002) Dbxplorer: a system for keyword-based search over relational databases. In: ICDE, San Jose
Anyanwu K, Maduko A, Sheth A (2007) Sparq2l: towards support for subgraph extraction queries in rdf databases. In: WWW, Banff
Arocena GO, Mendelzon AO (1998) Weboql: restructuring documents, databases, and webs. In: ICDE, Orlando
Banko M, Brill E, Dumais S, Lin J (2002) Askmsr: question answering using the worldwide web. In: EMNLP, Philadelphia
Brin S (1998) Extracting patterns and relations from the world wide web. In: EDBT, Valencia
Cai Y, Dong XL, Halevy A, Liu JM, Madhavan J (2005) Personal information management with semex. In: SIGMOD, Baltimore
Califf ME, Mooney RJ (1998) Relational learning of pattern-match rules for information extraction. In: AAAI, Madison
Chakrabarti S (2007) Dynamic personalized pagerank in entity-relation graphs. In: WWW, Banff
Cheng T, Yan X, Chang KCC (2007) Entityrank: searching entities directly and holistically. In: VLDB, Vienna
Crescenzi V, Mecca G (1998) Grammars have exceptions. J Inf Syst (Special issue on Semistructured Data) 23(9):539–565
Crescenzi V, Mecca G, Merialdo P (2001) Roadrunner: towards automatic data extraction from large web sites. In: VLDB, Roma
Crestani F (1997) Application of spreading activation techniques in information retrieval. Artif Intell Rev 11:453
Diederich J, Balke WT, Thaden U (2007) Demonstrating the semantic growbag: automatically creating topic facets for faceteddblp. In: JCDL, Vancouver
Dong X, Halevy A (2007) Indexing dataspaces. In: SIGMOD, Beijing
Downey D, Etzioni O, Soderland S, Weld D (2004) Learning text patterns for web information extraction and assessment. In: AAAI, San Jose
Embley DW, Campbell DM, Jiang YS, Liddle SW, Ng YK, Quass D, Smith RD (1999) Conceptual-model-based data extraction from multiple-record web pages. Data Knowl Eng 31:227–251
Etzioni O, Cafarella M, Downey D, Kok S, Popescu A, Shaked T, Soderland S, Weld D, Yates A (2004) Web-scale information extraction in knowitall. In: WWW, Manhattan
Freitag D (1998) Machine learning for information extraction in informal domains. Ph.D. thesis, Carnegie Mellon University
Gubanov M, Shapiro L (2011) Using unified famous objects (ufo) to automate Alzheimer’s disease diagnosis. In: IEEE BIBM, Atlanta
Gubanov MN, Popa L, Ho H, Pirahesh H, Chang P, Chen L (2009) Ibm ufo repository. In: VLDB, Lyon
Gubanov M, Shapiro L, Pyayt A (2011) Learning unified famous objects (ufo) to bootstrap information integration. In: IEEE IRI, Las Vegas
Hammer J, McHugh J, Garcia-Molina H (1997) Semistructured data: the TSIMMIS experience. In: Proceedings of the East-European workshop on advances in databases and information systems, St. Petersburg
He H, Wang H, Yang J, Yu PS (2007) Blinks: ranked keyword searches on graphs. In: SIGMOD, Beijing
Hearst MA (1992) Automatic acquisition of hyponyms from large text corpora. Technical report S2K-92-09
Hristidis V, Papakonstantinou Y (2002) Discover: keyword search in relational databases. In: VLDB, Hong Kong
Hsu CN, Dung MT (1998) Generating finite-state transducers for semi-structured data extraction from the web. J Inf Syst (Special issue on Semistructured Data) 23(9):521–538
Järvelin K, Kekäläinen J (2000) IR evaluation methods for retrieving highly relevant documents. In: SIGIR, Athens
Jones KS (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 60:493–502
Klein D, Manning C (2007) Fast exact inference with a factored model for natural language parsing. In: NIPS, Vancouver
Kushmerick N (2000) Wrapper induction: efficiency and expressiveness. Artif Intell 118:15–68
Laender A, Ribeiro-Neto B, Silva A, Teixeira J (2002) A brief survey of web data extraction tools. In: SIGMOD record, Madison
Laender AHF, Ribeiro-Neto B, da Silva AS (2002) Debye – data extraction by example. Data Knowl Eng 40(2):121–154
Lin F, Cohen WW (2010) A very fast method for clustering big text datasets. In: ECAI, Lisbon
Liu L, Pu C, Han W (2000) XWRAP: an XML-enabled wrapper construction system for web information sources. In: ICDE, San Diego
Madhavan J, Cohen S, Dong X, Halevy A, Jeffery S, Ko D, Yu C (2007) Navigating the seas of structured web data. In: CIDR, Asilomar
Nie Z, Ma Y, Shi S, Wen JR, Ma WY (2007) Web object retrieval. In: WWW, Banff
Ribeiro-Neto BA, Laender AHF, da Silva AS (1999) Extracting semi-structured data through examples. In: CIKM, Kansas City
Sahuguet A, Azavant F (2001) Building intelligent web applications using lightweight wrappers. Data Knowl Eng 36:283–316
Salton G, Wong A, Yang C (1975) A vector space model for automatic indexing. Commun ACM 18:613–620
Sayyadian M, LeKhac H, Doan A, Gravano L (2007) Efficient keyword search across heterogeneous relational databases. In: ICDE, Istanbul
Sekine S (2006) On-demand information extraction. In: COLING/ACL, Sydney
Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34:233
Udrea O, Getoor L, Miller RJ (2007) Leveraging data and structure in ontology integration. In: SIGMOD, Beijing
Vanderwende L, Kacmarcik G, Suzuki H, Menezes A (2005) Mindnet: an automatically-created lexical resource. In: HLT/EMNLP, Vancouver
Copyright information
© 2013 Springer-Verlag Wien
About this chapter
Cite this chapter
Gubanov, M., Shapiro, L., Pyayt, A. (2013). ReadFast: Structural Information Retrieval from Biomedical Big Text by Natural Language Processing. In: Özyer, T., Kianmehr, K., Tan, M., Zeng, J. (eds) Information Reuse and Integration in Academia and Industry. Springer, Vienna. https://doi.org/10.1007/978-3-7091-1538-1_9
DOI: https://doi.org/10.1007/978-3-7091-1538-1_9
Published:
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-1537-4
Online ISBN: 978-3-7091-1538-1
eBook Packages: Computer Science, Computer Science (R0)