
CLASCN: Candidate Network Selection for Efficient Top-k Keyword Queries over Databases

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Keyword Search Over Relational Databases (KSORD) enables casual or Web users to access databases easily through free-form keyword queries. Improving the performance of KSORD systems is a critical issue in this area. In this paper, a new approach, CLASCN (Classification, Learning And Selection of Candidate Network), is developed to efficiently perform top-k keyword queries in schema-graph-based online KSORD systems. In this approach, the Candidate Networks (CNs) from trained keyword queries or executed user queries are classified and stored in the databases, and top-k results from the CNs are learned for constructing CN Language Models (CNLMs). The CNLMs are used to compute the similarity scores between a new user query and the CNs from the query. The CNs with relatively large similarity scores, which are the most promising ones to produce top-k results, are then selected and executed. Currently, CLASCN is applicable only to past queries and New All-keyword-Used (NAU) queries, which are frequently submitted queries. Extensive experiments show the efficiency and effectiveness of our CLASCN approach.
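The selection step described in the abstract can be illustrated with a small Python sketch. The abstract does not give the CNLM formula, so everything here is an assumption made for illustration: each CNLM is modeled as a unigram term-count model built from the text of a CN's learned top-k results, the similarity score is the query log-likelihood under additive smoothing, and the names `build_cnlm`, `similarity`, `select_cns`, `alpha`, and `top_n` are hypothetical. It is not the paper's actual scoring function.

```python
import math
from collections import Counter


def build_cnlm(result_texts):
    """Build a unigram term-count model from the top-k result texts of one CN.

    Assumption: a CNLM is approximated by plain term counts.
    """
    counts = Counter()
    for text in result_texts:
        counts.update(text.lower().split())
    return counts


def similarity(query_keywords, cnlm, vocab_size, alpha=1.0):
    """Query log-likelihood under an additively smoothed unigram model.

    Assumption: this stands in for the similarity score between a query
    and a CN; the paper's real formula is not stated in the abstract.
    """
    total = sum(cnlm.values())
    score = 0.0
    for keyword in query_keywords:
        prob = (cnlm[keyword.lower()] + alpha) / (total + alpha * vocab_size)
        score += math.log(prob)
    return score


def select_cns(query_keywords, cnlms, top_n=2):
    """Return the ids of the top_n CNs with the largest similarity scores."""
    vocab = set()
    for model in cnlms.values():
        vocab.update(model)
    vocab_size = max(len(vocab), 1)
    scored = [
        (similarity(query_keywords, model, vocab_size), cn_id)
        for cn_id, model in cnlms.items()
    ]
    # Highest score first; ties are broken by CN id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cn_id for _, cn_id in scored[:top_n]]


# Hypothetical CNs with made-up result texts, for illustration only.
cnlms = {
    "CN1": build_cnlm(["keyword search relational databases",
                       "keyword query processing"]),
    "CN2": build_cnlm(["histogram selectivity estimation"]),
    "CN3": build_cnlm(["top-k keyword query databases"]),
}
selected = select_cns(["keyword", "databases"], cnlms, top_n=2)
print(selected)
```

Only the selected CNs would then be executed against the database, which is where the efficiency gain claimed in the abstract would come from; how many CNs to keep (`top_n` here) is a tuning choice this sketch leaves open.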

Author information

Corresponding author

Correspondence to Jun Zhang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grant Nos. 60473069 and 60496325.

About this article

Cite this article

Zhang, J., Peng, ZH., Wang, S. et al. CLASCN: Candidate Network Selection for Efficient Top-k Keyword Queries over Databases. J Comput Sci Technol 22, 197–207 (2007). https://doi.org/10.1007/s11390-007-9026-6

  • DOI: https://doi.org/10.1007/s11390-007-9026-6
