Journal of Intelligent Information Systems, Volume 30, Issue 1, pp 1–32

Answering form-based web queries using the data-mining approach

  • Xiaochun Yang
  • Yiu-Kai Ng
Article

Abstract

Web users often retrieve data by posting queries through form-based interfaces; however, answers to these queries are mostly computed from keywords entered into the individual fields of a query interface, and their precision and recall can be low. Precision and recall for this type of query can be improved by considering closely related previous queries submitted through the same interface, along with their answers. In this paper, we present a data-mining approach that enhances the retrieval of relevant answers to a form-based Web query by using previous, relevant queries and their answers. Experimental results on a randomly selected set of 3,800 documents retrieved from various Web sites show that our data-mining, query-rewriting approach achieves average precision and true positive ratios on rewritten queries in the upper 80% range, whereas the average false positive ratio is less than 2.0%.
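
For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how precision, true positive ratio, and false positive ratio are commonly computed in information retrieval. The abstract does not give the paper's exact definitions, so the standard set-based formulas are assumed here; the function name `retrieval_metrics` and the toy document IDs are hypothetical.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Standard set-based retrieval measures (assumed definitions,
    not necessarily the ones used in the paper)."""
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)

    tp = len(retrieved & relevant)               # relevant and retrieved
    fp = len(retrieved - relevant)               # retrieved but not relevant
    fn = len(relevant - retrieved)               # relevant but missed
    tn = len(collection - retrieved - relevant)  # correctly left out

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    true_positive_ratio = tp / (tp + fn) if tp + fn else 0.0   # same as recall
    false_positive_ratio = fp / (fp + tn) if fp + tn else 0.0

    return {
        "precision": precision,
        "true_positive_ratio": true_positive_ratio,
        "false_positive_ratio": false_positive_ratio,
    }


# Toy example: 10 documents, 4 of them relevant, 4 retrieved (3 correctly).
metrics = retrieval_metrics(
    retrieved={1, 2, 3, 5},
    relevant={1, 2, 3, 4},
    collection=range(1, 11),
)
```

In this toy case the precision and the true positive ratio are both 3/4 and the false positive ratio is 1/6. Under these definitions, a true positive ratio in the upper 80% range together with a false positive ratio under 2.0% would mean that most relevant answers are returned while few irrelevant ones are.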

Keywords

Inferred rules, Query interface, Query-rewriting approach

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. School of Information, Science and Engineering, Northeastern University, Shenyang, China
  2. Computer Science Department, Brigham Young University, Provo, USA
