Structure-Based Queries over the World Wide Web

  • Tao Guan
  • Miao Liu
  • Lawrence V. Saxton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1507)


As the World Wide Web grows in importance as an information repository, locating documents of interest becomes increasingly significant. The current practice is to send keywords to search engines. However, these search engines cannot take the structure of the Web into consideration. We therefore present a novel query language, NetQL, and its implementation, for accessing the World Wide Web. Rather than performing global full-text search, NetQL is designed for local structure-based queries. It not only exploits the topology of web pages given by hyperlinks, but also supports queries involving information inside pages. A novel approach to extracting information from web pages is presented. In addition, methods for controlling the complexity of query processing are addressed in this paper.







Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Tao Guan (1)
  • Miao Liu (1)
  • Lawrence V. Saxton (1)
  1. Department of Computer Science, University of Regina, Regina, Canada
