Structural Queries in Electronic Corpora

Multimedia Tools and Applications

Abstract

We present a methodology for automatically constructing structural hyperlinks in electronic technical corpora. A structural hyperlink connects document components that have specified structural properties and are similar in their word-based content. Our approach supports queries posed in terms of keywords as well as structural segments such as definitions and figures.
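The abstract does not say how segments are identified or how content similarity is measured. The following is a minimal sketch of the two ideas it names, structural hyperlinks and mixed keyword/structure queries, under stated assumptions: a hypothetical `Segment` record carrying a structural `kind` label, plain term-frequency cosine similarity (a standard vector-space measure swapped in here, not necessarily the authors' choice), and a hypothetical link threshold of 0.2. It is not the authors' implementation.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical structural unit of a document (e.g. a definition or a figure caption)."""
    seg_id: str
    kind: str  # structural label, e.g. "definition", "figure", "paragraph"
    text: str


def term_vector(text: str) -> Counter:
    # Lowercased word counts; no stemming or stop-word removal (an assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def structural_links(segments, source_kind, target_kind, threshold=0.2):
    """Link each source_kind segment to each target_kind segment whose
    word-based similarity reaches the (hypothetical) threshold."""
    vecs = {s.seg_id: term_vector(s.text) for s in segments}
    links = []
    for s in segments:
        if s.kind != source_kind:
            continue
        for t in segments:
            if t.kind != target_kind or t.seg_id == s.seg_id:
                continue
            sim = cosine(vecs[s.seg_id], vecs[t.seg_id])
            if sim >= threshold:
                links.append((s.seg_id, t.seg_id, round(sim, 3)))
    return links


def structural_query(segments, keywords, kind=None):
    """Rank segments by similarity to the keywords, optionally restricted
    to one structural kind; segments with zero similarity are dropped."""
    q = term_vector(" ".join(keywords))
    scored = [(cosine(q, term_vector(s.text)), s.seg_id)
              for s in segments if kind is None or s.kind == kind]
    return [seg_id for score, seg_id in sorted(scored, reverse=True) if score > 0]


# Usage on a toy corpus (illustrative data, not from the paper).
docs = [
    Segment("d1", "definition", "A hyperlink connects two document components."),
    Segment("p1", "paragraph", "Each hyperlink in the corpus connects components with similar words."),
    Segment("f1", "figure", "Figure: accuracy of the segmentation step."),
]
print(structural_links(docs, "definition", "paragraph"))      # [('d1', 'p1', 0.387)]
print(structural_query(docs, ["hyperlink"], kind="definition"))  # ['d1']
```

Restricting candidate pairs by structural kind before computing similarity is what distinguishes a structural link from a purely content-based one; the query function applies the same idea, using the structural label as a filter and keyword similarity for ranking.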

About this article

Cite this article

Rus, D., Allan, J. Structural Queries in Electronic Corpora. Multimedia Tools and Applications 6, 153–169 (1998). https://doi.org/10.1023/A:1009656615964

  • DOI: https://doi.org/10.1023/A:1009656615964
