Abstract
We present a methodology for automatically constructing structural hyperlinks in electronic technical corpora. A structural hyperlink connects components of a document that have specified structural properties and similar word-based content. Our approach supports queries posed in terms of keywords as well as structural segments such as definitions and figures.
Cite this article
Rus, D., Allan, J. Structural Queries in Electronic Corpora. Multimedia Tools and Applications 6, 153–169 (1998). https://doi.org/10.1023/A:1009656615964