References
Alam, H., Hartono, R., and Rahman, A.F.R. (2004). Extraction and management of content from HTML documents. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific, pp. 95-112.
Antonacopoulos, A., Karatzas, D., and Ortiz Lopez, J. (2001). Accessing textual information embedded in internet images. Proceedings of SPIE Internet Imaging II, San Jose, USA, pp. 198-205.
Antonacopoulos, A. and Karatzas, D. (2002). Fuzzy segmentation of characters in Web images based on human colour perception. In: D. Lopresti, J. Hu, and R. Kashi (Eds.). Document Analysis Systems V. London: Springer, LNCS 2423, pp. 295-306.
Antonacopoulos, A. and Delporte, F. (1999). Automated interpretation of visual representations: extracting textual information from WWW images. In: R. Paton and I. Neilson (Eds.). Visual Representations and Interpretations. London: Springer.
Baird, H.S. and Popat, K. (2004). Web security and document image analysis. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Blood, R. Weblogs: a history and perspective. http://www.rebeccablood.net/essays/weblog_history.html.
Breuel, T.M., Janssen, W.C., Popat, K., and Baird, H.S. (2004). Reflowable document images. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Brown, M.K., Glinski, S.C., and Schmult, B.C. (2001). Web page analysis for voice browsing. Proceedings of the First International Workshop on Web Document Analysis (WDA2001), Seattle, USA.
Chen, L.Q., Xie, X., Ma, W.Y., and Zhang, H.J. (2003). DRESS: a slicing tree based web page representation for various display sizes. WWW2003 (poster), Budapest, Hungary.
Chen, Y., Ma, W., and Zhang, H.J. (2003). Detecting web page structure for adaptive viewing on small form-factor devices. WWW2003, Budapest, Hungary.
Cohen, W.W., Hurst, M., and Jensen, L.S. (2004). A wrapper induction system for complex documents and its application to tabular data on the web. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific, pp. 155-178.
Di Iorio, A. and Vitali, F. (2003). A xanalogical collaborative editing environment. In: A. Antonacopoulos and J. Hu (Eds.). Second International Workshop on Web Document Analysis (WDA2003).
Gupta, S., Kaiser, G., Neistadt, D., and Grimm, P. (2003). DOM-based content extraction of HTML documents. WWW2003, Budapest, Hungary.
Hsu, C. and Dung, M. (1998). Generating finite-state transducers for semi-structured data extraction from the web. Journal of Information Systems, 23, pp. 521-538.
Hu, J. and Bagga, A. (2004). Functional categorization of images in web documents. IEEE Multimedia Special Issue on Content Repurposing.
International Workshop on Web Document Analysis. http://www.csc.liv.ac.uk/~wda2001 and http://www.csc.liv.ac.uk/~wda2003.
Jain, A.K. and Yu, B. (1998). Automatic text location in images and video frames. Pattern Recognition, 31(12), pp. 2055-2076.
Ashish, N. and Knoblock, C. (1997). Wrapper generation for semi-structured internet sources. Proceedings of PODS/SIGMOD'97.
Yee, K.P. CritLink: Public Web Annotation. http://zesty.ca/crit.
Kanungo, T., Lee, C.H., and Bradford, R. (2001). What fraction of images on the web contain text? Proceedings of the First International Workshop on Web Document Analysis (WDA2001), Seattle, USA, pp. 43-46.
Karatzas, D. and Antonacopoulos, A. (2004). Text extraction from web images based on a split-and-merge segmentation method using colour perception. Proceedings of the Seventeenth International Conference on Pattern Recognition (ICPR2004), Cambridge, UK. Silver Spring, MD: IEEE CS Press, pp. 634-637.
Kasik, D.J. (2004). Strategies for consistent image partitioning. IEEE Multimedia Special Issue on Content Repurposing.
Kushmerick, N., Weld, D., and Doorenbos, R. (1997). Wrapper induction for information extraction. Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, pp. 729-735.
Lai, W.C., Chang, E.Y., and Cheng, K.T. (2004). An anatomy of a large-scale image search engine. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Leuf, B. and Cunningham, W. (2001). The Wiki way. New York: Addison-Wesley.
Lopresti, D. and Wilfong, G. (2004). Applications of graph probing to web document analysis. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Lopresti, D. and Zhou, J. (2000). Locating and recognizing text in WWW images. Information Retrieval, 2(2/3), pp. 177-206.
Mukherjee, S., Yang, G., Tan, W., and Ramakrishnan, I.V. (2003). Automatic discovery of semantic structures in HTML documents. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR2003), Edinburgh, Scotland.
Muslea, I. (1999). Extracting patterns for information extraction tasks: a survey. AAAI-99 Workshop on Machine Learning for Information Extraction.
Nanno, T., Saito, S., and Okumura, M. (2003). Structuring web pages based on repetition of elements. In: A. Antonacopoulos and J. Hu (Eds.). Second International Workshop on Web Document Analysis (WDA2003).
Narayan, M., Williams, C., Perugini, S., and Ramakrishnan, N. (2004). Staging transformations for multimodal web interaction management. WWW2004. New York, USA, pp. 212-223.
Penn, G., Hu, J., Luo, H., and McDonald, R. (2001). Flexible web document analysis for delivery to narrow-bandwidth devices. Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR01), Seattle, WA, USA, pp. 1074-1078.
Perantonis, S.J., Gatos, B., and Maragos, V. (2003). A novel Web image processing algorithm for text area identification that helps commercial OCR engines to improve their Web image recognition efficiency. Proceedings of the Second International Workshop on Web Document Analysis (WDA2003), Edinburgh, Scotland, pp. 61-64.
Ramachandran, S. and Kashi, R. (2003). An architecture for ink annotations on web documents. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR2003), Edinburgh, Scotland.
Ramakrishnan, I.V., Stent, A., and Yang, G. (2004). Hearsay: enabling audio browsing on hypertext content. WWW2004, New York, USA, pp. 80-89.
Schenker, A., Last, M., Bunke, H., and Kandel, A. (2004). Clustering of web documents using a graph model. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Shih, L.K. and Karger, D.R. (2004). Using URLs and table layout for web classification tasks. WWW2004, New York, USA, pp. 193-202.
Singh, G. (2004). Content repurposing. IEEE Multimedia Special Issue on Content Repurposing.
Tao, C. and Munson, E.V. (2003). A relevance model for web image search. Proceedings of the Second International Workshop on Web Document Analysis (WDA2003), Edinburgh, Scotland, pp. 58-60.
The ACM Symposium on Document Engineering. http://www.documentengineering.org.
Thuong, T.T. and Roisin, C. (2004). Structured media for authoring multi-media documents. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
van Ossenbruggen, J., Rutledge, L., and Hardman, L. (2003). Towards a multimedia formatting vocabulary. WWW2003, Budapest, Hungary.
Villard, L., Roisin, C., and Layaida, N. (2000). An XML based multimedia document processing model for content adaptation. Digital Documents and Electronic Publishing Conference (DDEP00), pp. 1-12.
Wang, Y. and Hu, J. (2002). A machine learning based approach for table detection on the web. WWW2002, Honolulu, Hawaii, USA.
Yang, Y., Chen, Y., and Zhang, H.J. (2004). HTML page analysis based on visual cues. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Yoshida, M., Torisawa, K., and Tsujii, J. (2004). Extracting attributes and their values from web pages. In: A. Antonacopoulos and J. Hu (Eds.). Web Document Analysis: Challenges and Opportunities. Singapore: World Scientific.
Zhou, J., Lopresti, D., and Tasdizen, T. (1998). Finding text in color images. Proceedings of the IS&T/SPIE Symposium on Electronic Imaging, San Jose, California, pp. 130-140.
Copyright information
© 2007 Springer-Verlag London Limited
About this chapter
Cite this chapter
Antonacopoulos, A., Hu, J. (2007). Web Document Analysis. In: Chaudhuri, B.B. (eds) Digital Document Processing. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84628-726-8_18
DOI: https://doi.org/10.1007/978-1-84628-726-8_18
Publisher Name: Springer, London
Print ISBN: 978-1-84628-501-1
Online ISBN: 978-1-84628-726-8
eBook Packages: Computer Science, Computer Science (R0)