Abstract
This paper argues that the World Wide Web can be regarded not only as an information resource but also as a dynamic, multilingual, largely uncontrolled, easily accessible and untagged language corpus. To support this idea, we implemented a method that extracts bilingual lexicons from parallel WWW pages by means of a two-stage alignment. Language pairs drawn from German, English and Chinese were selected, but the implementation is independent of any particular natural language, domain or markup.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, F., Sheng, H., Weisweber, W. (2001). World Wide Web — A Multilingual Language Resource. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_46
DOI: https://doi.org/10.1007/3-540-45490-X_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42730-8
Online ISBN: 978-3-540-45490-8
eBook Packages: Springer Book Archive