Abstract
The development of the Internet and the World Wide Web can be either a threat to the survival of indigenous languages or an opportunity for their development. The choice between cultural diversity and linguistic uniformity is in our hands, and the outcome depends on our capability to devise, design and use tools and techniques for the processing of natural languages. Unfortunately, natural language processing requires extensive expertise and large collections of reference data. Our research is concerned with the economical, and therefore semi-automatic or automatic, acquisition of the linguistic information necessary for the development of indigenous or multilingual information systems. In this paper, we propose new methods and variants of existing methods for part-of-speech tagging. We empirically compare the proposed methods with existing reference methods using the Brown corpus of English, and we present some preliminary remarks on experiments with an Indonesian-language corpus.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bressan, S., Indradjaja, L.S. (2004). Part-of-Speech Tagging Without Training. In: Aagesen, F.A., Anutariya, C., Wuwongse, V. (eds) Intelligence in Communication Systems. INTELLCOMM 2004. Lecture Notes in Computer Science, vol 3283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30179-0_10
DOI: https://doi.org/10.1007/978-3-540-30179-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23893-5
Online ISBN: 978-3-540-30179-0
eBook Packages: Springer Book Archive