Part-of-Speech Tagging Without Training

Bressan, Stéphane; Indradjaja, Lily Suryana

doi:10.1007/978-3-540-30179-0_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3283)

Included in the conference series: International Conference on Intelligence in Communication Systems

Abstract

The development of the Internet and the World Wide Web can be either a threat to the survival of indigenous languages or an opportunity for their development. The choice between cultural diversity and linguistic uniformity is in our hands, and the outcome depends on our capability to devise, design and use tools and techniques for the processing of natural languages. Unfortunately, natural language processing requires extensive expertise and large collections of reference data. Our research is concerned with the economical, and therefore semi-automatic or automatic, acquisition of the linguistic information necessary for the development of indigenous or multilingual information systems. In this paper, we propose new methods and variants of existing methods for part-of-speech tagging. We empirically compare the proposed methods with existing reference methods using the Brown corpus of English, and we present some preliminary remarks on experiments with an Indonesian-language corpus.

Author information

Authors and Affiliations

School of Computing, National University of Singapore,
Stéphane Bressan & Lily Suryana Indradjaja

Editor information

Editors and Affiliations

Department of Telematics, Norwegian University of Science and Technology (NTNU), N7491, Trondheim, Norway
Finn Arve Aagesen
Shinawatra University, 99 Moo 10 Bangtoey, 12160, Samkok, Pathum Thani, Thailand
Chutiporn Anutariya
School of Engineering and Technology, Asian Institute of Technology, P.O. Box 4, 12120, Klong Luang, Pathum Thani, Thailand
Vilas Wuwongse

Cite this paper

Bressan, S., Indradjaja, L.S. (2004). Part-of-Speech Tagging Without Training. In: Aagesen, F.A., Anutariya, C., Wuwongse, V. (eds) Intelligence in Communication Systems. INTELLCOMM 2004. Lecture Notes in Computer Science, vol 3283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30179-0_10

DOI: https://doi.org/10.1007/978-3-540-30179-0_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23893-5
Online ISBN: 978-3-540-30179-0
eBook Packages: Springer Book Archive
