A New Algorithm for Unsupervised Induction of Concatenative Morphology

  • Harald Hammarström
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)


This paper sketches a new algorithm for unsupervised induction of concatenative morphology. The algorithm differs markedly from previous approaches in both segmentation and paradigm induction. It is illustrated here with respect to suffixes, using the following notation:

W: the set (not bag) of words in the corpus

\(s \triangleleft w\): s is a suffix of the word w, i.e., there exists a (possibly empty) string x such that w = xs

\(\mathrm{Stems}(s) = \{x \mid xs \in W\}\): the set of all strings (“stems”) that form a word in the corpus when s is appended

\(f(s) = |\{w \in W|s \triangleleft w\}|\): the number of words with suffix s (equals |Stems(s)|)

\(s_i(w)\): the suffix of w that begins at position i, where \(0 \le i \le |w|\)

\(Q(w) = \{s_i(w) \mid 0 \le i < |w|\}\): the set of (non-empty) suffixes of w

\(S = \bigcup_{w \in W} Q(w)\): all suffixes in the corpus
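
The notation above can be made concrete with a minimal Python sketch. It only transcribes the definitions and is not the paper's segmentation or paradigm induction algorithm. The function names (`is_suffix`, `stems`, `f`, `suffix_at`, `Q`, `all_suffixes`) and the toy corpus `WORDS` are assumptions chosen for illustration; positions are taken to be 0-based.

```python
# Toy corpus W: a set (not a bag) of words. Hypothetical example data.
WORDS = {"walk", "walks", "walked", "talks"}

def is_suffix(s, w):
    """s ◁ w: there is a (possibly empty) string x with w == x + s."""
    return w.endswith(s)

def stems(s, W):
    """Stems(s) = {x | xs ∈ W}."""
    # Slice by length rather than w[:-len(s)], which breaks for empty s.
    return {w[:len(w) - len(s)] for w in W if is_suffix(s, w)}

def f(s, W):
    """f(s): the number of words in W that have suffix s."""
    return sum(1 for w in W if is_suffix(s, w))

def suffix_at(w, i):
    """s_i(w): the suffix of w starting at position i, 0 <= i <= len(w)."""
    if not 0 <= i <= len(w):
        raise ValueError("position out of range")
    return w[i:]

def Q(w):
    """Q(w): the set of non-empty suffixes of w (i < len(w))."""
    return {suffix_at(w, i) for i in range(len(w))}

def all_suffixes(W):
    """S: the union of Q(w) over all w in W."""
    return set().union(*(Q(w) for w in W))

print(sorted(stems("s", WORDS)))  # ['talk', 'walk']
print(f("s", WORDS))              # 2
print(sorted(Q("walk")))          # ['alk', 'k', 'lk', 'walk']
```

Because W is a set, each word ending in s contributes exactly one distinct stem, which is why `f(s, W)` equals `len(stems(s, W))` in this sketch.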

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Harald Hammarström
    • 1
  1. Chalmers University of Technology, Gothenburg, Sweden
