Abstract
Waka is a form of traditional Japanese poetry with a 1300-year history. In this paper, we attempt to discover characteristics common to a collection of Waka poems. As a formalism for characteristics, we use regular patterns where the constant parts are limited to sequences of auxiliary verbs and postpositional particles. We call such patterns fushi. The problem is to automatically find significant fushi patterns that characterize the poems.
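As a rough illustration of the formalism, a regular pattern can be pictured as an ordered list of constant strings separated by gaps that may match anything. The sketch below is not the authors' system; it assumes that gaps may be empty, that the constants are given as plain strings, and that the names `matches` and `support` as well as the toy input strings are hypothetical placeholders rather than real poems.

```python
from typing import Iterable, Sequence


def matches(constants: Sequence[str], text: str) -> bool:
    """Return True if the constants occur in `text` in the given order
    without overlapping; the gaps between them may be any string
    (assumed here to include the empty string)."""
    pos = 0
    for constant in constants:
        idx = text.find(constant, pos)  # leftmost occurrence at or after pos
        if idx < 0:
            return False
        pos = idx + len(constant)  # continue searching after this constant
    return True


def support(constants: Sequence[str], texts: Iterable[str]) -> int:
    """Count how many texts the pattern matches."""
    return sum(1 for text in texts if matches(constants, text))


# Toy placeholder strings (not actual poems); "wa" and "keri" stand in
# for a particle and an auxiliary verb used as the constant parts.
toy_texts = ["xx-wa-yy-keri", "wa-zz", "keri-wa"]
toy_pattern = ["wa", "keri"]
```

Taking the leftmost occurrence of each constant is sufficient here because the gaps are unconstrained: if any in-order placement of the constants exists, the leftmost one does too. Counting matches this way only gives the raw support of a pattern; judging whether a pattern is significant needs a separate measure.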
Solving this problem requires a reliable significance measure for the patterns. Brāzma et al. (1996) proposed such a measure according to the minimum description length (MDL) principle. Using this method, we report successful results in finding patterns from five anthologies. Some of the results are quite stimulating, and we hope that they will lead to new discoveries. Based on our experience, we also propose a pattern-based text data mining system. Further research into Waka poetry is now proceeding using this system.
References
H. Ahonen, O. Heinonen, M. Klemettinen, and A.I. Verkamo: Mining in the phrasal frontier. In Proc. 1st European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD’97), 343–350, 1997.
D. Angluin: Finding patterns common to a set of strings. In Proc. 11th Annual Symposium on Theory of Computing, 130–141, 1979.
H. Arimura, T. Shinohara, and S. Otsuki: Finding minimal generalizations for unions of pattern languages and its application to inductive inference from positive data. In Proc. 11th Annual Symposium on Theoretical Aspects of Computer Science (STACS’94), 649–660, 1994.
G. Bownas and A. Thwaite: The Penguin Book of Japanese Verse. Penguin Books Ltd., 1964.
A. Brāzma, E. Ukkonen, and J. Vilo: Discovering unbounded unions of regular pattern languages from positive examples. In Proc. 7th International Symposium on Algorithms and Computation (ISAAC’96), 95–104, 1996.
R. Feldman and I. Dagan: Knowledge discovery in textual databases (KDT). In Proc. 1st International Conference on Knowledge Discovery and Data Mining (KDD’95), 112–117, 1995.
E. M. Gold: Language identification in the limit. Information and Control, 10: 447–474, 1967.
J. Rissanen: Modeling by the shortest data description. Automatica, 14: 465–471, 1978.
T. Shinohara: Polynomial-time inference of pattern languages and its applications. In Proc. 7th IBM Symposium on Mathematical Foundations of Computer Science, 191–209, 1982.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yamasaki, M., Takeda, M., Fukuda, T., Nanri, I. (1998). Discovering Characteristic Patterns from Collections of Classical Japanese Poems. In: Arikawa, S., Motoda, H. (eds) Discovery Science. DS 1998. Lecture Notes in Computer Science, vol 1532. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49292-5_12
DOI: https://doi.org/10.1007/3-540-49292-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65390-5
Online ISBN: 978-3-540-49292-4
eBook Packages: Springer Book Archive