
Document Structure Analysis with Syntactic Model and Parsers: Application to Legal Judgments

  • Hirokazu Igari
  • Akira Shimazu
  • Koichiro Ochimizu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7258)

Abstract

The structure of documents of a given type that are written in a common format, such as legal judgments, can be expressed and extracted by using syntax rules. In this paper, we propose a novel method for document structure analysis based on two components: a method for describing the syntactic structure of documents with an abstract document model, and a method for implementing a document structure parser as a combination of syntactic parsers. A parser implemented with this method is highly general and extensible, so it works well for a variety of document types that share a common description format, especially legal documents such as judgments and legislation, while achieving high accuracy.
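The idea of building a document structure parser by combining small syntactic parsers can be illustrated with a minimal parser-combinator sketch. This is not the authors' implementation: the judgment layout (a "JUDGMENT" title, a "MAIN TEXT" section, and a "REASONS" section made of numbered paragraphs), the node labels, and every function name (`line`, `seq`, `many`, `node`, `parse_document`) are hypothetical choices made for illustration. Each parser consumes whole lines and returns nodes of a simple labeled tree, which plays the role of the abstract document model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


# Hypothetical abstract document model: a labeled tree of nodes.
@dataclass
class Node:
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# A parser maps (lines, position) to (produced nodes, new position), or None on failure.
Result = Optional[Tuple[List[Node], int]]
Parser = Callable[[List[str], int], Result]


def line(pattern: str, label: str) -> Parser:
    """Match exactly one line against a regex and emit a leaf node."""
    rx = re.compile(pattern)

    def parse(lines: List[str], pos: int) -> Result:
        if pos < len(lines) and rx.fullmatch(lines[pos]):
            return [Node(label, lines[pos])], pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(lines: List[str], pos: int) -> Result:
        nodes: List[Node] = []
        for p in parsers:
            r = p(lines, pos)
            if r is None:
                return None
            out, pos = r
            nodes.extend(out)
        return nodes, pos
    return parse


def many(p: Parser) -> Parser:
    """Apply p zero or more times; stops on failure or when no input is consumed."""
    def parse(lines: List[str], pos: int) -> Result:
        nodes: List[Node] = []
        while True:
            r = p(lines, pos)
            if r is None or r[1] == pos:
                return nodes, pos
            out, pos = r
            nodes.extend(out)
    return parse


def node(label: str, p: Parser) -> Parser:
    """Group the nodes produced by p under a new parent node with the given label."""
    def parse(lines: List[str], pos: int) -> Result:
        r = p(lines, pos)
        if r is None:
            return None
        children, new_pos = r
        return [Node(label, children=children)], new_pos
    return parse


def parse_document(p: Parser, lines: List[str]) -> Node:
    """Parse a whole document; every line must be consumed by the grammar."""
    r = p(lines, 0)
    if r is None or r[1] != len(lines):
        raise ValueError("document does not match the grammar")
    return r[0][0]


# Hypothetical grammar for a simplified judgment layout.
text_line = line(r"(?!MAIN TEXT$|REASONS$|\d+\.\s).+", "text")  # any non-structural line
paragraph = node("paragraph", seq(line(r"\d+\.\s.+", "text"), many(text_line)))
main_text = node("main_text", seq(line(r"MAIN TEXT", "heading"), many(text_line)))
reasons = node("reasons", seq(line(r"REASONS", "heading"), many(paragraph)))
judgment = node("judgment", seq(line(r"JUDGMENT", "title"), main_text, reasons))

# Invented sample input, for illustration only.
SAMPLE_JUDGMENT = [
    "JUDGMENT",
    "MAIN TEXT",
    "The appeal is dismissed.",
    "REASONS",
    "1. The appellant argues that the contract is void.",
    "The court disagrees for the following reasons.",
    "2. The costs of the appeal are borne by the appellant.",
]

if __name__ == "__main__":
    tree = parse_document(judgment, SAMPLE_JUDGMENT)
    print([c.label for c in tree.children])  # ['title', 'main_text', 'reasons']
```

Because each section parser is an ordinary value, supporting a different document type in this sketch means recombining or swapping a few small parsers rather than rewriting the whole parser. The sketch has no ordered-choice combinator, so it only handles layouts whose sections always appear in a fixed order.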

Keywords

document structure analysis; document model; document structure parser



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hirokazu Igari (1)
  • Akira Shimazu (1)
  • Koichiro Ochimizu (1)
  1. School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
