Abstract
We show how a probabilistic interpretation of an ill-defined problem, that of finding line breaks in a paragraph, can lead to an efficient new algorithm that performs well. The graphical model that results from this interpretation is easy to tune. Furthermore, the algorithm optimizes the probability that a break-up is acceptable over the whole paragraph, does not show threshold effects, and allows easy incorporation of subtle typographical rules. Thanks to the architecture of the Bayesian network, the algorithm is linear in the number of characters in a paragraph. Empirical evidence suggests that the output of this algorithm is closer to results published through desktop publishing than that of a number of existing systems.
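The idea of maximizing the probability that a break-up is acceptable over the whole paragraph can be illustrated with a minimal sketch. This is not the Bayesian network from the paper: it is a plain dynamic program under the assumption that lines are judged independently, and the names `line_log_prob`, `break_paragraph`, `target` and `sigma`, as well as the Gaussian-shaped score, are hypothetical choices made for illustration only.

```python
import math


def line_log_prob(width, target, sigma=3.0, last=False):
    """Hypothetical log-probability that a line of `width` characters is acceptable."""
    if width > target:
        return -math.inf  # overfull lines are never acceptable in this sketch
    if last:
        return 0.0  # a short final line is not penalised
    slack = target - width
    # Smooth (Gaussian-shaped) score: no hard threshold below the target width.
    return -(slack * slack) / (2.0 * sigma * sigma)


def break_paragraph(text, target=30, sigma=3.0):
    """Return the lines of the break-up with the highest total log-probability."""
    words = text.split()
    if any(len(w) > target for w in words):
        raise ValueError("a word is longer than the target width")
    n = len(words)
    # best[i]: highest log-probability of any break-up of words[:i]
    best = [-math.inf] * (n + 1)
    prev = [0] * (n + 1)  # prev[i]: index where the last line of that break-up starts
    best[0] = 0.0
    for end in range(1, n + 1):
        width = -1  # cancels the space counted before the first word of the line
        for start in range(end - 1, -1, -1):
            width += len(words[start]) + 1
            if width > target:
                break  # adding more words only makes the line longer
            score = best[start] + line_log_prob(width, target, sigma, last=(end == n))
            if score > best[end]:
                best[end] = score
                prev[end] = start
    # Walk the back-pointers to recover the chosen lines.
    lines = []
    end = n
    while end > 0:
        start = prev[end]
        lines.append(" ".join(words[start:end]))
        end = start
    lines.reverse()
    return lines
```

Because a line can hold only a bounded number of words for a fixed `target`, the inner loop does a bounded amount of work per word, so this sketch runs in time linear in the length of the paragraph. Summing log-probabilities over lines corresponds to multiplying per-line probabilities, which is where the independence assumption enters.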
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bouckaert, R.R. (2003). A Probabilistic Line Breaking Algorithm. In: Gedeon, T.D., Fung, L.C.C. (eds) AI 2003: Advances in Artificial Intelligence. AI 2003. Lecture Notes in Computer Science, vol 2903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24581-0_33
DOI: https://doi.org/10.1007/978-3-540-24581-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20646-0
Online ISBN: 978-3-540-24581-0
eBook Packages: Springer Book Archive