Abstract
CAPTCHAs are small puzzles that should be easy for humans to solve but hard for computers. They form a security cornerstone of the modern Internet service landscape and are deployed in essentially every kind of login service, where they make it possible to distinguish authorized human users from automated attacks. One of the most popular and successful systems today is reCAPTCHA. Like many other systems, reCAPTCHA is based on distorted images of words; the distortion scheme evolves over time and defines the different generations of the system. In this work, we analyze three recent generations of reCAPTCHA and present an algorithm that is capable of solving at least 5% of the challenges generated by these versions. We achieve this by applying a specialized variant of the shape contexts proposed by Belongie et al. to match entire words at once. To handle the ellipse-shaped distortions employed in one of the generations, we propose a machine learning algorithm that virtually eliminates the distortion. Finally, an improved shape matching strategy allows us to use word dictionaries of a reasonable size (approximately 20,000 entries).
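To make the shape context idea concrete, the following is a minimal sketch of the generic descriptor in the spirit of Belongie et al., not the specialized variant used in this work. It assumes a shape is given as a list of sampled 2D contour points; the function names `shape_context` and `chi2_cost`, the bin counts (5 radial, 12 angular) and the radius limits are illustrative choices, not values taken from the paper.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of all other points as seen from points[index].

    Bin counts and radius limits are illustrative assumptions.
    """
    n = len(points)
    # Normalise radii by the mean pairwise distance (scale invariance).
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.hypot(points[i][0] - points[j][0],
                                points[i][1] - points[j][1])
            count += 1
    mean_dist = total / count

    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    hist = [0] * (n_r * n_theta)
    px, py = points[index]
    for k, (qx, qy) in enumerate(points):
        if k == index:
            continue
        r = math.hypot(qx - px, qy - py) / mean_dist
        if r == 0 or r > r_outer:
            continue  # coincident or too distant points are ignored
        if r <= r_inner:
            r_bin = 0
        else:
            # Log-spaced radial bins between r_inner and r_outer.
            frac = (math.log(r) - log_lo) / (log_hi - log_lo)
            r_bin = min(n_r - 1, int(frac * n_r))
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = min(n_theta - 1, int(theta / (2 * math.pi) * n_theta))
        hist[r_bin * n_theta + t_bin] += 1
    return hist

def chi2_cost(g, h):
    """Chi-squared distance between two histograms after normalisation."""
    sg, sh = sum(g) or 1, sum(h) or 1
    cost = 0.0
    for a, b in zip(g, h):
        a, b = a / sg, b / sh
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

Two shapes can then be compared by pairing up their points and summing `chi2_cost` over the pairs. A holistic word matcher, as described in the abstract, would compare the descriptor set of a challenge image against those of rendered dictionary words and pick the candidate with the lowest total cost.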
References
von Ahn, L., Maurer, B., McMillen, C., Abraham, D., Blum, M.: reCAPTCHA: Human-based character recognition via web security measures. Science 321(5895), 1465–1468 (2008)
Belongie, S., Malik, J., Puzicha, J.: Shape context: A new descriptor for shape matching and object recognition. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) NIPS, pp. 831–837. MIT Press, Cambridge (2000)
Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986), http://portal.acm.org/citation.cfm?id=11274.11275
Chellapilla, K., Larson, K., Simard, P.Y., Czerwinski, M.: Building segmentation based human-friendly human interaction proofs (HIPs). In: Baird, H.S., Lopresti, D.P. (eds.) HIP 2005. LNCS, vol. 3517, pp. 1–26. Springer, Heidelberg (2005)
Chellapilla, K., Larson, K., Simard, P.Y., Czerwinski, M.: Computers beat humans at single character recognition in reading based human interaction proofs (HIPs). In: CEAS (2005)
Govindaraju, V., Krishnamurthy, R.K.: Holistic handwritten word recognition using temporal features derived from off-line images. Pattern Recognition Letters 17(5), 537–540 (1996)
Houck, C.W.: Decoding recaptcha (2010), http://www.n3on.org/projects/reCAPTCHA/docs/reCAPTCHA.docx
Lavrenko, V., Rath, T.M., Manmatha, R.: Holistic word recognition for handwritten historical documents. In: DIAL, pp. 278–287. IEEE Computer Society Press, Los Alamitos (2004)
Lladós, J., Roy, P.P., Rodríguez, J.A., Sánchez, G.: Word spotting in archive documents using shape contexts. In: Martí, J., Benedí, J.M., Mendonça, A.M., Serrat, J. (eds.) IbPRIA 2007. LNCS, vol. 4478, pp. 290–297. Springer, Heidelberg (2007)
Madhvanath, S., Govindaraju, V.: Contour-based image preprocessing for holistic handwritten word recognition. In: ICDAR, pp. 536–539. IEEE Computer Society Press, Los Alamitos (1997)
Madhvanath, S., Govindaraju, V.: The role of holistic paradigms in handwritten word recognition. IEEE Trans. Pattern Anal. Mach. Intell. 23(2), 149–164 (2001)
Mori, G., Belongie, S., Malik, J.: Shape contexts enable efficient retrieval of similar shapes. In: CVPR, vol. 1, pp. 723–730. IEEE Computer Society Press, Los Alamitos (2001)
Mori, G., Belongie, S.J., Malik, J.: Efficient shape matching using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1832–1837 (2005)
Mori, G., Malik, J.: Recognizing objects in adversarial clutter: Breaking a visual CAPTCHA. In: CVPR, vol. 1, pp. 134–144. IEEE Computer Society Press, Los Alamitos (2003)
Vertanen, K.: Words in 10 lists (2010), http://www.keithv.com/software/
Wilkins, J.: Strong CAPTCHA guidelines v1.2 (2009), http://www.bitland.net/
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Baecher, P., Büscher, N., Fischlin, M., Milde, B. (2011). Breaking reCAPTCHA: A Holistic Approach via Shape Recognition. In: Camenisch, J., Fischer-Hübner, S., Murayama, Y., Portmann, A., Rieder, C. (eds) Future Challenges in Security and Privacy for Academia and Industry. SEC 2011. IFIP Advances in Information and Communication Technology, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21424-0_5
DOI: https://doi.org/10.1007/978-3-642-21424-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21423-3
Online ISBN: 978-3-642-21424-0
eBook Packages: Computer Science, Computer Science (R0)