Object Recognition and Localisation for Item Picking

  • Oytun Akman
  • Pieter Jonker


One of the challenges of future retail warehouses is automating the order-picking process. To achieve this, items in an order tote must be automatically detected and grasped under various conditions. A product recognition and localisation system for automated order-picking in retail warehouses was investigated. It recognises objects whose descriptor in the warehouse product database contains both 2D and 3D features. The 2D features are derived from standard CMOS camera images and the 3D features from time-of-flight camera images. 2D features perform best when the object is relatively rigid, uniformly illuminated, and sufficiently textured. They can cope with partial occlusions and are invariant to rotation, translation, and scale, and to affine transformations up to some level. 3D features are useful for detecting and localising objects without texture or dominant colour. The 2D system runs at 2–3 frames per second (fps) with about 400 extracted features, which is fast enough for a pick-and-place robot. Almost all rigid items with enough texture could be recognised. The 3D system is insensitive to lighting conditions and produces 3D point clouds, from which geometric descriptions of planes and edges, as well as their pose in 3D, are derived. The 3D system is a welcome addition to the 2D system, mainly for box-shaped objects without much texture or dominant colour.
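To make the step from a 3D point cloud to a geometric plane description concrete, the following is a minimal sketch, not the authors' implementation. It assumes RANSAC as the fitting technique, which is a common choice for plane extraction; the abstract does not state which method the system uses. The function names, the 1 cm inlier threshold, and the synthetic box-top data are all hypothetical and chosen only for illustration.

```python
import random


def plane_from_points(p, q, r):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The cross product of two in-plane vectors gives the plane normal.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return (model, inlier indices) for the plane supported by most points.

    threshold is the maximum point-to-plane distance (assumed metres)
    for a point to count as an inlier.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [
            i for i, (x, y, z) in enumerate(points)
            if abs(n[0] * x + n[1] * y + n[2] * z + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical test data: a flat 10 cm x 10 cm box top at z = 0.10 m,
# plus scattered clutter points well above it.
_rng = random.Random(1)
top = [(0.01 * i, 0.01 * j, 0.10) for i in range(10) for j in range(10)]
clutter = [
    (_rng.uniform(0.0, 0.1), _rng.uniform(0.0, 0.1), _rng.uniform(0.15, 0.30))
    for _ in range(30)
]
model, inliers = ransac_plane(top + clutter)
```

The returned unit normal and offset give the orientation and position of the dominant plane, which is the kind of pose information a grasp planner for box-shaped items would need. A real time-of-flight cloud would also call for noise filtering and repeated extraction to find several faces, both of which this sketch omits.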





Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  1. Faculty of Mechanical, Maritime and Materials Engineering, Delft University of Technology, Delft, The Netherlands
