Compare and contrast Marr and Nishihara’s and Biederman’s theories of object recognition. How well do they explain how we are able to recognize three-dimensional objects despite changes in viewing angle?
Humphreys and Bruce (1989) proposed a model of object recognition that fits within a wider context of cognition. According to them, the recognition of objects occurs in a series of stages. First, sensory input is generated, leading to perceptual classification, where the information is compared with previously stored descriptions of objects. Then the object is recognized and can be semantically classified and subsequently named. This approach is, however, over-simplified. Other theories, such as Marr and Nishihara’s and Biederman’s, explain in more detail the processes involved in the stages of perceptual and semantic classification. This essay will compare and contrast these two theories and evaluate their contribution to explaining 3D object recognition. In doing so, it will consider the debate between viewpoint-invariant and viewpoint-dependent accounts and compare both approaches with others, such as Tarr and Bülthoff’s and Foster and Gilson’s.
According to Humphreys and Bruce (1989), the first stage of object recognition is the early visual processing of the retinal image, as in Marr’s primal sketch, in which a two-dimensional description is formed. In the second stage a description of the object is generated, as in Marr’s 2½D sketch, in which the depth and orientation of visible surfaces are described relative to the observer’s viewpoint; this description is therefore viewpoint dependent. In the third stage (perceptual classification) a structural description is created, similar to the processes that form Marr’s 3D model representation. The main focus of both Marr and Nishihara’s and Biederman’s theories appears to be on the second and third stages of this sequence.
Marr and Nishihara (1978) proposed a theory of object recognition based on generating a 3D object-centered representation, which allows the object to be recognized from any angle. According to them, this representation is based on a canonical coordinate frame, which is established by defining the central axis of the object.
To locate the main axis, the shape of the object is first derived from the information provided by the 2½D sketch, based on the object’s occluding contours. The boundaries of the object’s silhouette are used to generate the contour of the object and are referred to as the contour generator. Once the shape of the object has been generated, the main axis is located. Areas of concavity and convexity are then used to divide the object into smaller parts. Next, the axis of each sub-section is identified and each component is represented by a generalized cone, known as a primitive. In this way a 3D description of the object is generated, and the arrangement of its components is matched against stored 3D model descriptions to identify the object. These 3D models are hierarchical and include both global and detailed information, stored in a hierarchically organized catalog (Kaye, 2010).
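The matching step described above can be made concrete with a toy sketch. It is only an illustration under strong simplifying assumptions, not Marr and Nishihara’s actual procedure: each generalized cone is reduced to a single relative axis length, and the names Component, mismatch, recognise and CATALOGUE, as well as all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A part described only by its axis length relative to its parent
    (a hypothetical stand-in for a generalized cone)."""
    axis_length: float
    parts: tuple = ()  # sub-components, giving the hierarchy


def mismatch(a, b):
    """Crude dissimilarity between two hierarchical descriptions."""
    cost = abs(a.axis_length - b.axis_length)
    # Compare sub-parts in order of size, so the order of listing is irrelevant.
    pa = sorted(a.parts, key=lambda p: p.axis_length, reverse=True)
    pb = sorted(b.parts, key=lambda p: p.axis_length, reverse=True)
    cost += abs(len(pa) - len(pb))  # penalty for each missing or extra part
    for x, y in zip(pa, pb):
        cost += mismatch(x, y)
    return cost


def recognise(description, catalogue):
    """Return the name of the stored model that best matches the description."""
    return min(catalogue, key=lambda name: mismatch(description, catalogue[name]))


# Hypothetical stored models: a torso axis with limbs attached to it.
CATALOGUE = {
    "human": Component(1.0, (Component(0.3), Component(0.8), Component(0.8),
                             Component(1.0), Component(1.0))),
    "quadruped": Component(1.0, (Component(0.3), Component(0.6), Component(0.6),
                                 Component(0.6), Component(0.6), Component(0.5))),
}

# A noisy object-centred description recovered from some view of a person.
seen = Component(1.0, (Component(0.28), Component(0.82), Component(0.79),
                       Component(1.0), Component(0.97)))
print(recognise(seen, CATALOGUE))  # human
```

Because the description is expressed relative to the object’s own axes rather than the observer’s position, the same comparison applies whatever the viewing angle, provided the axes can be recovered from the image in the first place.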
Marr’s ideas about object recognition have been extended and adapted by Biederman. Like Marr and Nishihara’s theory, Biederman’s is based on representing complex objects using a series of simpler primitives. However, Biederman’s primitives are not limited to generalized cones. Instead, he proposes that complex objects are made up of arrangements of basic component parts, such as cylinders and cubes, known as geons. As in Marr and Nishihara’s theory, this division into component parts is based on geometrical properties of the occluding contours in the image; in particular, parts are defined in relation to sharp concavities in the contours. Unlike Marr, however, Biederman claims that contour generation is not needed to recover a 3D shape. Instead, he proposes that each geon has a key feature that remains invariant regardless of viewpoint. So, first the key...
References:
Eysenck, M.W. and Keane, M.T. (1995) Cognitive Psychology: A Student’s Handbook, East Sussex, Psychology Press.
Hayward, W.G. (2003) ‘After the viewpoint debate: where next in object recognition?’, Trends in Cognitive Sciences, vol.7, no.10, pp.425–7.
Humphreys, G.W. and Bruce, V. (1989) Visual Cognition: Computational, Experimental and Neuropsychological Perspectives, Hove, Lawrence Erlbaum Associates Ltd, in Kaye, H. (ed.) (2010) Cognitive Psychology, Milton Keynes, The Open University, p.106.
Kaye, H. (ed.) (2010) Cognitive Psychology, Milton Keynes, The Open University.
Warrington, E.K. and Taylor, A.M. (1978) ‘Two categorical stages of object recognition’, Perception, vol.7, pp.695–705, in Kaye, H. (ed.) (2010) Cognitive Psychology, Milton Keynes, The Open University, p.123.