3/25/2014

Stereo Vision Basics

Intro

Reconstructing the 3D geometry of a scene requires at least two images taken from two separate viewpoints at the same time. This is called Stereo Reconstruction or Stereo Vision. With Stereo Vision, the distances of objects relative to the cameras can be determined. Stereo Vision can be combined with 3D Pattern Matching and Object Tracking, and is therefore used in applications such as bin picking, surveillance, robotics, and inspection of object surfaces, height, shape, etc.
How it works

The relative orientation of the two cameras must be known. Images from two calibrated cameras provide the disparity (the offset in image position) between corresponding points in the two views. The resulting “disparity map” is used to determine the relative depths of objects in the scene.

1. Typical Stereo Vision System[i]

Since camera lenses introduce some distortion, the stereo vision system must be calibrated. Calibration also provides the needed spatial coordinates: the (X, Y, Z) world position and (RX, RY, RZ) rotation of each camera. Calibration is performed by presenting a calibration grid at various poses to the two cameras, as shown below. The real-world distance between dots on the calibration grid is known, for example, in centimeters.

2. Two Cameras in Stereo Vision Geometry showing point Pj of the Calibration Grid projected onto the Camera 1 Scene and Camera 2 Scene

After several poses of the Calibration Grid (at various angles and positions) are provided, the resulting calibration is used to produce a rectified image in which both projected image planes are reconstructed to lie on the same plane. Rectifying the images simplifies the geometry of the system and greatly reduces the calculations required in transforming the scene into 3D information. The resulting geometry is called epipolar standard geometry. With this transformation, both cameras are treated as if they were in the same plane and vertically aligned. The pixel to real-world coordinate transformation is also used to correct lens distortion. The best part is that the transformation can be performed one time and the resulting map stored in a table, allowing online images to be rectified much faster.
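The "compute the transformation once, store it in a table" idea can be sketched with plain numpy. The lookup maps below are made up (a simple one-pixel shift with border clamping) purely to illustrate the mechanism; a real system would store the per-pixel maps produced by stereo calibration.

```python
import numpy as np

H, W = 4, 6
image = np.arange(H * W, dtype=np.float64).reshape(H, W)

# Hypothetical precomputed maps: rectified pixel (r, c) samples source
# pixel (r, c + 1), clamped at the right border. In practice these maps
# come from the calibration and encode rectification + distortion correction.
cols = np.clip(np.arange(W) + 1, 0, W - 1)
map_x = np.tile(cols, (H, 1))                      # source column per pixel
map_y = np.tile(np.arange(H)[:, None], (1, W))     # source row per pixel

# Applying the stored table is just an indexed gather -- this is the cheap
# per-frame step that makes online rectification fast.
rectified = image[map_y, map_x]
```

Because the gather is the only per-frame work, the expensive calibration math never runs in the online loop.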


3. Epipolar Standard Geometry. Both Image planes are vertically and horizontally aligned. The epipolar line for a point is the line that has the same row coordinate.
Depth Resolution

For a simple stereo vision system in the epipolar standard geometry, the depth of a point is given by the following formula:
Depth = f · b / d, where f is the focal length, b is the baseline (the distance between the cameras), and d is the disparity (the distance between corresponding points).
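A quick numeric check of the formula, using made-up but plausible values (focal length in pixels, baseline in metres, disparity in pixels, so depth comes out in metres):

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth of a point under the epipolar standard geometry: z = f * b / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Example: 800-pixel focal length, 10 cm baseline, 40-pixel disparity.
print(depth_from_disparity(800, 0.10, 40))  # 2.0 metres
```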


4. Simple Stereo Vision Model[ii]

In this model, the disparity is defined as uL - uR, the distance between the projected points on the two image planes. Note that depth and disparity are inversely proportional: as the depth increases, the disparity decreases, so a fixed disparity error produces a depth error that grows quadratically with distance. Also, the accuracy of depth measurements increases as the distance between the cameras increases.

Point Grey, a manufacturer of a variety of camera products including stereo vision products, provides a Stereo Vision accuracy chart downloadable from their website[iii]. Values such as the focal length, camera baseline, and stereo resolution can be modified, with the resulting depth resolution provided at various disparities. See http://www.ptgrey.com/support/kb/data/stereoaccuracy.xls. Objects in the scene close to the camera, with a large separation between cameras and a large focal length, will provide more accurate depth measurements.
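The trend the accuracy chart captures follows from differentiating z = f · b / d: a disparity error dd costs roughly dz = z² / (f · b) · dd of depth. The sketch below uses illustrative numbers (not values from the Point Grey chart) to show how fast the resolution degrades with distance:

```python
def depth_resolution(z_m, f_px, baseline_m, disparity_err_px=0.25):
    """Approximate depth uncertainty dz = z^2 / (f * b) * dd."""
    return z_m ** 2 / (f_px * baseline_m) * disparity_err_px

# Assumed setup: 800-pixel focal length, 10 cm baseline, 0.25 px disparity error.
for z in (0.5, 1.0, 2.0, 4.0):
    dz_mm = depth_resolution(z, 800, 0.10) * 1000
    print(f"z = {z:4.1f} m -> dz ~ {dz_mm:.1f} mm")
```

Doubling the distance quadruples the uncertainty, while doubling the baseline or focal length halves it, which matches the advice above.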
Examples in the Lab
1.       Two cameras were mounted in a general stereo geometry.

5. Stereo Vision Setup in Lab
2.       In this example, I used the National Instruments LabVIEW Vision Development Module example “Calibrate Stereo Vision System.vi”.
3.       A calibration grid of dots was introduced to both cameras and images taken simultaneously. This was repeated for several angles and positions of the calibration grid.

6. Calibration Grid in both Camera Images
4.       LabVIEW’s built-in functions provided a rectified image and disparity map.
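The disparity map LabVIEW computes can be approximated by a tiny sum-of-absolute-differences (SAD) block matcher: for each pixel of the rectified left image, slide a patch along the same row of the right image and keep the shift with the lowest cost. This is a toy sketch on a synthetic image pair with a known 3-pixel shift; real implementations add sub-pixel interpolation, validation, and heavy optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_DISP = 3
left = rng.random((20, 40))
right = np.roll(left, -TRUE_DISP, axis=1)  # right view: features shifted 3 px left

def sad_disparity(left, right, row, col, half=3, max_disp=8):
    """Best disparity for one rectified pixel, searching along the epipolar row."""
    patch_l = left[row - half:row + half + 1, col - half:col + half + 1]
    costs = []
    for d in range(max_disp + 1):
        patch_r = right[row - half:row + half + 1,
                        col - d - half:col - d + half + 1]
        costs.append(np.abs(patch_l - patch_r).sum())
    return int(np.argmin(costs))

print(sad_disparity(left, right, row=10, col=20))  # recovers 3
```

Because the images are rectified, the search is one-dimensional along a row, which is exactly why the epipolar standard geometry makes the computation so much cheaper.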

Starbucks in 3D
7. Left - Camera 1 Rectified Image; Right - Camera 2 Rectified Image
 
8. Center of Starbucks Cup is 30.56cm away, blue straw is 35.25cm away
My hand in 3D
9. Hand in 3D
 
Deep in Thought

10. Left - Camera 1 Rectified Image; Right - Camera 2 Rectified Image

11. Deep in Thought - Stereo 3D Image
3D Peace
 
12. Left - Camera 1 Rectified Image; Right - Camera 2 Rectified Image
 
 
13. 3D Reconstruction - Stereo Vision
Halcon 3D Stereo Example
The example below is a screenshot from the MVTec Halcon HDevelop example “locate_pipe_joints_stereo.hdev”, which uses multi-view camera images and surface-based 3D matching algorithms.
14. Halcon Software Stereo Vision Example of Robot Bin Picking and 3D Pattern Matching



[ii] Picture taken from “A Guide to Stereovision and 3D Imaging”, by Dinesh Nair, Chief Architect at National Instruments, Austin, TX. See http://www.techbriefs.com/component/content/article/14925
