Computer Vision Tracking

AI and Robotics

Oct 17, 2013


Starting simple: Marker tracking
 Has been done for more than 10 years
 Some phones today are faster than computers at that time
 Several open source solutions exist
 Fairly simple to implement
 Standard computer vision methods
 A rectangular marker provides 4 corner points
 Enough for pose estimation!

Marker Tracking Pipeline
Goal: Do all this in less than 20 milliseconds on a mobile phone…

Marker Tracking – Overview

Marker Tracking – Fiducial Detection
 Threshold the whole image to black and white
 Search scanline by scanline for edges (white to black)
 Follow edge until either
 Back to starting pixel
 Image border
 Check for size
 Reject fiducials early that are too small (or too large)
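The thresholding and scanline search above can be sketched in a few lines. This is only an illustration (the list-of-rows image format, the threshold value, and the tiny test frame are assumptions); a real implementation would follow each edge hit to trace the full contour:

```python
def threshold_image(gray, thresh=128):
    """Binarize a grayscale image (list of rows of 0-255 values)."""
    return [[0 if px < thresh else 1 for px in row] for row in gray]

def find_edge_candidates(binary):
    """Scan line by line for white-to-black transitions; each hit is a
    starting pixel for the contour-following step."""
    candidates = []
    for y, row in enumerate(binary):
        for x in range(1, len(row)):
            if row[x - 1] == 1 and row[x] == 0:   # white -> black edge
                candidates.append((x, y))
    return candidates

# Tiny synthetic frame: white background with a dark 2x2 blob
gray = [[255, 255, 255, 255],
        [255,  10,  10, 255],
        [255,  10,  10, 255],
        [255, 255, 255, 255]]
print(find_edge_candidates(threshold_image(gray)))   # -> [(1, 1), (1, 2)]
```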

Marker Tracking – Rectangle Fitting
 Start with an arbitrary point “x” on the contour
 The point with maximum distance must be a corner c0
 Create a diagonal through the center
 Find points c1 & c2 with maximum distance left and right of the diagonal
 New diagonal from c1 to c2
 Find point c3 right of the diagonal with maximum distance
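The farthest-point corner search above can be sketched as follows, assuming the contour is given as a list of (x, y) pixels; the square test contour is a made-up example:

```python
import math

def side(p, a, b):
    """Signed area; >0 if p is left of the line a->b, <0 if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def fit_rectangle(contour):
    start = contour[0]                                    # arbitrary point "x"
    c0 = max(contour, key=lambda p: math.dist(p, start))  # farthest = corner c0
    cx = (sum(p[0] for p in contour) / len(contour),
          sum(p[1] for p in contour) / len(contour))      # contour center
    # diagonal through c0 and the center; farthest point on each side
    left = [p for p in contour if side(p, c0, cx) > 0]
    right = [p for p in contour if side(p, c0, cx) < 0]
    c1 = max(left, key=lambda p: abs(side(p, c0, cx)))
    c2 = max(right, key=lambda p: abs(side(p, c0, cx)))
    # new diagonal c1-c2; last corner is farthest on the side opposite c0
    s0 = side(c0, c1, c2)
    opposite = [p for p in contour if side(p, c1, c2) * s0 < 0]
    c3 = max(opposite, key=lambda p: abs(side(p, c1, c2)))
    return c0, c1, c2, c3

# Perimeter samples of an axis-aligned square (corners and edge midpoints)
contour = [(2, 0), (4, 0), (4, 2), (4, 4), (2, 4), (0, 4), (0, 2), (0, 0)]
print(fit_rectangle(contour))   # -> ((4, 4), (4, 0), (0, 4), (0, 0))
```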

Marker Tracking – Pattern checking
 Calculate homography using the 4 corner points
 “Direct Linear Transform” algorithm
 Maps normalized coordinates to marker coordinates
(simple perspective projection, no camera model)
 Extract pattern by sampling
 Check pattern
 Id (implicit encoding)
 Template (normalized cross correlation)
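The Direct Linear Transform mentioned above can be sketched compactly with an SVD null-space solve; the corner coordinates here are made-up values, and a real pipeline would then sample the marker pattern through the resulting H:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (3x3, up to scale) mapping src -> dst from 4 point
    pairs via the Direct Linear Transform: stack two linear equations
    per pair and take the SVD null vector."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Normalized marker coordinates -> detected corner pixels (made-up values)
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 10), (50, 12), (48, 55), (8, 50)]
H = dlt_homography(src, dst)
p = H @ np.array([1.0, 0.0, 1.0])
print(p[:2] / p[2])   # ≈ [50. 12.]
```

With exactly 4 correspondences the fit is exact, which is why the slide calls it a simple perspective projection without a camera model.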

Marker Tracking – Corner refinement
 Refine corner coordinates
 Critical for high quality tracking
 Remember: 4 points is the bare minimum!
 So these 4 points had better be accurate…
 Detect sub-pixel coordinates
 E.g. Harris corner detector
- Specialized methods can be faster and more accurate
 Strongly reduces jitter!
 Undistort corner coordinates
 Remove radial distortion from the lens
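Removing radial distortion from the refined corners can be sketched with a one-parameter polynomial model; this is an illustrative assumption (real calibrations use more coefficients), and the camera center and k1 value below are made up:

```python
def undistort_point(pt, center, k1):
    """Invert the forward model x_d = x_u * (1 + k1 * r_u^2) by
    fixed-point iteration (converges for the small k1 of phone lenses)."""
    xd, yd = pt[0] - center[0], pt[1] - center[1]
    xu, yu = xd, yd                        # start from the distorted point
    for _ in range(20):
        r2 = xu * xu + yu * yu
        xu, yu = xd / (1 + k1 * r2), yd / (1 + k1 * r2)
    return (xu + center[0], yu + center[1])

# Distort a known corner forward, then recover it
cx, cy, k1 = 320.0, 240.0, 1e-7
xu, yu = 80.0, 60.0                        # true offset from the center
f = 1 + k1 * (xu * xu + yu * yu)
corner = (cx + xu * f, cy + yu * f)        # what the camera would deliver
print(undistort_point(corner, (cx, cy), k1))   # ≈ (400.0, 300.0)
```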

Marker tracking – Pose estimation
 Calculates marker position and rotation
relative to the camera
 Initial estimation directly from homography
 Very fast, but coarse
 Jitters a lot…
 Refinement via Gauss-Newton iteration
 6 parameters (3 for position, 3 for rotation) to refine
 At each iteration we optimize on the reprojection error

Marker tracking – Reprojection error

[figure: reprojected vs. detected marker corners]

Marker tracking – Pose estimation
 Refinement via Gauss-Newton iteration
 6 parameters (3 for position, 3 for rotation) to refine
 At each iteration
- Calculate reprojection error ε0
- Calculate the Jacobian matrix J (matrix of all first-order partial derivatives)
- Solve the equation JᵀJ d = −Jᵀε0 for d (e.g. using Cholesky factorization)
- Add d to pose
- Quit if accurate enough or if the max. number of steps is reached
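The iteration above can be sketched on a toy problem. For brevity this refines a 2D rigid alignment (3 parameters instead of the full 6-DOF pose, with a numerical Jacobian), but the update solved each step is the same JᵀJ d = −Jᵀε0; all point values are made up:

```python
import numpy as np

def residuals(pose, model, observed):
    """Reprojection error: transform model points with the current pose
    and subtract the observed points."""
    tx, ty, theta = pose
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    proj = model @ R.T + np.array([tx, ty])
    return (proj - observed).ravel()

def gauss_newton(model, observed, pose, steps=10):
    for _ in range(steps):
        eps0 = residuals(pose, model, observed)          # ε0
        J = np.empty((eps0.size, pose.size))             # numerical Jacobian
        for k in range(pose.size):
            d = np.zeros(pose.size)
            d[k] = 1e-6
            J[:, k] = (residuals(pose + d, model, observed) - eps0) / 1e-6
        step = np.linalg.solve(J.T @ J, -J.T @ eps0)     # JᵀJ d = −Jᵀε0
        pose = pose + step
        if np.linalg.norm(step) < 1e-10:                 # accurate enough
            break
    return pose

# Four "marker corners" observed under a known pose (tx, ty, theta)
model = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
observed = residuals(np.array([0.5, -0.2, 0.3]), model,
                     np.zeros((4, 2))).reshape(4, 2)
print(gauss_newton(model, observed, np.zeros(3)))   # ≈ [ 0.5 -0.2  0.3]
```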

Tracking for Handheld AR
Tracking challenges in ARToolKit
 Jittering
 Occlusion
 Unfocused camera, motion blur
 Dark/unevenly lit scene, vignetting
 Image noise (e.g. poor lens, block coding/compression, neon tube)
 False positives and inter-marker confusion
(images by M. Fiala; Photoshop illustration)
Later improvements 1.
ARTag by Mark Fiala, NRC Canada
 Edge-based border detection w/ image gradients
 No thresholding (uneven illumination OK)
 Immune to partial occlusions
 Binary encoded marker patterns (11 bits)
 Large marker set w/o speed penalty (2048 − 46 illegal = 2002 IDs)
 CRC & FEC
 Binary available for Win, Linux, Mac OSX
 Issues
 More complex (slower)
 Prone to unsharp images
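Checking a binary-encoded marker ID with a CRC, as ARTag does, can be illustrated like this. The polynomial and bit layout below are made up for the example, not ARTag's actual scheme:

```python
# CRC-4 with generator x^4 + x + 1 over an 11-bit payload (toy choice)
POLY = (1, 0, 0, 1, 1)

def crc4(bits):
    reg = list(bits) + [0, 0, 0, 0]        # append 4 zero check bits
    for i in range(len(bits)):             # polynomial long division (XOR)
        if reg[i]:
            for j, p in enumerate(POLY):
                reg[i + j] ^= p
    return reg[-4:]

def encode_id(marker_id):
    bits = [(marker_id >> k) & 1 for k in range(10, -1, -1)]   # 11-bit ID
    return bits + crc4(bits)

def check_pattern(bits):
    """Accept a sampled 15-bit pattern only if its CRC matches."""
    return crc4(bits[:11]) == bits[11:]

pattern = encode_id(1234)
corrupted = pattern[:]
corrupted[3] ^= 1                          # a single misread module
print(check_pattern(pattern), check_pattern(corrupted))   # -> True False
```

Any CRC of this form detects all single-bit errors, which is what turns marker misreads into rejections instead of false positives.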
Later improvements 2.
 Visual Codes by Michael Rohs, ETHZ
 “Barcode-like” large marker set + error
correction
 Embed meta-info for enhanced interaction
 Symbian and WinCE implementation
 StbTracker by Daniel Wagner, TU Graz
 Successor of ARToolKitPlus
 Improved performance
 Several marker types
 Win2k/XP, WinCE, Symbian
 …
Studierstube Tracker (StbTracker)
 Daniel Wagner, Graz University of Technology and
Istvan Barakonyi, Imagination GmbH, Austria
 Main features
 Various new marker types
 Performance optimizations for low-end devices
 Numerous configurable and extendable features
 No “out-of-the-box” solution – targeted at experienced users
 Great frame rates on off-the-shelf mobile devices!
 Closed source – Qualcomm licensed
New marker types
 ID-based
 Simple or BCH binary encoding
 4096 markers, <= 4 errors
 DataMatrix
 ISO matrix code, ECC <= 60%
 Virtually unlimited marker set
 Frame
 Data encoded in marker border
 Arbitrary visual info in marker
 Split
 Data encoded in 2 disjoint marker parts
 Arbitrary visual info in marker
 Grid
 Track regular textured planar objects (e.g. maps)
 Requires adding dots
 Template
 ARToolKit-like template matching

Limitations of fiducial markers
 Max. amount of embeddable information
 ~ few 10s of bits
 E.g. w/ error corr. ARTK+ 12 bits, ARTag 11 bits, Visual Codes 76 bits
 Not enough for storing URIs
 Limited error correction capabilities
 Real life: dirt, suboptimal illumination, tears, poor printing ...
 Insufficient for massively multi-object / multi-user handheld AR scenarios
Check out barcode technology!
Barcode technology 1
 Linear barcodes
 E.g. UPC/EAN, Code 39, etc.
 Data: few bits (<30)
UPC
 Stacked barcodes
 1D codes stacked on top of each other
 E.g. PDF 417, CodaBlock
 Data: 100s of bytes
PDF 417
 Common disadvantages
 No pose detection support
 Resolution requirement too high for handhelds (e.g. w/ non-orthogonal camera & read from the side)

Barcode technology 2
 Matrix codes
 “True” 2D barcodes
 E.g. QR Code, DataMatrix, MaxiCode, Aztec Code
 Data: 1000s of bytes + cascading possible
 Advanced error correction (e.g. up to 60% w/ RS code in DataMatrix)
 More compact than stacked codes
 Basic barcode localization support BUT no pose estimation!
[figures: QR Code, DataMatrix, MaxiCode, Aztec Code]

Barcode size comparison
[figures: 50 characters encoded in Code 39, PDF 417, and DataMatrix ECC 200]
Matrix code size & density depends on the amount of embedded information & the error correction level

Barcode reading on handhelds – QR Code
 Denso Wave Inc. (announced in 1994), automotive industry
 ISO & Japanese standard
 Supported by > 30 million cell phones in Japan
 Encode URLs, vCards, phone numbers, SMS, etc.

Barcode reading on handhelds – DataMatrix
 International Data Matrix (finalized in 1995)
 ISO standard
 More popular in the USA & Europe
DataMatrix vs. QR Code
 Similarities
 Square, dark & light modules = bits
 Size depends on data amount & error correction level
 Advanced error checking & correction w/ Reed-Solomon codes
 Public domain
 ISO standard
 Differences
 QR Code can encode ~2x more info
 DataMatrix uses 30-60% less space (Micro QR Code not tested)
 Higher error correction levels for DataMatrix
 Lower contrast ratio requirements for DataMatrix
 Tested for single markers, based on http://semacode.org/about/technical/whitepaper/best_2d_code.pdf

Fiducial markers vs. barcodes
                        Fiducial markers   Barcodes
 pose tracking                 +               -
 information embedding         -               +
 working volume                +               -
 data density                  -               +

Taking the best from both sides
 Simultaneous barcode recognition & pose tracking

Vision: The Internet of Augmented Things
 Embedding 3D models into physical objects
 Downloading previously unknown content
[figures: steps 1–3]

Natural feature tracking
 Tracking from features of the surrounding
environment
 Corners, edges, blobs, ...
 Generally more difficult than marker tracking
 Markers are designed for their purpose
 The natural environment is not…
 Less well-established methods
 Every year new ideas are proposed
 Usually much slower than marker tracking

Tracking by detection
 This is what most trackers do…
 Targets are detected every frame
 Popular because tracking and detection are solved simultaneously

[pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose]

Natural feature tracking – What is a keypoint?
 It depends on the detector you use!
 For high performance use the FAST corner detector
E. Rosten and T. Drummond (May 2006). "Machine learning for high-speed corner detection".
 Apply FAST to all pixels of your image
 Obtain a set of keypoints for your image
- Reduce the amount of corners using non-maximum suppression
 Describe the keypoints
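The FAST segment test above can be sketched as follows. This is a simplified FAST-9-style check (no machine-learned decision tree, no non-maximum suppression), and the test image is a made-up example:

```python
# Offsets of the 16-pixel Bresenham circle of radius 3 around a candidate
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, arc=9):
    """A pixel is a corner if some contiguous arc of `arc` circle pixels
    is all brighter or all darker than the center by more than t."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    doubled = ring + ring                   # handle wrap-around arcs
    for sign in (1, -1):                    # brighter arc, then darker arc
        run = 0
        for px in doubled:
            run = run + 1 if sign * (px - center) > t else 0
            if run >= arc:
                return True
    return False

def fast_keypoints(img, t=20):
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t)]

# Bright square on a dark background: the square's corner (and close
# neighbours, since there is no suppression here) respond; flat regions
# and straight edges do not.
img = [[255 if x >= 6 and y >= 6 else 0 for x in range(12)] for y in range(12)]
kps = fast_keypoints(img)
```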
E. Rosten and T. Drummond (May 2006). "Machine learning for high-speed corner detection".

Natural feature tracking – Descriptors
 Again depends on your choice of a descriptor!
 Can use SIFT
 Estimate the dominant keypoint
orientation using gradients
 Compensate for detected
orientation
 Describe the keypoints in terms of the gradients surrounding them
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June 2010.

NFT – Database creation
 Offline step
 Searching for corners in a static image
 For robustness look at corners on multiple
scales
 Some corners are more descriptive at larger or smaller
scales
 We don’t know how far users will be from our image
 Build a database file with all descriptors and their position on the original image

Natural feature tracking – Real-time tracking
 Search for keypoints in the video image
 Create the descriptors
 Match the descriptors from the live video against those in the database
 Brute force is not an option
 Need the speed-up of special data structures
- E.g., we use multiple spill trees

[pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose]

NFT – Outlier removal
 Cascade of removal techniques
 Start with the cheapest, finish with the most expensive…
 First simple geometric tests
- E.g., line tests
• Select 2 points to form a line
• Check all other points being on correct side of line
 Then, homography-based tests
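The geometric line test above can be sketched directly: two matches define a line in both images, and a consistent match must fall on the same side of that line in the model image and the camera image. The point values are made-up examples:

```python
def side(p, a, b):
    """Sign of point p relative to the line a->b (2D cross product)."""
    s = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (s > 0) - (s < 0)

def line_test(matches):
    """matches: list of (model_point, image_point) pairs; the first two
    define the test line. Keep matches whose side agrees in both images."""
    (m0, i0), (m1, i1) = matches[0], matches[1]
    keep = [matches[0], matches[1]]
    for m, i in matches[2:]:
        if side(m, m0, m1) == side(i, i0, i1):
            keep.append((m, i))
    return keep

matches = [((0, 0), (1, 1)), ((4, 0), (5, 1)),
           ((1, 3), (2, 4)),      # consistent: same side in both images
           ((2, -2), (2, 4))]     # outlier: sides disagree
print(line_test(matches))         # keeps the first three, drops the outlier
```

Because it needs only sign comparisons, this test is far cheaper than fitting a homography, which is why it sits first in the cascade.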

NFT – Pose refinement
 Pose from homography makes a good starting point
 Based on Gauss-Newton iteration
 Try to minimize the reprojection error of the keypoints
 Part of the tracking pipeline that benefits most from floating-point usage
 Can still be implemented effectively in fixed point
 Typically 2–4 iterations are enough…

NFT – Real-time tracking
 Search for keypoints in the video image
 Create the descriptors
 Match the descriptors from the live video against those in the database
 Remove the keypoints that are outliers
 Use the remaining keypoints to calculate the pose of the camera

[pipeline: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier removal → Pose estimation and refinement → Pose]

NFT – Results
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June 2010.

Marker vs. natural feature tracking
 Marker tracking
 Usually requires no database to be stored
 Markers can be an eye-catcher
 Tracking is less demanding
 The environment must be instrumented with markers
 Markers usually work only when fully in view
 Natural feature tracking
 A database of keypoints must be stored/downloaded
 Natural feature targets might catch the attention less
 Natural feature targets are potentially everywhere
 Natural feature targets work also if partially in view

NFT in unknown environments
 We just saw an example of how to track from databases of keypoints created offline
 Idea: we can create the database on the fly, on our device!

Panorama as map
 Pure camera rotation
 Cylindrical map
 We consider the surface of the cylinder as a planar tracking target
 Filled in by consecutive frames

Panorama as map
 Map as tracking target
 2D Interest points
 Tracked in the input image

Mapping/Tracking Loop

Video
Wagner, D., Mulloni, A., Langlotz, T., and Schmalstieg, D.,
Real-time panoramic mapping and tracking on mobile phones,
IEEE Virtual Reality Conference 2010.

NFT in unknown environments
 We can also build a 3D database of keypoints
Georg Klein and David Murray
Parallel Tracking and Mapping on a Camera Phone
In Proc. International Symposium on Mixed and
Augmented Reality (ISMAR'09)
http://www.robots.ox.ac.uk/~gk/

Informed vs. uninformed tracking
 Informed tracking
 Requires knowing the environment
 Requires storing a large database of information
 Users can point the phone at anything described in the database
 Allows for adding semantic information to the database
- E.g., where is the ground plane?
 Uninformed tracking
 Works also for unknown environments
 Requires creating a database of keypoints on the fly
- Prone to drift
- Prone to corruption of the database
 User must move smoothly to build the database incrementally
 Requires “tricks” to understand where the ground is

Other vision tracking
 Other types of natural features can be used, e.g. blobs
Nate Hagbi, Oriel Bergig, Jihad El-Sana, Mark Billinghurst, Shape recognition and pose estimation for
mobile augmented reality, IEEE International Symposium on Mixed and Augmented Reality, 2009.

Deployment issues: uncalibrated camera
 Modern mobile phones compensate for most radial distortion
 We still need to know
 Focal length
 Principal point
 Two options
 Let the user calibrate the camera (not really an option)
 Create a calibration database for all supported devices

A look at the future
 Natural feature tracking is the future
 Model-based
- More robust
- Requires modeling the environment
 Without models (SLAM)
- Works anywhere without previous knowledge
- Requires online creation of a model
 Large-scale tracking
 Not even working really well on the PC…

Sensor tracking
 Used by many “AR browsers”
 GPS, Compass, Accelerometer, (Gyroscope)
 Not sufficient alone

Why do we need sensors?
 Combining sensors and vision
 Sensors
- Produce noisy output (= jittering augmentations)
- Are not sufficiently accurate (= wrongly placed augmentations)
- Give us first information on where we are in the world, and what we are looking at
 Vision
- Is more accurate (= stable and correct augmentations)
- Requires choosing the correct keypoint database to track from
- Requires registering our local coordinate frame (online-generated model) to the global one (world)

Resources

Platform – Recommended reading
 Lots of low level information
on the complete ARM family
 Valuable tool for driver and
framework developers
 Not that important for pure application developers

Platform – Recommended reading
 Very low level and targeted for PCs
 Most information outdated on PC
 Effective memory usage is one of the most important optimization strategies on mobile devices!

Tracking – Recommended reading
 Lots of the basics on the Computer Vision you will need for AR tracking
 Several code and pseudo-code snippets

Tracking – Recommended reading
 All about the geometry you will need for a tracking system
 Camera models
 Projection
 Epipolar geometry
 Homographies
 …

Graphics – Recommended reading
 Mobile 3D Graphics
(all about OpenGL ES 1.x)
 OpenGL ES 2.0
Programming Guide
 OpenGL ES 2.0 Man Pages
http://www.khronos.org/opengles/sdk/docs/man/
 ShaderX7
 Chapter on “Augmented Reality on Mobile Phones”

OpenGL ES Resources
 Khronos Group OpenGL ES Page
 http://www.khronos.org/opengles/
 OpenGL ES 2.0 Book
 http://www.opengles-book.com/
 AMD's OpenGL ES 2.0 Emulator