Computer Vision Tracking

Starting simple: Marker tracking
Has been done for more than 10 years
Some phones today are faster than computers at that time
Several open source solutions exist
Fairly simple to implement
Standard computer vision methods
A rectangular marker provides 4 corner points
Enough for pose estimation!

Marker Tracking Pipeline
Goal: Do all this in less than 20 milliseconds on a mobile phone…

Marker Tracking – Overview

Marker Tracking – Fiducial Detection
Threshold the whole image to black and white
Search scanline by scanline for edges (white to black)
Follow edge until either
Back to starting pixel
Image border
Check for size
Reject fiducials early that are too small (or too large)

Marker Tracking – Rectangle Fitting
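Rectangle fitting starts from the contour traced during fiducial detection. As a reference for that detection step, here is a minimal sketch. It assumes a grayscale image stored as a list of rows, a fixed global threshold of 128, and Moore-neighbour tracing as one way to "follow the edge" with a simple stop-at-start criterion. The names `detect_fiducials`, `trace_contour`, `min_len` and `max_len` are hypothetical and are not taken from ARToolKit or StbTracker.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]

# 8-neighbourhood in clockwise order (image coordinates, y points down),
# starting with the western neighbour.
DIRS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]


def threshold(gray: List[List[int]], t: int = 128) -> List[List[bool]]:
    """True marks a 'black' pixel (gray value below the threshold t)."""
    return [[v < t for v in row] for row in gray]


def trace_contour(black: List[List[bool]], start: Point, back: Point,
                  max_len: int) -> Optional[List[Point]]:
    """Follow the edge clockwise until we are back at the starting pixel.
    Returns None if the edge reaches the image border or gets too long."""
    h, w = len(black), len(black[0])
    contour = [start]
    cur, prev = start, back  # prev is always a white neighbour of cur
    while True:
        d = DIRS.index((prev[0] - cur[0], prev[1] - cur[1]))
        nxt = None
        for k in range(1, 9):
            dx, dy = DIRS[(d + k) % 8]
            x, y = cur[0] + dx, cur[1] + dy
            if not (0 <= x < w and 0 <= y < h):
                return None  # reached the image border: drop this candidate
            if black[y][x]:
                nxt = (x, y)
                # the last white neighbour examined becomes the new backtrack pixel
                bx, by = DIRS[(d + k - 1) % 8]
                prev = (cur[0] + bx, cur[1] + by)
                break
        if nxt is None or nxt == start:
            return contour  # isolated pixel, or back at the starting pixel
        if len(contour) >= max_len:
            return None  # too large: reject early
        cur = nxt
        contour.append(cur)


def detect_fiducials(gray: List[List[int]], t: int = 128, min_len: int = 8,
                     max_len: int = 4000) -> List[List[Point]]:
    black = threshold(gray, t)
    visited = set()
    candidates = []
    for y, row in enumerate(black):
        for x in range(1, len(row)):
            # white-to-black transition on this scanline, not on a known contour
            if row[x] and not row[x - 1] and (x, y) not in visited:
                contour = trace_contour(black, (x, y), (x - 1, y), max_len)
                if contour is None:
                    continue
                visited.update(contour)
                if len(contour) >= min_len:  # size check: reject tiny blobs
                    candidates.append(contour)
    return candidates
```

Each returned contour is a list of boundary pixels, which is the input the corner search on this slide operates on.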
Start with an arbitrary point “x” on the contour
The point with maximum distance from x must be a corner c0
Create a diagonal through the center
Find points c1 & c2 with maximum distance left and right of the diagonal
New diagonal from c1 to c2
Find point c3 right of the diagonal with maximum distance

Marker Tracking – Pattern checking
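The homography on this slide needs the four corners c0 to c3 from rectangle fitting. A minimal sketch of that corner search, assuming the contour is a list of (x, y) points, the "center" is the centroid of those points, and "right of the diagonal" means the side opposite c0 (both are interpretations, not confirmed by the slides; `fit_rectangle` and `side` are hypothetical names):

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Signed distance of p from the line through a and b (sign = left/right)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return cross / math.hypot(b[0] - a[0], b[1] - a[1])


def fit_rectangle(contour: Sequence[Point]) -> List[Point]:
    x = contour[0]  # arbitrary start point on the contour
    # The contour point farthest from x must be a corner, c0.
    c0 = max(contour, key=lambda p: math.hypot(p[0] - x[0], p[1] - x[1]))
    # Diagonal from c0 through the center (here: centroid of the contour points).
    n = len(contour)
    center = (sum(p[0] for p in contour) / n, sum(p[1] for p in contour) / n)
    # c1 / c2: farthest points on either side of that diagonal.
    c1 = max(contour, key=lambda p: side(c0, center, p))
    c2 = min(contour, key=lambda p: side(c0, center, p))
    # New diagonal c1 -> c2; c3 is the farthest point on the side opposite c0.
    sign = -1.0 if side(c1, c2, c0) > 0 else 1.0
    c3 = max(contour, key=lambda p: sign * side(c1, c2, p))
    return [c0, c1, c3, c2]  # corners in order around the quadrilateral
```

The corners come back in cyclic order, which is what a corner-to-corner homography needs. A degenerate contour whose centroid coincides with c0 would divide by zero; a real tracker would reject such candidates earlier.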
Calculate homography using the 4 corner points
“Direct Linear Transform” algorithm
Maps normalized coordinates to marker coordinates
(simple perspective projection, no camera model)
Extract pattern by sampling
Check pattern
Id (implicit encoding)
Template (normalized cross correlation)

Marker Tracking – Corner refinement
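In the order of these slides, corner refinement follows the pattern check. A sketch of that check, with two simplifications stated up front: the homography uses the 8-unknown linear form of DLT that fixes h33 = 1 (enough for exactly four corners; the general DLT solves a homogeneous system with an SVD), and H maps the unit square (normalized marker coordinates) to image pixels. `homography_dlt`, `sample_pattern` and `ncc` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_dlt(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """H with dst ~ H * src from exactly 4 correspondences (h33 fixed to 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(h: Matrix, x: float, y: float) -> Point:
    """Perspective mapping of (x, y) by H (divide by the third coordinate)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def sample_pattern(gray: List[List[int]], h: Matrix, n: int) -> List[int]:
    """Extract an n x n pattern by sampling the image at mapped cell centers."""
    out = []
    for j in range(n):
        for i in range(n):
            u, v = apply_h(h, (i + 0.5) / n, (j + 0.5) / n)
            out.append(gray[int(round(v))][int(round(u))])
    return out


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross correlation of two equally sized samples, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    den = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / den if den else 0.0
```

A template check would compare `ncc(sample_pattern(...), template)` against some acceptance threshold for each of the four possible marker rotations; the threshold is application-specific.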
Refine corner coordinates
Critical for high quality tracking
Remember: 4 points is the bare minimum!
So these 4 points had better be accurate…
Detect sub-pixel coordinates
E.g. Harris corner detector
- Specialized methods can be faster and more accurate
Strongly reduces jitter!
Undistort corner coordinates
Remove radial distortion from lens

Marker tracking – Pose estimation
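Pose estimation works on the refined, undistorted corners from the corner-refinement step. A minimal sketch of removing radial distortion, assuming a two-coefficient polynomial radial model (k1, k2) on normalized coordinates; the model, the parameter names and the fixed iteration count are assumptions, not taken from a particular tracker:

```python
from typing import Tuple


def distort(xu: float, yu: float, k1: float, k2: float) -> Tuple[float, float]:
    """Forward radial model on normalized coordinates: x_d = x_u (1 + k1 r^2 + k2 r^4)."""
    r2 = xu * xu + yu * yu
    f = 1.0 + k1 * r2 + k2 * r2 * r2
    return xu * f, yu * f


def undistort_pixel(u, v, fx, fy, cx, cy, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration (no simple closed form)."""
    xd, yd = (u - cx) / fx, (v - cy) / fy  # pixel -> normalized coordinates
    xu, yu = xd, yd                        # initial guess: no distortion
    for _ in range(iterations):
        r2 = xu * xu + yu * yu
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        xu, yu = xd / f, yd / f
    return xu * fx + cx, yu * fy + cy      # normalized -> pixel coordinates
```

The iteration converges quickly for moderate distortion; for strongly distorting lenses a Newton step or a precomputed lookup table would be the usual alternatives.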
Calculates marker position and rotation relative to the camera
Initial estimation directly from homography
Very fast, but coarse
Jitters a lot…
Refinement via Gauss-Newton iteration
6 parameters (3 for position, 3 for rotation) to refine
At each iteration we minimize the reprojection error

Marker tracking – Reprojection error
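The reprojection error is the vector ε0 that the Gauss-Newton refinement minimizes: for each marker corner, the difference between where the current pose projects it and where it was detected. The sketch computes it and runs the iteration listed on the pose-estimation slides (Jacobian, J^T J d = -J^T ε0 solved by Cholesky, add d, stop). Assumptions: a pinhole camera (fx, fy, cx, cy), rotation parameterized as an axis-angle vector, and a Jacobian approximated by forward differences instead of derived analytically; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Camera = Tuple[float, float, float, float]  # fx, fy, cx, cy


def rodrigues(r: Sequence[float]) -> List[List[float]]:
    """Rotation matrix from an axis-angle vector (the 3 rotation parameters)."""
    theta = math.sqrt(sum(c * c for c in r))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in r)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [[c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
            [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
            [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v]]


def project(pose: Sequence[float], X: Vec3, cam: Camera) -> Tuple[float, float]:
    """Pinhole projection of marker point X under pose = (rx, ry, rz, tx, ty, tz)."""
    fx, fy, cx, cy = cam
    R = rodrigues(pose[:3])
    xc = [sum(R[i][j] * X[j] for j in range(3)) + pose[3 + i] for i in range(3)]
    return fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy


def reprojection_error(pose, model, observed, cam) -> List[float]:
    """Stacked residuals (projected - detected) for all corners."""
    eps = []
    for X, (u, v) in zip(model, observed):
        pu, pv = project(pose, X, cam)
        eps += [pu - u, pv - v]
    return eps


def cholesky_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b for symmetric positive-definite a, using a = L L^T."""
    n = len(b)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def refine_pose(pose, model, observed, cam, max_steps=10, tol=1e-12, h=1e-6):
    """Gauss-Newton on the 6 pose parameters (3 rotation, 3 translation)."""
    pose = list(pose)
    for _ in range(max_steps):
        eps0 = reprojection_error(pose, model, observed, cam)
        cols = []  # Jacobian columns, approximated by forward differences
        for k in range(6):
            shifted = pose[:]
            shifted[k] += h
            eps_k = reprojection_error(shifted, model, observed, cam)
            cols.append([(ek - e0) / h for ek, e0 in zip(eps_k, eps0)])
        jtj = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(6)]
               for i in range(6)]
        jte = [sum(a * e for a, e in zip(cols[i], eps0)) for i in range(6)]
        d = cholesky_solve(jtj, [-g for g in jte])  # J^T J d = -J^T eps0
        pose = [p + di for p, di in zip(pose, d)]   # add d to the pose
        if sum(di * di for di in d) < tol:          # accurate enough: stop
            break
    return pose
```

Starting from the coarse pose obtained from the homography, a few iterations usually suffice; a start far from the true pose can converge to the wrong one of the two nearly equivalent planar-pose solutions.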

Marker tracking – Pose estimation
Calculates marker position and rotation relative to the camera
Initial estimation directly from homography
Very fast, but coarse
Jitters a lot…
Refinement via Gauss-Newton iteration
6 parameters (3 for position, 3 for rotation) to refine
At each iteration
- Calculate reprojection error ε0
- Calculate the Jacobian matrix J (matrix of all first-order partial derivatives)
- Solve the equation J^T J d = -J^T ε0 for d (e.g. using Cholesky factorization)
- Add d to pose
- Quit if accurate enough or if the maximum number of steps is reached

Tracking for Handheld AR
Tracking challenges in ARToolKit
Jittering
Occlusion
Unfocused camera, motion blur
Dark/unevenly lit scene, vignetting
Image noise (e.g. poor lens, block coding / compression, neon tube)
False positives and inter-marker confusion
(Images: M. Fiala; Photoshop illustration)

Later improvements 1.
ARTag by Mark Fiala, NRC Canada
Edge-based border detection w/ image gradients
No thresholding (uneven illumination OK)
Immune to partial occlusions
Binary encoded marker patterns (11 bits)
Large marker set w/o speed penalty (2048 - 46 illegal = 2002 IDs)
CRC & FEC
Binary available for Win, Linux, Mac OSX
Issues
More complex (slower)
Sensitive to blurry images

Later improvements 2.
Visual Codes by Michael Rohs, ETHZ
“Barcode-like” large marker set + error correction
Embed meta-info for enhanced interaction
Symbian and WinCE implementation
StbTracker by Daniel Wagner, TU Graz
Successor of ARToolKitPlus
Improved performance
Several marker types
Win2k/XP, WinCE, Symbian
…

Studierstube Tracker (StbTracker)
Daniel Wagner, Graz University of Technology and
Istvan Barakonyi, Imagination GmbH, Austria
Main features
Various new marker types
Performance optimizations for low-end devices
Numerous configurable and extendable features
No “out-of-the-box” solution; targeted at experienced users
Great frame rates on off-the-shelf mobile devices!
Closed source – Qualcomm licensed

New marker types
ID-based
Simple or BCH binary encoding
4096 markers, <= 4 errors
DataMatrix
ISO matrix code, ECC <= 60%
Virtually unlimited marker set
Frame
Data encoded in marker border
Arbitrary visual info in marker
Split
Data encoded in 2 disjoint marker parts
Arbitrary visual info in marker
Grid
Track regular textured planar objects (e.g. maps)
Requires adding dots
Template
ARToolKit-like template matching

Limitations of fiducial markers
Max. amount of embeddable information
~ a few tens of bits
E.g. w/ error corr. ARTK+ 12 bits, ARTag 11 bits, Visual Codes 76 bits
Not enough for storing URIs
Limited error correction capabilities
Real life: dirt, suboptimal illumination, tears, poor printing ...
Insufficient for massively multi-object / multi-user handheld AR scenarios
Check out barcode technology!

Barcode technology 1
Linear barcodes
E.g. UPC/EAN, Code 39, etc.
Data: few bits (<30)
(Example image: UPC)
Stacked barcodes
1D codes stacked on top of each other
E.g. PDF 417, CodaBlock
Data: 100s of bytes
(Example image: PDF 417)
Common disadvantages
No pose detection support
Resolution requirement too high for handhelds (e.g. with a non-orthogonal camera, reading from the side)

Barcode technology 2
Matrix codes
“True” 2D barcodes
E.g. QR Code, DataMatrix, MaxiCode, Aztec Code
Data: 1000s of bytes + cascading possible
Advanced error correction (e.g. up to 60% w/ RS code in DataMatrix)
More compact than stacked codes
Basic barcode localization support BUT no pose estimation!
(Example images: QR Code, DataMatrix, MaxiCode, Aztec Code)

Barcode size comparison
(Figure: the same 50 characters encoded as Code 39, PDF 417 and DataMatrix ECC 200)
Matrix code size & density depend on the amount of embedded information & the error correction level

Barcode reading on handhelds – QR Code
Denso Wave Inc. (announced in 1994), automotive industry
ISO & Japanese standard
Supported by > 30 million cell phones in Japan
Encode URLs, vCards, phone numbers, SMS, etc.

Barcode reading on handhelds – DataMatrix
International Data Matrix (finalized in 1995)
ISO standard
More popular in USA & Europe

DataMatrix vs. QR Code
Similarities
Square, dark & light modules = bits
Size depends on data amount & error correction level
Advanced error checking & correction w/ Reed-Solomon codes
Public domain
ISO standard
Differences
QR Code can encode ~2x more info
DataMatrix uses 30-60% less space (Micro QR Code not tested)
Higher error correction levels for DataMatrix
Lower contrast ratio requirements for DataMatrix
Tested for single markers, based on http://semacode.org/about/technical/whitepaper/best_2d_code.pdf

Fiducial markers vs. barcodes
                         Fiducial markers   Barcodes
pose tracking                   +              -
information embedding           -              +
working volume                  +              -
data density                    -              +

Taking the best from both sides
Simultaneous barcode recognition & pose tracking

Vision: The Internet of Augmented Things
Embedding 3D models into physical objects
Downloading previously unknown content

Natural feature tracking
Tracking from features of the surrounding environment
Corners, edges, blobs, ...
Generally more difficult than marker tracking
Markers are designed for their purpose
The natural environment is not…
Less well-established methods
Every year new ideas are proposed
Usually much slower than marker tracking

Tracking by detection
This is what most trackers do…
Targets are detected every frame
Popular because tracking and detection are solved simultaneously
(Diagram: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier Removal → Pose estimation and refinement → Pose)

Natural feature tracking – What is a keypoint?
It depends on the detector you use!
For high performance we use the FAST corner detector
Apply FAST to all pixels of your image
Obtain a set of keypoints for your image
- Reduce the number of corners using non-maximum suppression
Describe the keypoints
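A compact sketch of the FAST segment test followed by non-maximum suppression. It assumes a grayscale image as a list of rows, threshold t = 20 and arc length n = 9 (FAST-9); these values, the sum-of-absolute-differences score used for suppression (in the spirit of Rosten and Drummond's) and the function names are illustrative assumptions. It is written for clarity, not for the speed that makes FAST attractive.

```python
from typing import Dict, Tuple

# Bresenham circle of radius 3 (16 pixels), in order around the centre.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_corner(img, x, y, t, n=9):
    """n contiguous circle pixels all brighter than p + t or all darker than p - t."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: brighter arc, -1: darker arc
        flags = [sign * (v - p) > t for v in ring]
        run = 0
        for f in flags + flags:  # walk the ring twice to handle wrap-around
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False


def corner_score(img, x, y, t):
    """Score for non-maximum suppression: summed differences beyond t."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    bright = sum(v - p - t for v in ring if v - p > t)
    dark = sum(p - v - t for v in ring if p - v > t)
    return max(bright, dark)


def fast_keypoints(img, t=20, n=9):
    h, w = len(img), len(img[0])
    scores: Dict[Tuple[int, int], int] = {}
    for y in range(3, h - 3):  # keep the circle inside the image
        for x in range(3, w - 3):
            if is_corner(img, x, y, t, n):
                scores[(x, y)] = corner_score(img, x, y, t)
    # 3x3 non-maximum suppression: keep a corner if no neighbour scores higher.
    keep = [(x, y) for (x, y), s in scores.items()
            if all(scores.get((x + dx, y + dy), -1) <= s
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1))]
    return sorted(keep)
```

On a bright square, several pixels near its corner pass the segment test; suppression leaves the single strongest one.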
E. Rosten and T. Drummond (May 2006). "Machine learning for high-speed corner detection".

Natural feature tracking – Descriptors
Again, this depends on your choice of descriptor!
Can use SIFT
Estimate the dominant keypoint orientation using gradients
Compensate for the detected orientation
Describe the keypoints in terms of the gradients surrounding them
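The orientation steps above can be sketched as follows. The 36-bin, magnitude-weighted histogram mirrors SIFT's orientation assignment in simplified form (no Gaussian weighting, no peak interpolation). `describe` is a toy 8-bin histogram of orientation-compensated gradient angles, not the 128-dimensional SIFT descriptor; all names and parameters are assumptions.

```python
import math
from typing import List, Tuple


def gradient(img, x, y) -> Tuple[float, float]:
    """Central differences in x and y."""
    return img[y][x + 1] - img[y][x - 1], img[y + 1][x] - img[y - 1][x]


def dominant_orientation(img, x, y, radius=4, bins=36) -> float:
    """Peak of a magnitude-weighted histogram of gradient angles around (x, y)."""
    hist = [0.0] * bins
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            gx, gy = gradient(img, xx, yy)
            ang = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * bins) % bins] += math.hypot(gx, gy)
    best = max(range(bins), key=lambda i: hist[i])
    return (best + 0.5) * 2 * math.pi / bins  # centre of the winning bin


def describe(img, x, y, radius=4, bins=8) -> List[float]:
    """Toy descriptor: gradient angles measured relative to the dominant
    orientation, so that rotating the patch leaves it roughly unchanged."""
    theta = dominant_orientation(img, x, y, radius)
    hist = [0.0] * bins
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            gx, gy = gradient(img, xx, yy)
            rel = (math.atan2(gy, gx) - theta) % (2 * math.pi)  # compensate
            hist[int(rel / (2 * math.pi) * bins) % bins] += math.hypot(gx, gy)
    total = sum(hist) or 1.0
    return [v / total for v in hist]
```

A horizontal and a vertical intensity ramp differ only by a 90° rotation, so after compensation they produce the same descriptor.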
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June, 2010

NFT – Database creation
Offline step
Searching for corners in a static image
For robustness look at corners on multiple scales
Some corners are more descriptive at larger or smaller scales
We don’t know how far users will be from our image
Build a database file with all descriptors and their position on the original image

Natural feature tracking – Real-time tracking
(Diagram: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier Removal → Pose estimation and refinement → Pose)
Search for keypoints in the video image
Create the descriptors
Match the descriptors from the live video against those in the database
Brute force is not an option
Need the speed-up of special data structures
- E.g., we use multiple spill trees
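Spill trees are not shown here. As a simpler stand-in for such a data structure, here is a minimal exact k-d tree for nearest-neighbour descriptor matching (a spill tree differs in that points close to a split are stored on both sides, which allows a fast approximate search without backtracking). Descriptors are assumed to be equal-length float vectors; `build`, `nearest` and `match` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Node:
    point: Vector
    index: int            # position of the descriptor in the database
    axis: int             # coordinate used for the split at this node
    left: Optional["Node"]
    right: Optional["Node"]


def build(points: Sequence[Vector], indices: List[int], depth: int = 0) -> Optional[Node]:
    """Recursively split at the median along a cycling axis."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(points[indices[mid]], indices[mid], axis,
                build(points, indices[:mid], depth + 1),
                build(points, indices[mid + 1:], depth + 1))


def nearest(node: Optional[Node], q: Vector,
            best: Tuple[float, int] = (float("inf"), -1)) -> Tuple[float, int]:
    """Exact nearest neighbour of q: returns (squared distance, database index)."""
    if node is None:
        return best
    d = sum((a - b) ** 2 for a, b in zip(node.point, q))
    if d < best[0]:
        best = (d, node.index)
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    if diff * diff < best[0]:  # a closer point may still lie beyond the split
        best = nearest(far, q, best)
    return best


def match(db: Sequence[Vector], live: Sequence[Vector]) -> List[int]:
    """For every live descriptor, the index of its nearest database descriptor."""
    tree = build(db, list(range(len(db))))
    return [nearest(tree, q)[1] for q in live]
```

With high-dimensional descriptors an exact k-d tree ends up visiting many nodes, which is one motivation for approximate structures such as the spill trees mentioned on the slide.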

NFT – Outlier removal
Cascade of removal techniques
Start with the cheapest, finish with the most expensive…
First simple geometric tests
- E.g., line tests
• Select 2 points to form a line
• Check that all other points are on the correct side of the line
Then, homography-based tests

NFT – Pose refinement
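Pose refinement runs on the matches that survived the outlier cascade. A sketch of that cascade's cheapest stage, the line test, assuming each match is a pair (database point, camera point) and that the planar target is not seen mirrored, so a correctly matched point stays on the same side of a line in both images. The function names and the voting over several point pairs are assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (point in database image, point in camera image)


def side(a: Point, b: Point, p: Point) -> float:
    """> 0 if p lies left of the directed line a->b, < 0 if right, 0 if on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def line_test(matches: Sequence[Match], i: int, j: int) -> List[int]:
    """Form a line from matches i and j in both images; return the indices of
    matches that lie on opposite sides of it in the two images."""
    (a_db, a_cam), (b_db, b_cam) = matches[i], matches[j]
    return [k for k, (p_db, p_cam) in enumerate(matches)
            if k not in (i, j)
            and side(a_db, b_db, p_db) * side(a_cam, b_cam, p_cam) < 0]


def line_test_votes(matches: Sequence[Match],
                    pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Repeat the test for several point pairs; many votes suggest an outlier."""
    votes = [0] * len(matches)
    for i, j in pairs:
        for k in line_test(matches, i, j):
            votes[k] += 1
    return votes
```

Voting matters because a line built from an outlier flags innocent points; the more expensive homography-based tests then only see the matches with few votes.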
Pose from homography makes a good starting point
Based on Gauss-Newton iteration
Minimizes the reprojection error of the keypoints
Part of the tracking pipeline that benefits most from floating-point arithmetic
Can still be implemented effectively in fixed point
Typically 2-4 iterations are enough…

NFT – Real-time tracking
(Diagram: Camera Image → Keypoint detection → Descriptor creation and matching → Outlier Removal → Pose estimation and refinement → Pose)
Search for keypoints in the video image
Create the descriptors
Match the descriptors from the live video against those in the database
Remove the keypoints that are outliers
Use the remaining keypoints to calculate the pose of the camera

NFT – Results
Wagner D., Reitmayr G., Mulloni A., Drummond T., Schmalstieg D.,
Real-Time Detection and Tracking for Augmented Reality on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics, May/June, 2010

Marker vs. natural feature tracking
Marker tracking
Usually requires no database to be stored
Markers can be an eye-catcher
Tracking is less demanding
The environment must be instrumented with markers
Markers usually work only when fully in view
Natural feature tracking
A database of keypoints must be stored/downloaded
Natural feature targets may attract less attention
Natural feature targets are potentially everywhere
Natural feature targets also work when only partially in view

NFT in unknown environments
We just saw an example of how to track from databases of keypoints created offline
Idea: we can create the database on the fly on our device!

Panorama as map
Pure camera rotation
Cylindrical map
We consider the surface of the cylinder as a planar tracking target
Filled in by consecutive frames

Panorama as map
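Using the cylinder surface as a planar target means every camera pixel must be mapped to a position on the unrolled cylinder. A minimal sketch, assuming a pure rotation R from camera to map frame, a unit-radius cylinder around the map's vertical (y) axis, and hypothetical map dimensions and vertical `scale`; it is not the authors' implementation.

```python
import math
from typing import List, Tuple


def rot_y(a: float) -> List[List[float]]:
    """Camera-to-map rotation for a pure yaw of angle a (about the y axis)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def pixel_to_panorama(x, y, R, f, cx, cy, width, height, scale) -> Tuple[float, float]:
    # 1. Back-project the pixel to a viewing ray in camera coordinates.
    ray_cam = ((x - cx) / f, (y - cy) / f, 1.0)
    # 2. Rotate the ray into the map frame (pure rotation, no translation).
    dx, dy, dz = (sum(R[i][j] * ray_cam[j] for j in range(3)) for i in range(3))
    # 3. Intersect with a unit-radius cylinder around the vertical axis.
    theta = math.atan2(dx, dz)       # angle around the cylinder axis
    h = dy / math.hypot(dx, dz)      # height on the cylinder
    # 4. Unroll the cylinder into a width x height map image.
    u = (theta + math.pi) / (2 * math.pi) * width
    v = h * scale + height / 2
    return u, v
```

Because the camera only rotates, a change in yaw shifts the mapped position horizontally while the height stays fixed, which is what lets consecutive frames fill in the same map.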
Map as tracking target
2D interest points
Tracked in the input image

Mapping/Tracking Loop

Video
Wagner, D., Mulloni, A., Langlotz, T., and Schmalstieg, D.,
Real-time panoramic mapping and tracking on mobile phones,
IEEE Virtual Reality Conference 2010

NFT in unknown environments
We can also build a 3D database of keypoints
Georg Klein and David Murray
Parallel Tracking and Mapping on a Camera Phone
In Proc. International Symposium on Mixed and Augmented Reality (ISMAR'09)
http://www.robots.ox.ac.uk/~gk/

Informed vs. uninformed tracking
Informed tracking
Requires knowing the environment
Requires storing a large database of information
Users can point the phone at anything described in the database
Allows for adding semantic information to the database
- E.g., where is the ground plane?
Uninformed tracking
Works also for unknown environments
Requires creating a database of keypoints on the fly
- Prone to drift
- Prone to corruption of the database
User must move smoothly to build the database incrementally
Requires “tricks” to understand where the ground is

Other vision tracking
Other types of natural features can be used, e.g. blobs
Nate Hagbi, Oriel Bergig, Jihad El-Sana, Mark Billinghurst, Shape recognition and pose estimation for
mobile augmented reality, IEEE International Symposium on Mixed and Augmented Reality, 2009

Deployment issues: uncalibrated camera
Modern mobile phones compensate for most radial distortion
We still need to know
Focal length
Principal point
Two options
Let the user calibrate the camera (not really an option)
Create a calibration database for all supported devices

A look at the future
Natural feature tracking is the future
Model-based
- More robust
- Requires modeling the environment
Without models (SLAM)
- Works anywhere without previous knowledge
- Requires online creation of a model
Large-scale tracking
Does not yet work really well even on the PC…

Sensor tracking
Used by many “AR browsers”
GPS, Compass, Accelerometer, (Gyroscope)
Not sufficient alone

Why do we need sensors?
Combining sensors and vision
Sensors
- Produce noisy output (= jittering augmentations)
- Are not sufficiently accurate (= wrongly placed augmentations)
- Give us first information on where we are in the world, and what we are looking at
Vision
- Is more accurate (= stable and correct augmentations)
- Requires choosing the correct keypoint database to track from
- Requires registering our local coordinate frame (online-generated model) to the global one (world)

Resources

Platform – Recommended reading
Lots of low-level information on the complete ARM family
Valuable tool for driver and framework developers
Not that important for pure application developers

Platform – Recommended reading
Very low level and targeted at PCs
Most information is outdated on PC
Effective memory usage is one of the most important optimization strategies on mobile devices!

Tracking – Recommended reading
Lots of the basics on the computer vision you will need for AR tracking
Several code and pseudo-code snippets

Tracking – Recommended reading
All about the geometry you will need for a tracking system
Camera models
Projection
Epipolar geometry
Homographies
…

Graphics – Recommended reading
Mobile 3D Graphics (all about OpenGL ES 1.x)
OpenGL ES 2.0 Programming Guide
OpenGL ES 2.0 Man Pages
http://www.khronos.org/opengles/sdk/docs/man/
ShaderX7
Chapter on “Augmented Reality on Mobile Phones”

OpenGL ES Resources
Khronos Group OpenGL ES Page
http://www.khronos.org/opengles/
OpenGL ES 2.0 Book
http://www.opengles-book.com/
AMD's OpenGL ES 2.0 Emulator