
International AR Standards Meeting, June 15-16, 2011, Page 1 of 7
A Modern Web-Based Mobile AR Architecture for Research and Development Based on Standards

Position paper for the Third International AR Standards Meeting
June 15-16, 2011, Taichung

Timo Engelke¹, Jens Keil
Fraunhofer Institute for Computer Graphics Research (IGD), Germany
Abstract

A multitude of AR-capable mobile devices has evolved in recent years. While AR offers huge potential for providing contextual information, the available frameworks and standards are still at an early stage of development.
 
 
Besides  POI  annotations  for  location
-­‐
based  services
 
(like  used  in  
[7]
[8]
)
,  AR  emphasizes  its  full  
potential  in  new  interaction  paradigms  and  providing  information  for  the  right  situation  
in
 
the  
right  
way  and  
place
 
and  context
.  Ideally  mobile  AR  systems  can  not  onl
y  cope  with  informational  
retrieval  but  also  create  useful  information  helping  implicit
ly  in  content  recognition  
or  explicitly  
in  data  presentation
 
for  other  AR  users.
 
 
In  this  work  we  present  a  mobile  AR  framework  based  on  currently  used  standards  for  crea
tion  
and  experimenting  with  new  parad
igms  of  AR.  We  try  to  overcome  
limits  in  device  independent  
development  and  
common  problems  
of  distribution
/generation
 
of  content  
and  abstractions  of  
interfaces.  We  use  modern  web  based  approaches  for  interfacing  AR  fun
ctionality  in  ordinary  
web  based  application  development.
 
Introduction

We present a framework for experimenting with different cases of AR and its application. Areas touched by our implementation are cultural heritage, industrial maintenance and print media.

We will show some technical aspects, describe our implementation for mobile devices and how we deal with standardized formats like X3D, HTML, JavaScript and visual tracking, as well as concepts for distribution using couchDB [3] as a modern approach to web-based serving of information and automatic application generation.
 
 
We present a generic, common way of describing AR content and the paradigms available in AR. At the end we show some applications realized using our framework, which range from experimental to commercial use.
 
The mobileAR Browser Concept

The most common way of creating interaction and applications on mobile devices can be achieved using HTML5 and JavaScript (JS). Every platform delivers its own derived WebKit or similar implementation. By using an appropriate message transport mechanism, every implementation can be extended to work with engines at the back end of the application, which potentially allows the creation of rich AR applications in conjunction with the well-known standards HTML5 and JS. Approaches like PhoneGap [1] have already done work in extending functionality to multiple devices and operating systems. While PhoneGap itself delivers a small subset of rather simple functions, we have implemented a similar approach that is able to interface multithreaded engines running in the background. Our generic application approach layers a transparent WebKit implementation over an X3D render engine, which can directly intercommunicate with our computer vision engine (InstantVision) [2]. The engines themselves can be fed using declarative descriptions in the form of XML, JSON or URLs (see also the figure below).
 
                                                                               
                                                     
 
¹ timo.engelke@igd.fraunhofer.de
 
 
Figure 1: Our generic AR architecture for mobile devices, coordinated by a web component. A JS interface allows steering and connecting the render engine, vision processor and other resources.
 
By delivering abstractions in the form of JS interfaces, the developer is able to load, modify, replace and remove parts of the descriptions and thus dynamically generate applications. Furthermore, callbacks can be installed in an AJAX-service manner, allowing real-time information to be gained about the current tracking states and the process of interconnecting data between the JS engine and the other processors in the backend.
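The bridge between the WebKit view and the background engines can be pictured as a small callback registry. The following sketch is purely illustrative; the actual interface names of the framework are not given in this paper.

```javascript
// Hypothetical sketch of the JS bridge described above. The function and
// event names are assumptions for illustration, not the framework's API.
function createEngineBridge() {
  const listeners = {};  // event name -> array of installed callbacks
  return {
    // install a callback, e.g. for tracking-state updates (AJAX-service style)
    on(event, cb) { (listeners[event] = listeners[event] || []).push(cb); },
    // the native backend would invoke this via the message transport mechanism
    dispatch(event, payload) {
      (listeners[event] || []).forEach(cb => cb(payload));
    }
  };
}

// Example: react to tracking-state changes pushed from the vision engine
const bridge = createEngineBridge();
let lastState = null;
bridge.on('trackingState', s => { lastState = s; });
bridge.dispatch('trackingState', { object: 'poster', quality: 0.87 });
```

In a real deployment, `dispatch` would be driven by the platform's native-to-JS message channel rather than called directly.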
 
 
The lightweight X3D processor allows for rendering a simple subset of standardized nodes. Additionally, a PolygonBackground node has been added in order to display the camera image in the background of the scene and thus enable AR.
 
 
The vision engine (InstantVision) allows algorithms to be described inside declarative descriptions in the form of conditional fixed-function pipes. The tracking data itself can be stored inside an internal hierarchical database for optimized access. Since the data of database objects can be transferred via network, it is possible to transfer parts of or the full tracking data to and from a server infrastructure.
 
 
Application and Content Delivery

Application delivery can be accomplished in our framework by simply downloading files over the HTTP protocol. Zip packages allow a previously saved server-based environment to be mirrored to a device and thus content to be used offline.

Usually the general application is downloaded to the device in a first step; it can itself be generated through the template system of the couchDB in the form of a view. A couchDB instance on the mobile device is also conceivable for maintaining, synchronizing and replicating content for seamless on- and offline use.
 
 
In our experiments, simple XMLHttpRequests can also be used inside the HTML in order to download information about customized tracking environments from a couchDB server. A multi-step initialization and tracking approach allows delivering the right information for the vision engine. Such an approach can be realized by subsequently replacing the tracking and context pipes from coarse to fine during runtime.
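As an illustration of such a request, the following sketch builds a couchDB view URL from a coarse GPS context and fetches the tracking environment. The database, design-document and view names are assumptions for illustration, not the ones actually used by the framework.

```javascript
// Build the URL of a hypothetical couchDB view keyed by coarse GPS context.
// couchDB views are plain HTTP GET endpoints, so an XMLHttpRequest suffices.
function trackingViewUrl(base, lat, lon) {
  return base + '/ar-content/_design/tracking/_view/byLocation'
       + '?key=' + encodeURIComponent(JSON.stringify([lat, lon]));
}

// Fetch the tracking environment description (runs in the embedded WebKit
// view, where XMLHttpRequest is available).
function fetchTrackingEnvironment(url, onReady) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.onload = () => onReady(JSON.parse(xhr.responseText));
  xhr.send();
}

const url = trackingViewUrl('http://server:5984', 52.52, 13.40);
```

The returned JSON would then carry the coarse tracking pipe, to be replaced by finer ones as initialization progresses.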
 
 
Tracking Process

Depending on the aim of the application, the user can choose his coarse context, supported by initially queried information using a combination of GPS and magnetic directional sensor data. Already in this phase, a set of visual abstractions (e.g. a bag of visual words [4][5]) can be sent to the server, allowing it to choose a properly associated tracking model for further initialization. Upon this query, the server returns a set of general tracking descriptions in the form of HIPs [6] for 2D initialization, line models [11] or already prepared feature maps, along with their tracking descriptions. If the tracking algorithms on the device are not able to cope with the tracking data for a certain time, the camera image can be sent to the server, allowing initialization to take place on the server, as described in the dARsein application below.
 
 
 
The tracking availability and its accuracy are permanently observed by the system, and fallback solutions can be applied on tracking loss. Our implementation also takes advantage of the built-in sensors of the device (accelerometer, gyroscope, compass) as a fallback solution. A reference pose is permanently generated in the background. On tracking loss, a replacement pose is generated based on a differential rotation matrix acquired from the motion system. The reprojected pose can be used for re-initialization using the supplied tracking models. In this way a coarse augmentation can still be realized, even if a featureless area of the room is visible to the device. The device-specific calibration data (intrinsics, physical sensor positions, etc.) can also be queried from a device document of a running couchDB instance. If very little information about the context is available, a direct, dynamically extensible reconstruction of features in the room can be applied as well (e.g. as described in [12]).
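The sensor fallback can be sketched as a matrix operation: the replacement orientation is the last visual reference rotation composed with the differential rotation sensed by the motion system. This is a simplified illustration (3x3 row-major rotation matrices, translation omitted), not the framework's actual implementation.

```javascript
// Multiply two 3x3 row-major matrices.
function mat3mul(a, b) {
  const c = new Array(9).fill(0);
  for (let i = 0; i < 3; i++)
    for (let j = 0; j < 3; j++)
      for (let k = 0; k < 3; k++)
        c[3 * i + j] += a[3 * i + k] * b[3 * k + j];
  return c;
}

// On tracking loss: apply the gyro-derived differential rotation to the
// last visually tracked reference orientation.
function replacementRotation(referenceR, gyroDeltaR) {
  return mat3mul(gyroDeltaR, referenceR);
}

// Example: a 90-degree rotation about Z applied to an identity reference pose
const I    = [1, 0, 0,  0, 1, 0,  0, 0, 1];
const Rz90 = [0, -1, 0,  1, 0, 0,  0, 0, 1];
const R = replacementRotation(I, Rz90);
```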
 
 
We denote tracked objects as static units with their own world coordinate system. A tracked object can be a room, an object or even a part of an object. Every tracked object has its own camera and tracking models, which can be applied for initialization and tracking.
 
 
When tracking is stable, interaction data can be downloaded (using an XMLHttpRequest) for the tracked object's context, allowing for dynamic generation of in-situ annotations. The received JSON description can directly be used for setting up the interaction environment.
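A hypothetical example of such a JSON interaction description and its direct use; all field names here are illustrative assumptions, since the paper does not specify the schema.

```javascript
// An assumed JSON interaction description, of the kind a couchDB query
// might return for one tracked object's context.
const description = JSON.stringify({
  trackedObject: 'machine-hall',
  annotations: [
    { type: 'POI', position: [1.2, 0.0, 3.4], label: 'Pump A' },
    { type: 'AOI', contour: [[0, 0], [2, 0], [2, 1], [0, 1]], label: 'Safety zone' }
  ]
});

// Setting up the interaction environment is then a plain JSON parse plus
// indexing the annotations for later pointing/selection lookups.
function setupInteraction(json) {
  const env = JSON.parse(json);
  const byLabel = {};
  env.annotations.forEach(a => { byLabel[a.label] = a; });
  return { object: env.trackedObject, byLabel };
}

const env = setupInteraction(description);
```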
 
Structures for describing the application

Depending on the complexity of the application, we have created some templates that can be generated by a couchDB request. A template can either be a complete HTML and JS page allowing for specialized access to the AR context, or a JSON-based description.

Figure 2 shows a schematic graph indicating the aspects to take care of when augmenting a scene and describing it abstractly. It does not cover dynamic objects and changing real worlds, which would themselves change the context of the situation and require another informational structure for representation.
 
 
We distinguish between content to be augmented and augmentation content. The content to be augmented is mostly hidden from the user of an application and provides information to the processing algorithm, which is able to calculate the camera pose.
 
 
 
 
The augmentation content itself is usually bound to virtual content and can be of arbitrary nature, but is mostly 2D and/or 3D visuals. The contextual binding between the real and virtual world can be described in spatial, interaction and presentational aspects. Besides points of interest (POI), our approach also covers areas and volumes of interest (AOI, VOI), since in real objects, surfaces or complete volumes can be of interest. While a POI can be represented by a single point, AOIs and VOIs might be represented by abstracted contours/volumes or even 3D models. Presentation of content in AR is usually done using annotations. In addition, visual effects can enhance the context awareness of the user. Reality filters, for instance, allow the abstraction of the real-world camera image and the presentation of valuable content to the user. The object of interest can be exposed, and the background can be diminished or replaced by a virtual scene. Any combination of background (BG) and foreground (FG) modification, augmentation or filtering that suits the user experience is conceivable (see Figure 3 or Figure 5, middle).
 
 
We categorize annotations into 2D and 3D, where 2D augmentations can be applied in 3D perspective as well. These can range from 3D models and animations to 2D video, HTML, etc. Our framework allows these to be used in either HTML or X3D using special texture and model nodes.
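Placing a 2D annotation in 3D perspective amounts to projecting its 3D anchor point into the camera image. A plain pinhole projection with assumed intrinsics is enough to illustrate the idea; the framework's texture and model nodes would encapsulate this.

```javascript
// Project a point given in camera coordinates to pixel coordinates using
// assumed pinhole intrinsics (fx, fy: focal lengths; cx, cy: principal point).
function projectToScreen(pointCam, intr) {
  const [x, y, z] = pointCam;
  return {
    u: intr.fx * x / z + intr.cx,  // horizontal pixel position
    v: intr.fy * y / z + intr.cy   // vertical pixel position
  };
}

// Illustrative intrinsics; real values would come from the device's
// calibration document in couchDB (see the Tracking Process section).
const intr = { fx: 500, fy: 500, cx: 160, cy: 240 };
const screen = projectToScreen([0.2, -0.1, 2.0], intr);
// screen.u / screen.v would then drive the CSS position of an annotation div
```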
 
 
[Figure 2 diagram: "Content to be Augmented (Real World)", with tracked objects and 2D/3D tracking models, is linked via context bounding (spatial: POI, AOI, VOI; interaction: pointing, BG/FG; presentational: LOD, filter, drawings) to "Augmented Content (Virtual World)" with 2D annotations (HTML, video, images, perspective 2D) and 3D annotations (models).]
 
Figure 2: An abstracted structure describing annotation elements in the virtual and real world, as well as some aspects of binding and presentation.
 
 
Besides  usual  annotations  the  interaction  design  can  be  a  crucial  aspect  in  development.  The  
developer  has  to  be  able  to  cope  w
ith  virtual  and  real  world  information  and  its  proper  
presentation  on  a  mobile  device.  Since  the  screen  size  and  resolution  is  a  limiting  factor  for  
presentation,  a  distance  or  position  dependent  presentation  of  information  is  helpful
 
(LOD)
.  
Another  paradi
gm,  which  we  have  implemented,  is  pointing  using  the  device  for  information  
selection.  The  user  therefore  uses  a  crosshair  on  the  screen  for  pointing  at  POIs/AOIs/VOIs  in  
order  to  get  more  information.
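The crosshair selection can be sketched as a nearest-hit test over the projected annotations. This is illustrative only; screen coordinates and pick radius are assumed values.

```javascript
// Among the screen-projected POIs, pick the one closest to the crosshair
// (the screen centre), but only within a given pick radius in pixels.
function pickAtCrosshair(pois, center, radius) {
  let best = null, bestD = radius;
  for (const p of pois) {
    const d = Math.hypot(p.u - center.u, p.v - center.v);
    if (d < bestD) { bestD = d; best = p; }
  }
  return best;  // null when nothing is near the crosshair
}

// Example with two projected POIs and a crosshair at the screen centre
const pois = [
  { label: 'Valve', u: 150, v: 230 },
  { label: 'Gauge', u: 400, v: 100 }
];
const hit = pickAtCrosshair(pois, { u: 160, v: 240 }, 50);
```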
 
 
 
Alternatively, the user can press the pause key and stop the live augmentation. He can then move the virtual focus with his finger in order to gain information about regions in the still scene.
 
Applications

In the following section we describe some applications that we have realized using the framework. They usually use subsets of the described structures, but all of them can be loaded through the framework from a web server or saved locally to the device.
 
 
 
 
Berlin Sightseeing

The figure below shows an application able to track one of two distinct textured targets. The tracking definition has been generated using the desktop GUI of the vision library. It allows the creation of HIP models that can be used for initialization of targets. A KLT [13] tracker allows further tracking of these targets from frame to frame. The database for initialization has a size of about 50 KB. The poster initializes at about 6-10 FPS on an iPhone 3GS and at 30 FPS on an iPad 2, while tracking runs stably at 15-20 FPS (iPhone) and 30 FPS (iPad 2), depending on light conditions and target size. By interacting with swipe gestures, the application allows switching between different overlays, which are defined in an X3D file. Additional text augmentations are realized through HTML div elements. The transform nodes are set dynamically and in real time from within JS, allowing transition animations.
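Such a transition animation can be sketched as a linear interpolation between the previous and the newly tracked position, written frame by frame to the transform node. The function names here are illustrative, not the framework's actual API.

```javascript
// Linear interpolation between two 3-vectors (t in [0, 1]).
function lerpVec3(a, b, t) {
  return a.map((v, i) => v + (b[i] - v) * t);
}

// Precompute the intermediate positions of a transition; each frame would
// be written to the X3D Transform's translation field from within JS.
function transitionFrames(from, to, steps) {
  const frames = [];
  for (let s = 1; s <= steps; s++) frames.push(lerpVec3(from, to, s / steps));
  return frames;
}

const frames = transitionFrames([0, 0, 0], [1, 2, 0], 4);
```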
 
Through the use of convolution filters in the vision engine, the background camera image can be abstracted and a reality filter can be applied.
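The kind of convolution used for such a reality filter (cf. the Sobel filter of Figure 3) can be sketched as follows for a grayscale image; in the framework this runs inside the vision engine, not in JS.

```javascript
// Sobel edge-magnitude filter over a grayscale image stored as a flat
// width*height array; border pixels are left at zero for simplicity.
function sobel(img, w, h) {
  const out = new Array(w * h).fill(0);
  for (let y = 1; y < h - 1; y++) {
    for (let x = 1; x < w - 1; x++) {
      const p = (dx, dy) => img[(y + dy) * w + (x + dx)];
      // horizontal and vertical gradient responses
      const gx = -p(-1, -1) - 2 * p(-1, 0) - p(-1, 1)
               +  p(1, -1)  + 2 * p(1, 0)  + p(1, 1);
      const gy = -p(-1, -1) - 2 * p(0, -1) - p(1, -1)
               +  p(-1, 1)  + 2 * p(0, 1)  + p(1, 1);
      out[y * w + x] = Math.min(255, Math.hypot(gx, gy));
    }
  }
  return out;
}

// Example: a vertical step edge (left half dark, right half bright)
const w = 4, h = 3;
const img = [0, 0, 255, 255,  0, 0, 255, 255,  0, 0, 255, 255];
const edges = sobel(img, w, h);
```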
 
 
 
Figure 3: Reality filtering using a Sobel convolution filter and augmentation of a 2D surface.
 
Building Information Management

The very same informational structure as in the Berlin Sightseeing app has been used in the implementation of a building information visualization system. Its intent is the visualization of global and machine-specific context information coming from a couchDB database. Figure 4 shows an application that presents three layers of information. Two layers deliver static information about the scene, while a third layer implements a pointing paradigm. Machines in an industrial environment can be augmented with their specified contours/areas of interest. Colors indicate the general state of the machines. By pointing the device's crosshair, information about the respective machine can be revealed in place. A pause key allows stopping the live image and exploring the information by moving the crosshair.
 
 
Figure 4: AR visualization of BIM data. Left: flow overview. Center: heat chart. Right: maintenance status.
 
 
dARsein

Figure 5 depicts a smartphone application for augmented reality tourism, which is also available in the Apple App Store. The digital guide extends a real building outdoors with additional information. The building, with its art nouveau elements, was destroyed in the Second World War. Unfortunately, it was poorly reconstructed, without being very close to its original version. The application aims to revive the building's appearance by superimposing historical blueprints onto the building's front. Special points of interest are marked and link to additional content.

The application has been implemented using the jQTouch [14] and jQuery frameworks in HTML. These support creating websites with the look and feel of natively written applications.

In this case a server-based tracking method, as described in [9] and [10], is used. Users can take a picture of the building with the mobile application, which is sent to a server. The server-side vision library initializes and detects the camera pose for every taken picture. If the process was successful, the client receives the superimposed image from the server.
 
 
 
Figure 5: The dARsein application allows real 3D server-based augmentation of the Olbrich house.
 
Conclusion and further work

We have presented a mobile AR framework that allows specifying a large range of AR applications using operating-system- and device-independent descriptions in HTML, JS and X3D. We have shown a general structure for describing AR applications with new paradigms that extend the general idea of POI augmentation as known from current AR applications. Through the use of the framework, these paradigms are extensible and can be explored and evaluated in further experimental applications. We argue that this can be a way of gaining experience with other interaction and content presentation paradigms for future AR applications, serving as impulses and foundations for establishment in upcoming standards for AR (like e.g. KML).

Future research will focus on more robust tracking and on describing and joining dynamic and parametric scenes. On the other hand, it will be important to find convenient tool chains for efficient application generation based on the structures mentioned.
 
 
References

[1] PhoneGap. http://www.phonegap.com/ [Last visited May 2011].
[2] M. Becker, G. Bleser, A. Pagani, D. Stricker, and H. Wuest. An architecture for prototyping and application development of visual tracking systems. In Capture, Transmission and Display of 3D Video (Proceedings of 3DTV-CON 07 [CD-ROM]), 2007.
[3] CouchDB. http://couchdb.apache.org/ [Last visited May 2011].
[4] A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof. From structure-from-motion point clouds to fast location recognition. IEEE, 2009, 2599-2606.
[5] Zilong Dong, Guofeng Zhang, Jiaya Jia, and Hujun Bao. Keyframe-based real-time camera tracking. IEEE International Conference on Computer Vision (ICCV), 2009.
[6] S. Taylor and T. Drummond. Binary histogrammed intensity patches for efficient and robust matching. International Journal of Computer Vision, Springer, 2011, 1-25.
[7] Layar Augmented Reality Browser. http://www.layar.com/ [Last visited June 2011].
[8] Wikitude. http://www.wikitude.com/ [Last visited June 2011].
[9] M. Zöllner, M. Becker, and J. Keil. Snapshot augmented reality - augmented photography. In International Symposium on Virtual Reality, Archaeology and Cultural Heritage (VAST), pages 53-56, 2010.
[10] P. Riess and D. Stricker. AR on-demand: A practicable solution for augmented reality on low-end handheld devices. In Virtuelle und Erweiterte Realität, 2006.
[11] H. Wuest, F. Wientapper, and D. Stricker. Adaptable model-based tracking using analysis-by-synthesis techniques. In W. Kropatsch, M. Kampel, and A. Hanbury, editors, Computer Analysis of Images and Patterns, volume 4673 of Lecture Notes in Computer Science, pages 20-27. Springer Berlin / Heidelberg, 2007.
[12] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In Proc. Eighth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR'09), Orlando, October 2009.
[13] J. Shi and C. Tomasi. Good features to track. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), 1994.
[14] jQTouch. http://jqtouch.com/ [Last visited June 2011].