Progress and Challenges for Real-Time Virtualization

Chris Gill
Professor of Computer Science and Engineering
Washington University, St. Louis, MO, USA
cdgill@cse.wustl.edu

VtRES
Workshop Keynote at RTCSA 2013
National Taiwan University, Taipei, Taiwan, Wed Aug 21, 2013
*Our research described in this talk is supported in part by the US NSF and the US ONR, and has been driven by numerous contributions from Sisu Xi, Justin Wilson, Chong Li, and Chenyang Lu (Washington University in St. Louis) and from Jaewoo Lee, Sanjian Chen, Linh Phan, Insup Lee, and Oleg Sokolsky (University of Pennsylvania)

Two Key Uses of Virtualization

- Use fewer computing resources and/or platforms to (consolidate or) integrate systems via virtualization
- Provide elastic cloud services on-demand and at scale to multiple tenants via virtualization
Challenges for Real-Time Virtualization

- Real-Time System Integration
  - How to schedule resources feasibly among competing domains
  - How to maintain timing guarantees as different components and systems are composed
  - How to preserve guarantees across multiple shared resources
- Real-Time Cloud Services
  - How to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy
RT Virtualization for System Integration

- Some key challenges for real-time (especially safety-critical) systems
  - Temporal isolation as dedicated cores become shared ones
  - Preserving isolation as components and systems are composed
  - Maintaining end-to-end timing guarantees as networked communication becomes inter-domain communication spanning both computation and communication resources

[Figure: legacy systems consolidated as domains on a virtualization platform atop a hypervisor]
A Brief Survey of Other Related Work
(please see our publications for references)

- Improving VMM scheduling (Credit, SEDF) and Domain 0 in Xen
  - Often helps with isolation, predictability, etc., but without real-time guarantees
- Improving inter-domain communication in Xen
  - E.g., XWAY, XenLoop, XenSocket: involve modifying the guest OS or applications
- Approaches targeting other virtualization architectures
  - Cucinotta et al. [COMPSAC 2009] applied hierarchical real-time scheduling to KVM, e.g., towards supporting Real-Time Service Oriented Architectures
  - Fiasco and L4 (TU Dresden) offer precise virtualization capabilities for systems ranging from small embedded systems to large complex systems
Traditional Virtualization in Xen

- Good for system integration, cost reduction, etc.

Problem: some RT applications CANNOT benefit from this kind of virtualization.

[Figure: two App/OS/Hardware stacks consolidated onto the Xen hypervisor; the real-time-aware OS misses its deadlines because domains are scheduled round-robin with NO prioritization of OS instances]
RT Virtualization I:
Real-Time Scheduling of Domains in RT-Xen

Basic solution: incorporate hierarchical scheduling into Xen. The Xen scheduler becomes a root scheduler above the per-OS (leaf) schedulers, and leaves are implemented as Servers with three parameters (Period, Budget, Priority).

[Figure: Xen scheduler over OS schedulers and their apps, recast as a root scheduler over leaf schedulers]
Basic Server Design (Deferrable & Periodic)

- Servers have 3 parameters (Period, Budget, Priority)
- Example: server S1 (5, 3, 1) running two tasks T1 (10, …) and T2 (10, …)

[Figure: budget of S1 vs. actual execution over a 15-unit timeline for the Periodic Server and the Deferrable Server; the periodic server drains its budget and goes IDLE even when no work is pending, while the deferrable server preserves its budget, which can lead to back-to-back execution]
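The two server types differ only in what happens to the budget while the domain is idle. A minimal sketch of that difference (illustrative only, not RT-Xen code; the arrival pattern is invented, and the (period=5, budget=3) parameters follow the S1 example):

```python
# Sketch: budget accounting for a periodic vs. a deferrable server.
# In a periodic server the budget drains whether or not tasks are ready;
# a deferrable server keeps its budget until work arrives or the period ends.

def run_server(policy, period, budget, ready):
    """ready(t) is True if the server's domain has work at time t.
    Returns the times at which the server actually executes."""
    executed = []
    remaining = budget
    for t in range(3 * period):
        if t % period == 0:          # replenish at each period boundary
            remaining = budget
        if remaining > 0:
            if ready(t):
                executed.append(t)   # run a queued task for one time unit
                remaining -= 1
            elif policy == "periodic":
                remaining -= 1       # idle ticks still drain the budget
            # deferrable: idle ticks preserve the budget
    return executed

# Work arrives only in the second half of each period.
ready = lambda t: (t % 5) >= 3
print(run_server("periodic", 5, 3, ready))    # []
print(run_server("deferrable", 5, 3, ready))  # [3, 4, 8, 9, 13, 14]
```

With this arrival pattern the periodic server's budget is gone before the work arrives, so it executes nothing, while the deferrable server's preserved budget serves the late arrivals.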
Evaluation Setup

[Figure: RT-Xen schedulers (Deferrable, Polling, Periodic, Sporadic) running Dom0's VCPU on Core 0 and the VCPUs of Dom1–Dom5 with their applications on Core 1]

- Scheduling algorithm: Deferrable, Polling, Periodic, or Sporadic
- (Period, Budget, Priority) for Dom1, (Period, Budget, Priority) for Dom2, …, plus IDLE
- Use Rate Monotonic within each domain; for each task: shorter period -> higher priority
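The rate-monotonic rule used inside each domain can be made concrete with a short sketch (the task parameters here are invented, and the Liu & Layland bound shown is a sufficient test only, not the analysis used in RT-Xen):

```python
# Sketch of rate-monotonic priority assignment: shorter period -> higher
# priority, plus the classic Liu & Layland utilization test.
import math

def rm_priorities(tasks):
    """tasks: {name: (period, wcet)} -> names ordered highest priority first."""
    return sorted(tasks, key=lambda name: tasks[name][0])

def rm_schedulable(tasks):
    """Sufficient (not necessary) Liu & Layland utilization test."""
    n = len(tasks)
    u = sum(wcet / period for period, wcet in tasks.values())
    return u <= n * (2 ** (1 / n) - 1)

tasks = {"T1": (10, 2), "T2": (20, 4), "T3": (40, 8)}
print(rm_priorities(tasks))   # ['T1', 'T2', 'T3']
print(rm_schedulable(tasks))  # True (utilization 0.6 vs. bound ~0.78)
```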

Xen Credit vs. Real-Time VM Scheduling

[Figure: deadline miss ratio (0–0.8) vs. total CPU load (50–100%) for the Deferrable, Sporadic, Polling, and Periodic servers and for Xen's Credit and SEDF schedulers]

- The Credit scheduler gives poor real-time performance
- Real-time VM scheduling helps!

"RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen",
ACM International Conference on Embedded Software (EMSOFT), 2011
RT Virtualization II:
Incorporating Compositional Scheduling

- Compositional Scheduling Framework (CSF)
  - Provides temporal isolation and real-time guarantees
  - Computes components' minimum-bandwidth resource models
- Mind the gap between CSF theory and system implementation
  - Realizing CSF through virtualization can bridge that gap

[Figure: a parent component's Rate Monotonic scheduler composes child components; each child abstracts its workload of periodic tasks and its scheduler behind a Periodic Resource Model (period, budget)]
Compositional Scheduling in RT-Xen

- Component -> domain
- Periodic Resource Model (PRM) -> Periodic Server (PS)
- Task model: independent, CPU-intensive, periodic tasks
- Scheduling algorithm: rate monotonic

[Figure: the theoretical framework (a root component composing components of tasks) maps onto the system implementation (the hypervisor scheduling domains of apps); each PRM is realized as a Periodic Server (PS)]
First Need to Extend CSF to Deal with
Quantum-based Scheduling Platforms

- Find the minimum-bandwidth resource model for workload W

[Figure: bandwidth (B/P) vs. period (P) of the resource model; the real-number-based model's minimum bandwidth is non-decreasing in P, while the quantum-based model's is not, so a necessary condition for schedulability supplies an upper bound on the period to search when finding the min-BW resource model and rules out infeasible bandwidths]
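The search itself can be sketched in a few lines. This is illustrative only, not the paper's algorithm: it brute-forces integer (quantum) periods and budgets, and for brevity it checks an EDF demand-bound test against the PRM's linear supply lower bound, whereas the RT-Xen work uses rate-monotonic analysis within components. All function names and task parameters are mine.

```python
# Sketch: brute-force search for the minimum-bandwidth quantum-based
# Periodic Resource Model (period, budget) that can schedule a workload
# of periodic tasks [(period, wcet), ...].
from math import floor, gcd
from functools import reduce

def lsbf(period, budget, t):
    """Linear lower bound on the supply of PRM (period, budget) over [0, t]."""
    return max(0.0, (budget / period) * (t - 2 * (period - budget)))

def dbf(tasks, t):
    """EDF demand bound of the periodic tasks over [0, t]."""
    return sum(floor(t / p) * e for p, e in tasks)

def schedulable(tasks, period, budget):
    # Checking every integer point up to the hyperperiod is conservative
    # but covers all absolute deadlines of the task set.
    horizon = reduce(lambda a, b: a * b // gcd(a, b), (p for p, _ in tasks))
    return all(dbf(tasks, t) <= lsbf(period, budget, t)
               for t in range(1, horizon + 1))

def min_bw_prm(tasks, max_period):
    """Search integer (quantum) periods/budgets for the min-bandwidth PRM."""
    best = None
    for period in range(1, max_period + 1):
        for budget in range(1, period + 1):
            if schedulable(tasks, period, budget):
                bw = budget / period
                if best is None or bw < best[0]:
                    best = (bw, period, budget)
                break  # smallest feasible budget is this period's minimum
    return best

print(min_bw_prm([(10, 2), (20, 4)], max_period=10))  # (0.5, 2, 1)
```

Note the non-monotonic behavior the slide warns about: the minimum bandwidth found at each integer period does not decrease smoothly with the period, which is why the theory must bound the period range to search.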
Then Can Improve Periodic Server Design

- Purely Time-driven Periodic Server (PTPS)
  - If the currently scheduled domain is idle, its budget is wasted
  - Not work-conserving

[Figure: budget timelines for a high-priority domain D_H and a low-priority domain D_L, with task release and completion marks; under PTPS, D_H idles away its budget while tasks in D_L wait]

Periodic Server Re-Design I

- Work-Conserving Periodic Server (WCPS)
  - If the currently scheduled domain is idle, the hypervisor picks a lower-priority domain that has tasks to execute
  - Early execution of the lower-priority domain during the idle period does not affect schedulability

[Figure: budget timelines for D_H and D_L; under WCPS, tasks in D_L execute during D_H's idle time and complete earlier]

Periodic Server Re-Design II

- Capacity Reclaiming Periodic Server (CRPS)
  - If the currently scheduled domain is idle, we can re-assign this idled budget to any other domain that has tasks to execute
  - Early execution of the other domain during the idle period does not affect schedulability

[Figure: budget timelines for D_H and D_L; under CRPS, D_L consumes D_H's idle budget in addition to its own]
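The one decision that separates the three designs is who runs when the currently scheduled domain goes idle with budget left. A sketch of that decision (the domain structures and priority convention are invented for illustration, not RT-Xen code):

```python
# PTPS wastes the idle budget; WCPS hands the CPU to a lower-priority
# domain with pending work; CRPS lets any domain reclaim the budget.
def pick_next(policy, current, others):
    """`others` holds the remaining domains, highest priority first
    (smaller prio number = higher priority). Returns the name of the
    domain that runs for the rest of the idle budget."""
    runnable = [d for d in others if d["has_tasks"]]
    if policy == "PTPS" or not runnable:
        return current["name"]                  # budget is simply wasted
    if policy == "WCPS":
        lower = [d for d in runnable if d["prio"] > current["prio"]]
        return lower[0]["name"] if lower else current["name"]
    if policy == "CRPS":
        return runnable[0]["name"]              # any domain can reclaim
    raise ValueError(policy)

dom1 = {"name": "Dom1", "prio": 1, "has_tasks": True}
dom3 = {"name": "Dom3", "prio": 3, "has_tasks": True}
idle = {"name": "Dom2", "prio": 2, "has_tasks": False}
print(pick_next("PTPS", idle, [dom1, dom3]))  # Dom2 (idles on)
print(pick_next("WCPS", idle, [dom1, dom3]))  # Dom3 (lower priority only)
print(pick_next("CRPS", idle, [dom1, dom3]))  # Dom1 (any domain)
```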

Interface Overhead: Synthetic Workload

[Figure: CDF of response time / deadline (0–3) for Dom5 (22, 1), with U_W: 90.4% and U_RM: 114.3%; deadline miss ratios: CRPS 0.0622, WCPS 60.5, PTPS 100]

- CRPS ≥ WCPS ≥ PTPS

"Realizing Compositional Scheduling Through Virtualization",
IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012
RT Virtualization III:
Inter-Domain Communication

VMM Scheduler: RT-Xen vs. Credit

[Figure: setup on Linux 3.4.2 with Dom0 on core C0 and Dom1–Dom10 spread over cores C1–C5 under a 100% CPU background load, one packet sent every 10 ms, 5,000 data points; CDF of IDC latency in microseconds for RT-Xen vs. Credit, both with the original Dom 0]

When Domain 0 is not busy, the VMM scheduler dominates the IDC performance for higher priority domains (i.e., adding real-time scheduling already helps)
But, is Real-Time Scheduling Enough???

[Figure: with Domain 0 at 100% CPU, the CDF of IDC latency for RT-Xen with the original Dom 0 stretches out to 15,000 microseconds]
A Little Background on Xen's Domain 0

[Figure: netfront drivers in Domains 1, 2, …, m, n exchange TX/RX packets over netif interfaces with netback in Domain 0, whose netback[0] thread runs rx_action() and tx_action(); packets are fetched in a round-robin order and share one queue in softnet_data]
RTCA: Refining Domain 0 for Real-Time IDC

[Figure: same netfront/netback structure, but in RTCA packets are fetched by priority, up to a batch size, and queues are separated by priority in softnet_data]
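The per-priority queues and bounded batch can be sketched as follows (a toy model, not the RTCA patch to Domain 0; class and field names are mine):

```python
# Sketch: Domain 0 packet dispatch with per-priority queues and a bounded
# batch size, in contrast to the original single round-robin queue.
from collections import deque

class PriorityDispatcher:
    def __init__(self, num_prios, batch_size):
        self.queues = [deque() for _ in range(num_prios)]  # index 0 = highest
        self.batch_size = batch_size

    def enqueue(self, prio, pkt):
        self.queues[prio].append(pkt)

    def fetch_batch(self):
        """Drain up to batch_size packets, always highest priority first,
        so low-priority traffic cannot delay high-priority IDC."""
        batch = []
        for q in self.queues:
            while q and len(batch) < self.batch_size:
                batch.append(q.popleft())
            if len(batch) == self.batch_size:
                break
        return batch

d = PriorityDispatcher(num_prios=3, batch_size=2)
for prio, pkt in [(2, "low1"), (0, "hi1"), (2, "low2"), (1, "mid1")]:
    d.enqueue(prio, pkt)
print(d.fetch_batch())  # ['hi1', 'mid1'] -- high priority served first
print(d.fetch_batch())  # ['low1', 'low2']
```

The batch size caps how long a burst of low-priority packets can occupy netback before higher-priority queues are re-examined, which is the priority-inversion reduction discussed next.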
Effects on IDC Latency

By reducing priority inversion in Domain 0, RTCA mitigates impacts of low priority IDC on latency of high priority IDC

[Table: IDC latency between Domain 1 and Domain 2 in the presence of low priority IDC (us)]

"Prioritizing Local Inter-Domain Communication in Xen",
ACM/IEEE International Symposium on Quality of Service (IWQoS), 2013
Preserving Domain 0 Throughput in RTCA

[Figure: iPerf throughput (Gbits/s) between Dom 1 and Dom 2 under Base, Light, Medium, and Heavy interference, comparing RTCA with batch sizes 1, 64, and 238 against the original Dom 0]

A small batch size leads to significant reduction in high priority IDC latency and improved IDC throughput under interfering traffic
What Next? Time to Shift Gears

Real-Time System Integration is clearly important, but Real-Time Cloud Computing may prove even more so
Towards Real-Time Cloud Services

- Key challenge: how to analyze timing and provide guarantees in the face of resource elasticity and multi-tenancy

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." – Leslie Lamport

A virtualized system is one in which the failure of a computer that doesn't actually exist can render your entire application unusable.
How to Address this Issue?

- Need to shift our assumptions about system design to give precise real-time semantics in the face of resource elasticity and multi-tenancy

"… the Java platform's promise of 'Write Once, Run Anywhere' … offer[s] far greater cost-savings potential in the real-time (and more broadly, the embedded) domain than in the desktop and server domains." …
"The real-time Java platform's necessarily qualified promise of 'Write Once Carefully, Run Anywhere Conditionally' is nevertheless the best prospective opportunity for application re-usability."
– Foreword to the Real-Time Specification for Java
Clouds are not Real-Time Today

- Virtualization technology underlying clouds is not real-time
  - Xen: virtual machine monitor for Amazon EC2
  - CPU: proportional-share scheduling
  - If anything, I/O is worse
    - Vague "performance indicators": low/medium/large
    - Or you can pay a lot to get dedicated physical network resources

[Figure: a real-time VM and a non-real-time VM consolidated on a virtual machine monitor; the real-time VM misses its deadlines]
Motivation to Make Clouds Real-Time

- Hard to provide timing guarantees
  - Simple interface -> no timing information
  - Consolidation ratio keeps increasing -> more competition
  - Live migration without notification -> unstable performance
- Why are timing guarantees important?
  - If the steal time exceeds a given threshold, Netflix shuts down the virtual machine and restarts it elsewhere [sciencelogic], [scout]
  - "Xbox One may offload computations to cloud…" [Microsoft Blog]
  - "Energy efficient GPS sensing with cloud offloading" [SenSys'12]
  - … also, smart grids, earthquake early warning, etc. in CPS
Towards Improving the
Current State of the Art

- Functions of the cloud management system are an essential focus
  - Interface to the end users
  - VM initial placement
  - VM live migration (load balance, host maintenance, etc.)
- Commercial management systems are mostly closed-source: Amazon EC2 (Xen), Google Compute Engine (KVM), Microsoft Azure (Hyper-V), VMware vCenter (vSphere), Xen Center (XenServer)
- Open source alternatives
  - OpenStack (HPCloud, RackSpace, etc.), CloudStack, OpenNebula, …
  - All compatible with XenServer, vSphere, KVM, etc.
Limitations and Opportunities – Interface

- VMware vCenter
  - Reservation: minimum guaranteed resources, in MHz
  - Limit: upper bound for resources, in MHz
  - Share: relative importance of the VM
- OpenStack
  - # of VCPUs
Limitations and Opportunities –
Initial VM Placement

- Filtering
  - VM-VM affinity / anti-affinity, VM-Host affinity / anti-affinity, etc.
  - When is a host 'full'?
    - VMware vCenter: based on reservations of VMs
    - OpenStack: pre-configured ratio (default is 16)
- Ranking
  - VMware vCenter: try each host; turn on stand-by hosts
  - OpenStack: spread and packed
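The two 'is this host full?' rules can be contrasted in a few lines (a sketch only; the field names are invented, not the vCenter or OpenStack APIs):

```python
# vCenter-style: a host is full once the sum of guaranteed reservations
# would exceed its capacity. OpenStack-style: a host is full once the
# allocated VCPUs would exceed physical CPUs times the overcommit ratio.
def vcenter_full(host_mhz, reservations_mhz, new_reservation_mhz):
    return sum(reservations_mhz) + new_reservation_mhz > host_mhz

def openstack_full(host_pcpus, vcpus_allocated, new_vcpus, ratio=16):
    return vcpus_allocated + new_vcpus > host_pcpus * ratio

print(vcenter_full(4000, [1000, 1500], 1200))  # False: 3700 <= 4000 MHz
print(vcenter_full(4000, [1000, 1500], 2000))  # True: 4500 > 4000 MHz
print(openstack_full(8, 100, 10))              # False: 110 <= 8 * 16
print(openstack_full(8, 125, 10))              # True: 135 > 128
```

Note that neither rule carries any timing information, which is exactly the interface limitation the previous slide points out.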
Limitations and Opportunities –
VM Load Balancing

- Open source alternatives: no load balancing by default
- VMware vCenter
  - Distributed Resource Scheduler (DRS)
  - Triggered every 5 min; calculates normalized host utilization
  - Minimizes cluster-wide imbalance (standard deviation over all hosts)
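The DRS-style metric described above can be sketched as follows (illustrative only; the data layout and the migration step are mine, not VMware's implementation):

```python
# Cluster-wide imbalance as the standard deviation of normalized host
# utilizations; a migration that lowers this value is a balancing move.
from statistics import pstdev

def normalized_utilization(host):
    """Sum of VM demands on a host divided by its capacity."""
    return sum(host["vm_demands"]) / host["capacity"]

def cluster_imbalance(hosts):
    return pstdev(normalized_utilization(h) for h in hosts)

hosts = [
    {"capacity": 100, "vm_demands": [30, 50]},  # 0.8 utilized
    {"capacity": 100, "vm_demands": [20]},      # 0.2 utilized
]
print(round(cluster_imbalance(hosts), 3))  # 0.3

# Migrating the 30-unit VM to the second host evens things out:
hosts[0]["vm_demands"].remove(30)
hosts[1]["vm_demands"].append(30)
print(round(cluster_imbalance(hosts), 3))  # 0.0
```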
Concluding Remarks

- Much has been accomplished already
  - RT-Xen, CSF, RTCA support real-time in open-source Xen
  - Other approaches have focused on other virtualization architectures and platforms (e.g., L4), mechanisms, etc.
- Much remains to be done
  - Especially as we move towards larger and more complex real-time systems and systems-of-systems
  - Gains made in real-time virtualization can be extended to offer (and define) new capabilities for real-time clouds
Thank You!

All source code is available at
http://sites.google.com/site/realtimexen/