keynote presentation


1

Architectural Musings

Rethinking Computer Systems Architecture

Christopher Vick

cvick@qualcomm.com

June 3, 2012

2




Introduction

Vision Talk

Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture

Vast new opportunities for research of great interest to and great relevance for industry

3



Outline


Computer System Architecture


Then (Circa 1970)

Scarce Resources & Bottlenecks

Optimizations

Now (Mobile Computing Platforms)

Scarce Resources & Bottlenecks


Optimizations?


Qualcomm Research


Questions?


4



COMPUTER SYSTEM ARCHITECTURE

5



Computer System Architecture

Hardware

The 5 classic components (Patterson & Hennessy)

Input, Output, Memory, Datapath, Control

Software

System Virtual Machine (Hypervisor, VM, or VMM)

Operating System

Compilers & Tools

Definitions

The way components fit together

The arrangement of the various devices in a complete computer system or network

The instruction set plus a model of the execution of the instruction set (Amdahl et al.)

Computer System Architecture

The selection and combination of hardware and software components to assemble an effective computer system

6



Combination

7



Effective


An optimization problem


Many variables


Selection of hardware/software components


Selection of interfaces/interconnects


Many constraints


Physical, sociological, technical & cost constraints


Scarce Resources and Bottlenecks


Maximize utilization of scarce resources


Minimize impact of bottlenecks

8



THEN (CIRCA 1970)

9



Scarce Resources


CPU Cycles


CPUs expensive


Slow clock rates


Memory Locations


Random Access Memory expensive


Address/Data paths into CPU expensive


Skilled Programmers


Relatively new discipline


Poor language and tools support

10



Bottlenecks


Programmer Productivity


Software development slow and expensive


Low level programming paradigms


Memory Latency


RAM latency gated overall speed (~2-3 MHz)


Small RAM backed by vastly slower storage


I/O Bandwidth


Limited CPU connectivity


Crude communication mechanisms

11



Optimizations


Time Sharing


Effective sharing of limited resource


Virtual Memory


Effective sharing, and backing with cheaper alternative


Hardware Improvements


Smaller features provide more resource and faster clock


Large Scale Integration


Better signaling to improve bandwidth


High Level Programming Languages


Broadens productive programmer community


Abstracts away some hardware complexity
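
As a rough illustration of the time-sharing idea listed above, the following sketch divides one scarce CPU among several jobs in fixed time slices. The function name `round_robin`, the job names, and the quantum are hypothetical and chosen only for illustration. They do not describe any particular 1970s system.

```python
from collections import deque


def round_robin(jobs, quantum):
    """Share one CPU among jobs by giving each a fixed time slice in turn.

    jobs: dict mapping a job name to the CPU time it still needs.
    quantum: length of one time slice, in the same units.
    Returns the list of (job, time_run) slices in execution order.
    """
    queue = deque(jobs.items())
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        schedule.append((name, run))
        if remaining > run:
            # Unfinished jobs rejoin the back of the queue.
            queue.append((name, remaining - run))
    return schedule


print(round_robin({"A": 5, "B": 2, "C": 4}, quantum=2))
```

No job waits for another to finish completely, which is what lets several users share one expensive processor.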

12



Examples


Digital PDP 11


16-bit address space


Orthogonal instruction set


Memory mapped I/O


Unix, DOS, many others






IBM System 370

24-bit address space


Virtual Memory


MVS, VM/370, DOS/VS


Backward compatibility with System 360


13



NOW (MOBILE COMPUTING)

14



Scarce Resources


Energy


Fixed Energy Budget for mobile devices


Thermal issues at all scales


Tradeoff between performance and energy


Process shrinks no longer significantly improve energy consumption


Memory Bandwidth


Providing bandwidth is expensive


Memory interconnect consumes significant energy

15



Bottlenecks


Memory Latency


Increasing gap between CPU speed and DRAM latency


Physical distance to DRAM devices a factor


Concurrency


Shortage of programmers who can handle this


Inadequate language/tools support


I/O Bandwidth/Latency


Wireless bandwidth lower than wired


Consumes large amounts of energy

16



Example


HTC One


Processor: 1.5 GHz Dual Core Qualcomm MSM8960


OS: Android™ 4.0 (ICS)

Memory RAM: 1 GB DDR2

Memory Storage: 16 GB onboard storage

Display: 4.7" HD super LCD 1280 x 720

Network: LTE CAT3 - DL 100 / UL 50
LTE: 700/AWS
WCDMA: 2100/1900/AWS/850
EDGE: 850/900/1800/1900

Battery: 1800 mAh

Camera (Main): 8 MP, f/2.0, BSI, 1080p HD Video
(Front): 1.3 MP with 720p video

Dimensions: 134.8 x 69.9 x 8.9mm


This is a General Purpose Computer!
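
To make the fixed energy budget concrete, the sketch below converts the 1800 mAh battery rating into joules and into an average power budget. The nominal cell voltage (3.7 V) and the hours between charges (16) are assumptions introduced here for illustration; neither appears on the slide.

```python
ASSUMED_NOMINAL_VOLTS = 3.7    # assumption: typical Li-ion cell voltage, not from the slide
ASSUMED_HOURS_PER_CHARGE = 16  # assumption: hypothetical time between charges


def battery_energy_joules(capacity_mah, nominal_volts):
    """Convert a battery capacity rating (mAh) into stored energy in joules."""
    watt_hours = (capacity_mah / 1000.0) * nominal_volts
    return watt_hours * 3600.0


def average_power_watts(energy_joules, hours):
    """Average power available if the energy must last the given number of hours."""
    return energy_joules / (hours * 3600.0)


energy_j = battery_energy_joules(1800, ASSUMED_NOMINAL_VOLTS)
budget_w = average_power_watts(energy_j, ASSUMED_HOURS_PER_CHARGE)
print(f"{energy_j / 1000:.1f} kJ per charge")
print(f"{budget_w:.2f} W average over {ASSUMED_HOURS_PER_CHARGE} h")
```

Under these assumptions the whole device, including processor, display, and radios, has roughly 24 kJ per charge, or about 0.4 W on average.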

17



Optimizations?


Multi-core

Aggressive addition of cores and threads

Hardware concurrency outstripping software

New Concurrent Programming Models/Tools?

Memory Subsystem

Significant contributor to total energy consumption

Adding bandwidth is expensive

New technologies addressing some energy issues

Wireless bandwidth enhancements (LTE Advanced, etc.)

Solutions from desktop/server or embedded worlds may not directly apply in mobile space!


18



Memory System Energy

Retaining data (one second)

DRAM: ~1-10 pJ/bit self-refresh

SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]

4 pJ/bit (45nm LP, standby) [Barasinski et al., ESSCIRC '08]

Flash, PCM, STT RAM…: Zero!

Moving Data

32-bit value:

Recompute: 60 pJ (Razor)

Send 1mm: 10 pJ

Retain in cache for 1 ms: 38 pJ

Retain in DRAM for 1 second: 32+ pJ
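
The figures on this slide can be combined into a small worked comparison. The sketch below uses only the slide's numbers and adds one assumption that the slide does not state: that cache retention energy scales linearly with time. The function names are hypothetical.

```python
# Energy figures for one 32-bit value, as quoted on the slide (picojoules).
RECOMPUTE_PJ = 60.0      # recompute the value
CACHE_PJ_PER_MS = 38.0   # retain the value in cache for 1 ms


def dram_retention_pj(bits, seconds, pj_per_bit_per_second):
    """Self-refresh energy needed to keep `bits` in DRAM for `seconds`."""
    return bits * seconds * pj_per_bit_per_second


def cache_retention_pj(milliseconds):
    """Cache retention energy, assuming it grows linearly with time."""
    return CACHE_PJ_PER_MS * milliseconds


def cheaper_to_recompute(idle_ms):
    """True if recomputing costs less than holding the value in cache."""
    return RECOMPUTE_PJ < cache_retention_pj(idle_ms)


# The ~1-10 pJ/bit range reproduces the "32+ pJ" figure for a 32-bit value.
print(dram_retention_pj(32, 1, 1.0), dram_retention_pj(32, 1, 10.0))
# Idle time beyond which recomputing beats retaining in cache.
print(round(RECOMPUTE_PJ / CACHE_PJ_PER_MS, 2), "ms break-even")
print(cheaper_to_recompute(1.0), cheaper_to_recompute(2.0))
```

Under the linear assumption, a value left idle in cache for more than about 1.6 ms costs more energy to keep than to recompute.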



19




Reducing Memory System Energy

Move less!

Caches physically close to CPU

Locality, locality, locality (the first rule of chip real estate)

Retain less!

Power off unused cache lines [Kaxiras et al., ISCA '01]

“Drowsy” caches [Flautner et al., ISCA '02]

… with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]

Don’t refresh unused DRAM

… e.g. with garbage collection [Chen et al., CODES+ISSS '03]
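
The idea of powering off unused cache lines can be sketched with a toy model. This is not the mechanism from the cited papers. It is a simplified direct-mapped cache in which a line is switched off once it has gone unused for longer than a hypothetical `decay_interval`, trading an occasional extra miss for less retained state.

```python
class DecayingCache:
    """Toy direct-mapped cache whose idle lines are powered off."""

    def __init__(self, num_lines, decay_interval):
        self.decay_interval = decay_interval
        self.tags = [None] * num_lines   # None means the line is powered off
        self.last_used = [0] * num_lines
        self.now = 0

    def access(self, address):
        """Access one address; return True on a hit, False on a miss."""
        self.now += 1
        self._power_off_idle_lines()
        index = address % len(self.tags)
        hit = self.tags[index] == address
        self.tags[index] = address       # a miss refills and powers on the line
        self.last_used[index] = self.now
        return hit

    def _power_off_idle_lines(self):
        for i, tag in enumerate(self.tags):
            idle = self.now - self.last_used[i]
            if tag is not None and idle > self.decay_interval:
                self.tags[i] = None      # contents are lost, retention energy is saved

    def powered_lines(self):
        return sum(tag is not None for tag in self.tags)


cache = DecayingCache(num_lines=4, decay_interval=3)
hits = [cache.access(a) for a in [0, 1, 0, 0, 0, 0]]
print(hits, cache.powered_lines())
```

In this run, address 1 is touched once and then ignored, so its line is powered off while the line for address 0 stays live. A later access to address 1 would miss.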

20




Extending the Memory Model

Maintaining the illusion of a single flat memory address space is too expensive

On-chip caches can be major consumers of area and energy

Coherence protocols are expensive and difficult to scale

Alternative: software-managed memory hierarchies

Tightly-coupled memory (TCM), scratchpads

Do not require tag memory, address comparison logic

More area- and energy-efficient

Help bridge gap between bandwidth and throughput
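
To give a feel for what "no tag memory, no address comparison logic" saves, the sketch below estimates the tag storage of a set-associative cache. The cache geometry in the example (32 KiB, 64-byte lines, 4-way, 32-bit addresses) and the two state bits per line are illustrative assumptions, not figures from the talk.

```python
def cache_tag_overhead(cache_bytes, line_bytes, ways, address_bits, state_bits=2):
    """Estimate tag storage for a set-associative cache.

    All sizes are assumed to be powers of two. state_bits stands in for
    per-line bookkeeping such as valid and dirty bits (an assumption).
    Returns (tag_bits_per_line, total_overhead_bits, overhead_fraction).
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2 for a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    overhead_bits = lines * (tag_bits + state_bits)
    return tag_bits, overhead_bits, overhead_bits / (cache_bytes * 8)


tag_bits, overhead_bits, fraction = cache_tag_overhead(
    cache_bytes=32 * 1024, line_bytes=64, ways=4, address_bits=32)
print(tag_bits, overhead_bits, f"{fraction:.1%}")
```

In this model every lookup also compares one tag per way, four in this example. A scratchpad of the same capacity stores only data and is addressed directly, so it needs neither the tag array nor the comparators.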

21




New Challenges and Opportunities

Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas

Major implications on memory management

Scratchpad allocation strategies

Data partitioning strategies

Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics

Opportunities for compile-time and runtime optimization

Challenges in both Hardware and Software!
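
A minimal sketch of the software-managed style described on this slide: the program itself copies each tile of data from off-chip memory into a small on-chip buffer, computes on it there, and copies the result back. Plain Python lists stand in for DRAM and the scratchpad, and the function name and tile size are hypothetical.

```python
def scale_in_tiles(dram, scratchpad_words, factor):
    """Scale every element of `dram` in place, one scratchpad-sized tile at a time.

    Returns the number of explicit transfers (copy-ins plus copy-outs).
    """
    transfers = 0
    for start in range(0, len(dram), scratchpad_words):
        end = min(start + scratchpad_words, len(dram))
        # Explicit copy-in: software, not a cache controller, decides what moves on chip.
        scratchpad = dram[start:end]
        transfers += 1
        # Compute only on data resident in the scratchpad.
        for i in range(len(scratchpad)):
            scratchpad[i] *= factor
        # Explicit copy-out of the finished tile.
        dram[start:end] = scratchpad
        transfers += 1
    return transfers


data = list(range(10))
print(scale_in_tiles(data, scratchpad_words=4, factor=2), data)
```

Choosing the tile size and deciding which data deserves scratchpad space are the allocation and partitioning questions the slide raises. In this sketch they are fixed by hand.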

Qualcomm Research

Excellence in Wireless

MAY | 2012
WWW.QUALCOMM.COM/RESEARCH


23




State of the Art Capabilities Fostering Innovation

Complete Development Labs

Prototype Development Facilities

CPU Simulation Clusters

Antenna Ranges

Outdoor Field Systems

Human Resources

30% of engineers with PhD, 50% Masters

Systems, HW, SW, Standards, Test Engineering

Ventures, Bus Dev, Technical Marketing, Program Mgmt.

24



Global Research and Development Organization

UNITED STATES

San Diego, CA

Santa Clara, CA

Bridgewater, NJ

EUROPE

Cambridge, UK

Nuremberg, Germany

Vienna, Austria

ASIA

Beijing, China

Bangalore and Hyderabad, India

Seoul, S. Korea

25



Qualcomm Research & University Relations

ACADEMIC COLLABORATION TO FOSTER ADVANCED RESEARCH

Ongoing relations with more than 30 US and 25 International Universities

Current funding includes MIT, UC Berkeley, Stanford, UCSD, UT Austin, ASU, UIUC, Univ. of Michigan, EPFL, IISc Bangalore, KAIST, Tsinghua

Research collaboration spans a variety of technical areas

Computer vision, multicore processing, context aware computing, machine learning, low power devices, wireless networks and signal processing, etc.

Qualcomm Innovation Fellowship (QInF) invests in innovative ideas

Close interactions between Qualcomm Research engineers, graduate students and professors

26



Qualcomm Research For The Wireless Future

INNOVATE BEYOND WAN

EXCELLING IN ALL FORMS OF WIRELESS

TAKE WWAN TO THE NEXT LEVEL

IMPROVING WWAN TECHNOLOGY

RE-ARCHITECTING NEXT-GEN MOBILE DEVICES

BREAKTHROUGH PERFORMANCE

TRANSFORMING THE MOBILE USER EXPERIENCE

ENABLE SMART APPLICATIONS

27



Innovate Beyond WAN

WIRELESS LOCAL AREA

PEANUT

Next gen short range ultra-low power radio

WIFI ADVANCED

Multi Gbps WLAN using 5 GHz and 60 GHz band

Next Gen low-power WiFi for Internet of Things

LTE D2D (FLASHLINQ)

Proximal Wireless

First Gen device-to-device wireless network

Autonomous discovery

Direct communications

INNAV

Indoor positioning for indoor location based applications

Map tools for Mobile Devices


28



Enable Smart Applications

ELEVATE THE WIRELESS USER EXPERIENCE

AUGMENTED REALITY

Mobile user interface

Computer vision for mobile devices

LOOK

Multiple language text detection and recognition

With Mobile phone camera view finder

LISTEN

Background Audio processing

Augmented user experience

DASH

Efficient video delivery over HTTP for mobile devices

AWARE

Build awareness in mobile devices

For enhanced daily life situations

29



Breakthrough Device Performance

RE-ARCHITECTING NEXT-GEN DEVICES

ADVANCED RADIO TECHNOLOGIES

New RF front-end and baseband technologies

RF/antenna and systems/protocol techniques

Concurrent multi-radio operation

MANTICORE

Advanced mobile device SW platforms

Improved user experience

GRYPHON

Virtual machine design for SoC architecture

Enabling higher power efficiency

Thank You