VLSI Implementation of Reconfigurable Cells for RFU in Embedded Processors



Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, C. Lenci, M. Re

University of Rome "Tor Vergata"
Department of Electronic Engineering

Index

- Introduction / motivation
- Reconfigurable Functional Units
- Multicontext Logic Blocks
- Traditional and proposed cell comparison
- Performance evaluation
  - Delay
  - Power consumption
  - Area requirements
- Conclusions

Motivation

- In some applications, operands are usually shorter than the native processor word length
- General-purpose processors are inefficient when processing shorter data

[Figure: XOR and AND operations applied to Data 1 and Data 2, each producing a Result]

Possible Solution

- Execution speed can be increased by using a reconfigurable unit for "custom" instructions

[Figure: PROCESSOR containing ALU, Register File and Reconfigurable Unit]

Reconfigurable Units
Attached Processing Unit (APU)

- Located outside of the processor core
- "Slow" data transfer between APU and processor
- Original instruction set

[Figure: PROCESSOR with processor core (ALU, Register File) and an external APU]

Reconfigurable Units
Coprocessor

- Located outside of the processor core
- Faster interaction with the processor core than APUs
- Instruction set extension needed

[Figure: PROCESSOR with processor core (ALU, Register File) and a Coprocessor with its own Coprocessor Register File]

Reconfigurable Units
Reconfigurable Functional Units (RFUs)

- Integrated into the processor core
- Fastest interaction with the processor
- Core re-design needed
- Instruction set extension needed

[Figure: PROCESSOR containing ALU, Register File and RFU]

Reconfigurable Units

Reconfigurable Unit requirements:

- Fast data transfer between RU and processor (RFU approach chosen)
- Fast reconfiguration of the RU
- Silicon area as small as possible
- Low power consumption

Multicontext Reconfigurable Cells

Traditional approach (LUT-based): one Look-Up Table for each context (operation)

[Figure: LUT Context 1 ... LUT Context N feeding a Selector; signals: input, output, context selection]

Reconfigurable Logic Block: a single reconfigurable block, complete with a memory containing the contexts

[Figure: Context Memory driving a Configurable Block; signals: context selection, input, output]
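The traditional approach above can be sketched as a small behavioral model: every context owns a full truth table, and a selector picks which table's output is forwarded. This is only an illustration. The class name, the 2-input cell size and the two example contexts (AND, XOR) are assumptions made for the sketch and are not taken from the slides.

```python
from typing import Sequence


class MulticontextLutCell:
    """Behavioral model (hypothetical): one LUT per context plus an output selector."""

    def __init__(self, luts: Sequence[Sequence[int]], n_inputs: int) -> None:
        for table in luts:
            if len(table) != 2 ** n_inputs:
                raise ValueError("each LUT needs 2**n_inputs entries")
        self.luts = [list(table) for table in luts]
        self.n_inputs = n_inputs

    def evaluate(self, context: int, inputs: Sequence[int]) -> int:
        # Pack the input bits into a LUT address; inputs[0] is the LSB.
        address = sum(bit << i for i, bit in enumerate(inputs))
        # All LUTs see the same address; the selector forwards one of them.
        candidates = [table[address] for table in self.luts]
        return candidates[context]


# Two illustrative contexts for a 2-input cell (assumed, not from the slides).
AND_LUT = [0, 0, 0, 1]
XOR_LUT = [0, 1, 1, 0]
cell = MulticontextLutCell([AND_LUT, XOR_LUT], n_inputs=2)

print(cell.evaluate(0, [1, 1]))  # context 0 (AND)
print(cell.evaluate(1, [1, 1]))  # context 1 (XOR)
```

The model makes the cost of the approach visible: storage grows as (number of contexts) x 2^(number of inputs) bits, which is the overhead the single configurable block with a context memory tries to avoid.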

Proposed Logic Block

- Full-Adder based
- Additional blocks for its configuration
- 4 configuration bits (2^4 = 16 contexts)
- 3 input bits / 1 output bit

[Figure: Full Adder (X, Y, C_IN, Sum, C_OUT) with two MUXes and a Switch; configuration bits S0, S1, S2, P; data inputs D1, D2, D3; outputs LB Out and carry to C_IN of next LB]
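To see why a full adder is a reasonable base for a configurable logic block, the sketch below enumerates the 2-input functions that appear on the Sum and C_OUT outputs when C_IN is tied to a constant. The slides do not give the MUX and Switch wiring or the mapping of S0, S1, S2 and P to operations, so this is not a model of the proposed block; it only shows the logic functions a plain full adder already contains.

```python
from itertools import product


def full_adder(x: int, y: int, c_in: int) -> tuple:
    """Standard 1-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def functions_with_fixed_carry() -> dict:
    """Truth tables over (x, y) = 00, 01, 10, 11 for each constant C_IN."""
    tables = {}
    for c_in in (0, 1):
        rows = [full_adder(x, y, c_in) for x, y in product((0, 1), repeat=2)]
        tables[("sum", c_in)] = tuple(s for s, _ in rows)
        tables[("c_out", c_in)] = tuple(c for _, c in rows)
    return tables


tables = functions_with_fixed_carry()
for (output, c_in), truth_table in sorted(tables.items()):
    print(f"{output} with C_IN={c_in}: {truth_table}")
```

With C_IN = 0 the Sum output is XOR and C_OUT is AND; with C_IN = 1 the Sum output is XNOR and C_OUT is OR. Input and output multiplexing around the adder can then select among such functions without storing a full truth table per context.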

Reconfigurable Cell Comparison

Proposed Reconfigurable Cell

A single reconfigurable logic block based on a full adder, complete with a memory containing the context configuration bits

[Figure: Logic Block with inputs D1, D2, D3 (3 bits) and C_IN, outputs SUM and C_OUT; 16x3 Context Memory with Out, Context Enable and 4-bit Context Selection]

Reconfigurable Cells Comparison

Traditional (LUT-based) implementation of the same cell:

[Figure: two 16x8 LUTs (each with Out and Context Enable) feeding MUXes over 8-bit buses; inputs D1, D2, D3 (3 bits), C_IN, S0 and Data Input; 4-bit Context Selection; outputs SUM and C_OUT]
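The memory labels in the two figures allow a rough comparison of storage bits. The sketch below assumes that "16x3" means 16 words of 3 bits, that "16x8" means 16 words of 8 bits, and that the LUT-based cell uses two such LUTs, as the repeated figure label suggests. It counts storage bits only; multiplexers and the logic block itself are not included.

```python
def memory_bits(words: int, width: int, copies: int = 1) -> int:
    """Total storage bits for `copies` memories of `words` x `width` bits."""
    return words * width * copies


# Assumed reading of the figure labels (see the note above).
proposed_bits = memory_bits(words=16, width=3)       # one 16x3 context memory
lut_bits = memory_bits(words=16, width=8, copies=2)  # two 16x8 LUTs

saving = 1 - proposed_bits / lut_bits
print(f"proposed: {proposed_bits} bits, LUT-based: {lut_bits} bits")
print(f"storage bits saved: {saving:.1%}")
```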

Performance evaluation

- Simulation software: SPECTRE, Cadence Virtuoso Suite
- Process used: CL018 by TSMC, Taiwan (0.18 μm feature size)
- Process-related simulation data: NCSU Design Kit


Performance evaluation: layout

[Figure: proposed cell layout]

[Figure: LUT-based cell layout]

Proposed cell: 0.00903 mm² vs LUT-based cell: 0.0212 mm² (57.4% less)
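The quoted area saving can be checked directly from the two layout areas. The helper name below is hypothetical; the numbers are the ones on the slide, with the smaller area taken to be the proposed cell.

```python
def relative_reduction(new: float, old: float) -> float:
    """Fraction by which `new` is smaller than `old`."""
    return (old - new) / old


area_proposed_mm2 = 0.00903
area_lut_mm2 = 0.0212

area_reduction = relative_reduction(area_proposed_mm2, area_lut_mm2)
print(f"area reduction: {area_reduction:.1%}")
```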


Performance evaluation: delay

Maximum delays of the proposed cell:

OUTPUT       CONTEXT SWITCH    INPUT SWITCH
SUM          800 ps            450 ps
C_OUT        685 ps            325 ps

Maximum delays of the traditional LUT-based cell:

OUTPUT       CONTEXT SWITCH    INPUT SWITCH
SUM/C_OUT    845 ps            370 ps
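The delay figures can be put side by side as relative changes. The sketch assumes that the table with separate SUM and C_OUT rows belongs to the proposed cell and the single SUM/C_OUT row to the LUT-based cell, which is the order in which the slide lists them. Negative values mean the proposed cell is faster.

```python
# Maximum delays in picoseconds, copied from the slide.
proposed_ps = {
    ("SUM", "context_switch"): 800,
    ("SUM", "input_switch"): 450,
    ("C_OUT", "context_switch"): 685,
    ("C_OUT", "input_switch"): 325,
}
lut_ps = {"context_switch": 845, "input_switch": 370}

# Relative change of the proposed cell with respect to the LUT-based cell.
delay_change = {
    key: (delay - lut_ps[key[1]]) / lut_ps[key[1]]
    for key, delay in proposed_ps.items()
}

for (output, event), change in delay_change.items():
    print(f"{output:5s} {event:15s} {change:+.1%}")
```

On these numbers the proposed cell is faster in every case except the SUM output on an input switch, where the full-adder path is slower than the LUT read.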

Performance evaluation: power

Simulation conditions:

- 100 MHz operating frequency
- 100% input node activity

Power consumption of the proposed cell: 0.572 mW

Power consumption of the traditional LUT-based cell: 1.097 mW

Average power consumption reduced by 48%
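The 48% figure follows from the two power values. The sketch also derives the average energy per clock cycle at the stated 100 MHz; that quantity does not appear on the slides and is included only as a convenient way to read the same data.

```python
FREQ_HZ = 100e6  # operating frequency from the simulation conditions

power_proposed_mw = 0.572
power_lut_mw = 1.097


def energy_per_cycle_pj(power_mw: float, freq_hz: float) -> float:
    """Average energy per clock cycle in picojoules (E = P / f)."""
    return power_mw * 1e-3 / freq_hz * 1e12


power_reduction = 1 - power_proposed_mw / power_lut_mw
energy_proposed_pj = energy_per_cycle_pj(power_proposed_mw, FREQ_HZ)
energy_lut_pj = energy_per_cycle_pj(power_lut_mw, FREQ_HZ)

print(f"power reduction: {power_reduction:.0%}")
print(f"energy per cycle: {energy_proposed_pj:.2f} pJ vs {energy_lut_pj:.2f} pJ")
```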

Performance evaluation: summary

Summary of performance comparison:

Conclusions

Architecture advantages:

- Fast reconfiguration
- Low transistor count (68.8% less) and area requirements
- Low power consumption

Main limitations:

- Reduced flexibility compared to a LUT-based cell

Future work:

- Use of the proposed cell in a complete RFU architecture
- Integration of the RFU in an existing embedded processor