Converting Behavioral Verilog to Transistor Counts

heartlustElectronics - Devices

Nov 2, 2013

E-Voting Machine - Design Presentation

Group M1

Bohyun Jessica Kim
Jonathan Chiang
Chi Ho Yoon
Donald Cober

Mon. Sept 29

System Hardware Component Diagram

Gate-level Data path

Updated Transistor Estimates

Floorplan

Secure Electronic Voting Terminal


Behavioral Verilog Entire System


Gate-level Hardware Block Diagram


Updated Transistor Count Calculations


Initial Floorplan


Structural Verilog Entire System


Refined Floorplan



Status Update

[Figure: System Hardware Component Diagram. Labeled blocks: Data Bus (SUPER MUX!), Machine Init FSM, User ID FSM, Selection FSM, Confirmation FSM, Display, User ID SRAM, Message ROM, Card Reader, Fingerprint Scanner, Encryption Key SRAM, User Input, Write-in SRAM, Choice SRAM, TX_Check, Selection Counter, Key Register, COMMS Register, Shift Register In, Shift Register Out, constant init. Datapath labels: four 8-bit Full Adders, one 8-bit Add/Sub (T: 128), 8-bit MUXes, XORs, 8-bit REGs (T: 88).]

SuperMux:

- Our data flow consists of shuffling 8 bits of data from a source to a destination
- These sources and destinations are SRAMs, User Input, Comms, etc.
- Many are bidirectional
- Since only one piece of data will be sent at a time, it makes sense to use a bus configuration for data movement rather than a set of giant muxes
- We can gate which sources/destinations (drop points) are connected to the bus with one level of pass logic
- This way the data only ever goes through two layers of pass logic: one to get onto the bus and one to get off of the bus
- We will still call this the SuperMux for legacy purposes
- Layout will be fun

[Figure: data[7:0] bus with three drop points]
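The bus idea above can be sketched as a small Python model. This is only an illustration under stated assumptions: the `Bus` class, its method names and the drop-point names are hypothetical, and the cost function assumes one 2-transistor transmission gate per bit per drop point, which matches the Data Bus MUX row in the transistor table later in the deck (13 points, 104 T-gates, 208 transistors).

```python
class Bus:
    """Hypothetical model of an 8-bit shared bus with gated drop points."""

    def __init__(self, names):
        # Each drop point holds one byte; the names are illustrative only.
        self.points = {name: 0 for name in names}

    def transfer(self, src, dest):
        """Move one byte from src to dest and return the pass layers crossed."""
        if src == dest:
            raise ValueError("source and destination must differ")
        on_bus = self.points[src] & 0xFF  # layer 1: src gate drives the bus
        self.points[dest] = on_bus        # layer 2: dest gate takes the byte
        return 2


def bus_transistors(drop_points, width=8, per_tgate=2):
    """Assumes one transmission gate per bit per drop point."""
    return drop_points * width * per_tgate
```

Only one source is ever enabled per transfer, which is why a single level of gating per drop point is enough and no wide mux tree is needed.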





Tiny Encryption Algorithm Project Specs

Original Implementation:
- 64-bit blocks: Two 32-bit inputs
- 128-bit key: Four 32-bit keys (K[0], K[1], K[2], K[3])
- Feistel Structure: Symmetric structure used in block ciphers
- "Magic" constant: 9E3779B9 (Delta) = 2^32 / 1.6180339887 (golden ratio)
- 64 Feistel rounds = 32 cycles

E-Voting Machine Implementation:
- 16-bit blocks: Two 8-bit inputs
- 32-bit key: Four 8-bit keys
- 32 Feistel rounds = 16 cycles

Decision:
- Scale the golden ratio (1.6) up by a factor of 10 to 16, scale 2^16 by 10 to 655360, and divide 655360 / 16 to get Delta. This avoids using floating point for the key scheduler.
- New Delta = A000; truncate the least significant hex digit to A00 to fit 16 bits when decrypting, since A00 * 8 cycles = 0x5000

Hardware:
- 4, 5-bit Shifters
- 16-bit Multipliers
- 16-bit Adder / Subtractor
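The Delta arithmetic on this slide can be checked with a few lines of Python. This is a sketch rather than the group's own tooling: the variable names are illustrative, and the reading that the shorter constant A00 is used so that the decryption start value (Delta times 8 cycles) fits in 16 bits is an assumption drawn from the slide text.

```python
GOLDEN_RATIO = 1.6180339887  # value quoted on the slide

# Reference TEA: 32-bit Delta is 2^32 divided by the golden ratio, truncated.
delta_32 = int(2**32 / GOLDEN_RATIO)

# Integer-only shortcut from the slide: scale 2^16 and 1.6 by 10, then divide.
delta_16 = (2**16 * 10) // 16

# For comparison only: the 16-bit value without rounding the ratio to 1.6.
delta_16_exact = int(2**16 / GOLDEN_RATIO)

# Decryption starts from Delta * cycles (assumed reading of the slide).
CYCLES = 8
start_full = 0xA000 * CYCLES   # does not fit in 16 bits
start_short = 0xA00 * CYCLES   # fits in 16 bits
```

Rounding the ratio to 1.6 moves the constant from 0x9E37 to 0xA000, which trades a little fidelity to the reference constant for a divider-free, integer-only derivation.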

COMMS BLOCK Hardware Implementation 1

States  inA[7:0]  inB[7:0]  sel_out  sel_shift[1:0]  sel_sum  v_out[7:0]
(1)     delta     sum[7:0]  0        00              0        v_out0 = sum[7:0]
(2)     v1        sum[7:0]  0        01              1        v_out1 = (C+D)
(3)     v1 << 4   k0        1        10              0        v_out2 = (A+B) ^ (C+D)
(4)     v1 >> 5   k1        1        11              0        v_out3 = (A+B) ^ (C+D) ^ (E+F)
(5)     v0        out3      0        1               1        v_outx = V0 + (A+B) ^ (C+D) ^ (E+F)

States (6)-(9) are the same as above, except that they use k2 and k3 and swap v1 and v0.

The implementation goes through 9 states/clk cycles each iteration to update the output function v_outx.

Reuses:
- (1x) 8 bit Full adder/sub (Ripple carry) [16*8 = 128]
- (2x) 2:1 8 bit MUX for output pass-through [4*8*2 = 64]
- (8x) 2-input XORs [6*8 = 48]
- (1x) 8 bit REG [11*8 = 88]
- (1x) 4:1 8 bit MUX for shifting selection [12*8 = 96]

In addition, the logic will iterate 8 times and be controlled by an FSM that uses:
- (2x) 3:1 8 bit MUX for state input selection [8*8*2 = 128]
- (2x) 1 bit Counter adder for updating cycle [16*2 = 32]
- (2x) 1 bit REG for storing updated cycle [11*2 = 22]

Total: 606
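The total for Implementation 1 can be reproduced by summing the bracketed subtotals. The sketch below simply restates the slide's numbers; the dictionary names are illustrative, and the clock count assumes the 9 states and 8 iterations quoted on the slide.

```python
# Datapath parts, using the per-part expressions given on the slide.
IMPL1_DATAPATH = {
    "1x 8-bit adder/sub (ripple carry)": 16 * 8,
    "2x 2:1 8-bit MUX": 4 * 8 * 2,
    "8x 2-input XOR": 6 * 8,
    "1x 8-bit REG": 11 * 8,
    "1x 4:1 8-bit MUX": 12 * 8,
}

# Control parts used by the FSM that sequences the datapath.
IMPL1_CONTROL = {
    "2x 3:1 8-bit MUX": 8 * 8 * 2,
    "2x 1-bit counter adder": 16 * 2,
    "2x 1-bit REG": 11 * 2,
}

impl1_datapath = sum(IMPL1_DATAPATH.values())
impl1_control = sum(IMPL1_CONTROL.values())
impl1_total = impl1_datapath + impl1_control

# 9 states per iteration, 8 iterations per block.
impl1_clocks = 9 * 8
```

Splitting the sum this way shows that roughly 30% of the transistors in this option go to sequencing rather than arithmetic.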


Advantages:
- Saves transistors and area for the Comms Block

Disadvantages:
- Very heavy pass logic from the MUX layers and XOR
- High clk frequency required, since the same components are reused to calculate outx in stages. This translates to higher power consumption, since we are trying to do more with less hardware.

Tradeoff:
- Every 8-bit MUX uses 4*8 = 32 transistors, compared to 16*8 = 128 transistors for an 8-bit Full Adder. However, MUXes add heavy pass logic, so this is an area vs. power tradeoff.



sum += delta;


v0 += ((v1<<4)+k0) ^ (v1+sum) ^ ((v1>>5)+k1);


v1 += ((v0<<4)+k2) ^ (v0+sum) ^ ((v0>>5)+k3);
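The round function above can be modeled in software to check that it is invertible on an 8-bit datapath. This is a minimal sketch under assumptions: the function names are hypothetical, the 8 iterations follow the slide, and the 8-bit `delta` used in the usage note is an illustrative value, not one taken from the deck.

```python
MASK = 0xFF  # 8-bit datapath: every result is truncated to one byte


def tea8_encrypt(v0, v1, key, delta, rounds=8):
    """Apply the slide's update equations with 8-bit wraparound."""
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(rounds):
        total = (total + delta) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return v0, v1


def tea8_decrypt(v0, v1, key, delta, rounds=8):
    """Undo tea8_encrypt by running the steps in reverse with subtraction."""
    k0, k1, k2, k3 = key
    total = (delta * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - delta) & MASK
    return v0, v1
```

For example, `tea8_decrypt(*tea8_encrypt(0x12, 0x34, (1, 2, 3, 4), 0x9E), (1, 2, 3, 4), 0x9E)` returns the original pair. Decryption reuses the same adders in subtract mode, which is why the hardware lists an adder/subtractor rather than a plain adder.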

[Figure: Implementation 1 datapath. Labeled blocks: two 3:1 8-bit MUXes (inputs include inA[7:0], inB[7:0] and the constant 8'h00), 4:1 8-bit MUX, 2:1 8-bit MUXes, 8-bit Full Adder/Sub, XOR, 8-bit REG, 1-bit REG, 1-bit Full Adder. Control signals: sel_out, sel_sum, sel_shift[1:0], clk. Output: v_outx. Transistor labels: T: 128, T: 48, T: 88, T: 64, T: 32, T: 32.]

Logical Shifter Code: sel_shift[1:0] selects inA[7:0]
- 00: delta
- 01: v1
- 10: v1 << 4
- 11: v1 >> 5


COMMS BLOCK Hardware Implementation 2

Implementation 2 calculates all 3 parts of the function concurrently and completes a full iteration of calculations in 2 clk cycles.

Uses:
- (1x) 8 bit Full adder/sub (Ripple carry) [16*8 = 128]
- (3x) 8 bit Full adder (Ripple carry) [12*8*4 = 384]
- (4x) 2:1 8 bit MUX for output pass-through [4*8*4 = 128]
- (16x) 2-input XORs [6*16 = 96]
- (2x) 8 bit REG [11*8*2 = 176]
- (1x) 1 bit Counter adder for updating cycle [16]
- (1x) 1 bit REG for storing updated cycle [11]

Total: 939
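The Implementation 2 total, and the cost of choosing it over Implementation 1, can be worked out from the slide figures. The sketch below uses the bracketed subtotals exactly as printed; the names are illustrative, and the Implementation 1 figures (606 transistors, 9 cycles per iteration) are taken from the previous slide.

```python
# Bracketed subtotals as printed on the slide.
IMPL2_PARTS = {
    "8-bit adder/sub (ripple carry)": 16 * 8,
    "8-bit full adders (slide subtotal)": 12 * 8 * 4,
    "4x 2:1 8-bit MUX": 4 * 8 * 4,
    "16x 2-input XOR": 6 * 16,
    "2x 8-bit REG": 11 * 8 * 2,
    "1-bit counter adder": 16,
    "1-bit REG": 11,
}
impl2_total = sum(IMPL2_PARTS.values())

# Figures quoted for Implementation 1 on the previous slide.
IMPL1_TOTAL = 606
IMPL1_CYCLES_PER_ITERATION = 9
IMPL2_CYCLES_PER_ITERATION = 2
ITERATIONS = 8

extra_transistors = impl2_total - IMPL1_TOTAL
impl1_clocks = IMPL1_CYCLES_PER_ITERATION * ITERATIONS
impl2_clocks = IMPL2_CYCLES_PER_ITERATION * ITERATIONS
```

Under these numbers the parallel datapath costs 333 more transistors and needs 16 clocks per block instead of 72.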


In addition, the logic will not need a complex FSM; it just needs to do 8 iterations.

Advantages:
- Low pass logic, speed performance, low power, MUX logic transistor count essentially halved.

Disadvantages:
- Higher transistor count and larger area.

Tradeoff:
- Larger area, but less pass logic (fewer MUXes) and no complex FSM simplify the design, increase speed and minimize power.



sum += delta;


v0 += ((v1<<4)+k0) ^ (v1+sum) ^ ((v1>>5)+k1);


v1 += ((v0<<4)+k2) ^ (v0+sum) ^ ((v0>>5)+k3);

[Figure: Implementation 2 datapath. Labeled blocks: four 8-bit Full Adders and one 8-bit Add/Sub (T: 128 each), XORs, 8-bit MUXes (T: 32 each), two 8-bit REGs (T: 88 each), 1-bit REG, 1-bit Full Adder. Inputs: K0, K1, V0, V1, sum, delta, {V1[3:0], 4'b0}, {5'b0, V1[7:5]}. Control: sel_out, clk. Output: v_outx.]

sel_out output:
- 0: pass sum, V1
- 1: pass new sum, V0

E-Voting TEA Gate Level Hardware

Full Adder

Common full adder: Mirror Adder
- Uses 28 transistors (including 4 transistors in inverters)
- NMOS and PMOS networks are completely symmetrical
- Logic:
  S = a ⊕ b ⊕ Carryin
  Carryout = (a ⊕ b) • Carryin + (a • b)


E-Voting TEA Gate Level Hardware

Full Adder

What we decided to use in this project: 1-bit full adder
- Uses pass-transistor logic for computing XNOR
- The sum bit equals A^B^Cin, where A and B are the two inputs and Cin is the carry-in input; the mux at the bottom selects the Cout bit to carry out
- Will use this adder 8 times to compute all 8 bits of data
- Uses inverters to strengthen the signal at the end of each XNOR
- Uses only 16 transistors yet gives a strong signal
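The full-adder equations and the plan to chain eight 1-bit adders can be checked with a short behavioral model. This is a sketch of the logic only, not of the pass-transistor circuit; the function names are hypothetical, and the per-bit transistor counts are the 16 and 28 quoted on the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: S = a ^ b ^ cin, Cout = (a ^ b) & cin | a & b."""
    s = a ^ b ^ cin
    cout = ((a ^ b) & cin) | (a & b)
    return s, cout


def ripple_add8(x, y, cin=0):
    """Chain eight 1-bit full adders; returns (8-bit sum, carry out)."""
    result = 0
    carry = cin
    for i in range(8):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


# Transistor cost of the 8-bit adder for the two cells on the slides.
CHOSEN_ADDER_T = 16 * 8   # 16T pass-transistor cell
MIRROR_ADDER_T = 28 * 8   # 28T mirror adder cell
```

With the 16T cell the 8-bit adder costs 128 transistors, against 224 for the mirror adder, which is where the "T: 128" labels in the datapath diagrams come from.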

E-Voting TEA Gate Level Hardware

XOR
- To avoid using two t-gates
- Uses 6 transistors (XNOR + inv)

MUX: T-gate Mux
- 4 transistors
- very tiny, hence difficult to lay out


E-Voting TEA Gate Level Hardware

REG: TSPC Register
- True single-phase clock flip-flop
- Advantages: single clock distribution, small area for clock lines, high speed and no clock skew
- We will use 8T instead of 9T

SRAM Gate Level Hardware

SRAM Cell
- 6T SRAM Cell
- smaller transistor size
- lower energy dissipation
- efficient layout

Address Decoder
- Combination of inverters and NAND gates

SRAM
- Input/Output tri-state buffers?
- Need for a sense amplifier?

[Figure: Machine Initialization FSM signal diagram]
Blocks: Encryption Key SRAM (4 byte), Card Reader, Machine Initialization FSM, Data Bus, COMMS, Message ROM
Signals: 2-bit Address, 8-bit Data, 1-bit Card Detected Signal, 1-bit Activate next, 1-bit Data Ready, 1-bit Message, 4-bit Data bus control

[Figure: User ID FSM signal diagram]
Blocks: User ID SRAM (8 byte), Card Reader, User ID FSM, Data Bus, COMMS, Message ROM, Fingerprint Scanner, Display, User Input
Signals: 3-bit Address, 8-bit Data, 1-bit Card Detected Signal, 1-bit Activate next, 1-bit Data Ready, 2-bit Message, 1-bit Finger Scanned Signal, 1-bit Activate this, 1-bit Reactivate this, 1-bit Yes Signal, 1-bit No Signal, 7-bit Data bus control

[Figure: Selection FSM signal diagram]
Blocks: Choice SRAM (4 byte), User Input, Selection FSM, Data Bus, COMMS, Message ROM, Display, Selection Counter
Signals: 2-bit Address, 8-bit Data, 1-bit Next Page Signal, 1-bit Previous Page Signal, 1-bit Activate next, 1-bit Data Ready, 2-bit Message, 1-bit Activate this, 1-bit Reactivate this, 3-bit Count, 6-bit Data bus control

[Figure: Confirmation FSM signal diagram]
Blocks: User Input, Confirmation FSM, Data Bus, COMMS, Message ROM, Display, User ID SRAM (8 byte), Write-in SRAM (64 byte), Choice SRAM (4 byte), TX_Check
Signals: 1-bit Yes Signal, 1-bit No Signal, 1-bit Reactivate Selection, 1-bit Reactivate User ID, 1-bit Data Ready, 8-bit Data, 2-bit Message, 1-bit Activate this, 3-bit Address, 2-bit Address, 6-bit Address, 1-bit Reset (one per SRAM), 1-bit TX_good, 8-bit Data bus control

SUPER MUX!

The statement that we only transfer one byte of data at a time is technically false.

For example: when the Message ROM is sending a message to the COMMS, the COMMS are using data from the Encryption Key SRAM (4 byte) to encode the message.

We can circumvent this by hardwiring the Encryption Key SRAM data to the COMMS Key input in addition to attaching it to the bus. This only works because the Key SRAM will never be active on the data bus while the COMMS are accessing it.

Other hardwired connections:

Choice SRAM to TX Check
- The transmission check confirms that the data sent to the main computer and held in its current session matches the choices stored in our SRAM
- During the Confirmation FSM the SRAM data is sent to the main computer and the main computer echoes it back
- The echo is streamed into the TX Check (as well as the display) and the TX Check compares it (as it is streaming) to the Choice SRAM

Write-In SRAM to User Input
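The transmission check described above amounts to a streaming byte-by-byte comparison. The following Python sketch shows the idea; the function name and the rule that a short echo counts as a failure are assumptions for illustration, not details from the slides.

```python
def tx_check(choice_sram, echo_stream):
    """Compare echoed bytes against the Choice SRAM as they arrive.

    choice_sram: list of stored bytes (4 bytes in this design).
    echo_stream: any iterable yielding the bytes echoed by the main computer.
    Returns True (TX_good) only if every stored byte was echoed unchanged.
    """
    echo = iter(echo_stream)
    for stored in choice_sram:
        try:
            echoed = next(echo)
        except StopIteration:
            return False  # echo ended early (assumed to be a failure)
        if echoed != stored:
            return False  # mismatch found while streaming
    return True
```

Because each byte is compared as it streams in, the check needs only an address counter and a comparator rather than a buffer for the whole echo, which is consistent with the small register count listed for TX Compare in the transistor table.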

Converting Behavioral Verilog to Transistor Counts

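// Behavioral model of the Machine Init FSM.
// The state codes `s1..`s6 and the `CARD_SRC, `MESSAGE_SRC, `COMMS_SRC,
// `KEY_SRAM_DEST, `COMMS_DEST and `KEY_REQUEST macros are not shown on this
// slide; they are assumed to be `define'd elsewhere in the project.
module machine_init_fsm(clk, cardDetectSig, commDetectSig, actNext,
                        mux_src, mux_dest, message, address);
    input            clk, cardDetectSig, commDetectSig;
    output reg       actNext;
    output reg [8:0] mux_src;     // widths follow the z literals used below
    output reg [7:0] mux_dest;
    output reg [2:0] message;
    output reg [1:0] address;     // Key SRAM address (4 bytes)

    reg [2:0] state, next_state;  // 6 states => 3 flip-flops
    reg [1:0] next_address;

    //Initialize
    initial begin
        actNext = 0;
        state = 0;
        next_state = 0;
    end

    //Main FSM
    //actNext is only written in `s6, so it holds its value once set.
    always @* begin
        //Defaults: hold state and address, send no message
        next_state = state;
        next_address = address;
        message = 0;
        if(!actNext) begin
            case (state)
                `s1: begin
                    mux_src = 0;
                    mux_dest = 0;
                    //Wait for card data
                    if(cardDetectSig) begin
                        //Send card data to the Key SRAM
                        next_address = 0;
                        next_state = `s2;
                    end
                end
                `s2: begin
                    mux_src = `CARD_SRC;
                    mux_dest = `KEY_SRAM_DEST;
                    //read in 4 bytes from card reader
                    if(address==3) begin
                        next_state = `s3;
                    end
                    next_address = address + 1;
                end
                `s3: begin
                    //Send a key request to the comms
                    message = `KEY_REQUEST;
                    mux_src = `MESSAGE_SRC;
                    mux_dest = `COMMS_DEST;
                    next_state = `s4;
                end
                `s4: begin
                    mux_src = 0;
                    mux_dest = 0;
                    next_address = 0;
                    //Wait for data to arrive
                    if(commDetectSig==0) begin
                        next_state = `s4;
                    end else begin
                        next_state = `s5;
                    end
                end
                `s5: begin
                    mux_src = `COMMS_SRC;
                    mux_dest = `KEY_SRAM_DEST;
                    //read in 4 bytes from the comms
                    if(address==3) begin
                        next_state = `s6;
                    end
                    next_address = address + 1;
                end
                `s6: begin
                    //proceed: release the shared lines, activate the next FSM
                    mux_src = 9'bzzzzzzzzz;
                    mux_dest = 8'bzzzzzzzz;
                    message = 3'bzzz;
                    next_address = 2'bzz;
                    actNext = 1;
                end
            endcase
        end else begin
            mux_src = 9'bzzzzzzzzz;
            mux_dest = 8'bzzzzzzzz;
            message = 3'bzzz;
            next_address = 2'bzz;
        end
    end

    //State Register:
    //address is driven only here; it is released (z) through next_address
    //on the clock edge after `s6 is reached.
    always @(posedge clk) begin
        state <= next_state;
        address <= next_address;
    end
endmodule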


Machine Init FSM

1. Create registers:
   - 6 states => 3 D flip-flops
   - + 2-bit SRAM address
2. State Change Logic:
   - Most changes are sequential increments
   - Flip-flops are configured as counters
3. Further Logic:
   - Remaining logic consists of output signals generated mostly by state
   - Random logic can be approximated based on the number and configuration of outputs

[Figure: five D flip-flops, each with D, clock, Q and ~Q pins]

State  src      dest   message
1      0        0      0
2      CARD     KEY    0
3      MESSAGE  COMMS  KEY_REQUEST
4      0        0      0
5      COMMS    KEY    0
6      z        NEXT   z

- 5 distinct 1-bit outputs
- Each 1-bit output is derived from a 3-bit input (state)
- Approx. two 2-input gates for each
- ~10 transistors for each distinct output
- 50 transistors total for random logic
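The three steps above reduce to a simple estimate that can be written as a function. This is a sketch under the slide's own rules of thumb: 11 transistors per register bit (the figure used for the 8-bit REG, 11*8 = 88) and about 10 transistors of random logic per distinct output. The function name and argument names are hypothetical.

```python
T_PER_REGISTER_BIT = 11      # from the 8-bit REG estimate, 11*8 = 88
T_PER_DISTINCT_OUTPUT = 10   # rule of thumb from this slide


def fsm_estimate(states, address_bits, distinct_outputs):
    """Return (register bits, random-logic transistors, total transistors)."""
    # Number of flip-flops needed to encode the states in binary.
    state_bits = (states - 1).bit_length()
    registers = state_bits + address_bits
    random_logic = T_PER_DISTINCT_OUTPUT * distinct_outputs
    total = T_PER_REGISTER_BIT * registers + random_logic
    return registers, random_logic, total
```

For the Machine Init FSM, `fsm_estimate(6, 2, 5)` gives 5 register bits, 50 transistors of random logic and 105 transistors in total. The same rule also reproduces the User ID FSM and Selection FSM rows of the summary table.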

Converting Behavioral Verilog to Transistor Counts (cont)

Block              States  Address  Registers  Distinct Outputs  Random  Transistors
Machine Init FSM   6       2 bits   5          5                 50      105
User ID FSM        12      3 bits   7          13                130     207
Selection FSM      7       2 bits   5          9                 90      145
Confirmation FSM   9       6 bits   10         8                 80      170
User Input         NA      6 bits   14         20                90      244
Selection Counter  NA      NA       3          3                 0       33
TX Compare         NA      2 bit    3          1                 0       33

Block         Points on Bus  T-gates  Transistors
Data Bus MUX  13             104      208

Block        Messages    Inputs  ~ Gates / Bit       Transistors
Message ROM  8 (1 byte)  8       7 (35 transistors)  280

Total: 1425
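The 1425 total can be reproduced from the three tables. The sketch below restates the table values and derives the bus and ROM entries from their parameters; the names are illustrative, and the 2-transistor T-gate is inferred from the 104 T-gates and 208 transistors listed for the Data Bus MUX.

```python
# Transistor column of the FSM and control table.
CONTROL_BLOCKS = {
    "Machine Init FSM": 105,
    "User ID FSM": 207,
    "Selection FSM": 145,
    "Confirmation FSM": 170,
    "User Input": 244,
    "Selection Counter": 33,
    "TX Compare": 33,
}

# Data Bus MUX: 13 drop points, one T-gate per bit, 2 transistors per T-gate.
bus_tgates = 13 * 8
bus_transistors = bus_tgates * 2

# Message ROM: about 35 transistors per output bit, 8 output bits.
rom_transistors = 35 * 8

control_total = sum(CONTROL_BLOCKS.values())
slide_total = control_total + bus_transistors + rom_transistors
```

Control logic accounts for 937 of the 1425 transistors, so the bus and ROM together add a little over a third on top of the FSMs.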

Converting Behavioral Verilog to Transistor Counts (cont)

Block          Bits  Address transistors?  Transistors
Key SRAM       32    8*(2^2)+2*2 = 36      228
User ID SRAM   64    8*(2^3)+2*3 = 70      454
Choice SRAM    32    8*(2^2)+2*2 = 36      228
Write-In SRAM  512   8*(2^6)+2*6 = 524     3596

Block             Bits       Transistors
COMMs             <slide 7>  939
Shift In          8          88
Shift Out         8          88
Input/Output MUX  8          32
Register          16         176

Total: 7254
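The SRAM rows follow one formula: six transistors per bit for the 6T cell plus the address term shown in the table. The sketch below applies it and then adds up the whole design; the function name is hypothetical, and the 1425 figure is the total from the previous slide.

```python
def sram_transistors(bits, address_bits, cell=6, width=8):
    """6T cells plus the table's address term: 8*(2^a) + 2*a."""
    address_logic = width * 2**address_bits + 2 * address_bits
    return cell * bits + address_logic


SRAMS = {
    "Key SRAM": sram_transistors(32, 2),
    "User ID SRAM": sram_transistors(64, 3),
    "Choice SRAM": sram_transistors(32, 2),
    "Write-In SRAM": sram_transistors(512, 6),
}

# COMMs datapath table from this slide.
COMMS = {"COMMs": 939, "Shift In": 88, "Shift Out": 88,
         "Input/Output MUX": 32, "Register": 176}

sram_total = sum(SRAMS.values())
comms_total = sum(COMMS.values())
PREVIOUS_SLIDE_TOTAL = 1425  # FSMs, data bus MUX and message ROM
grand_total = PREVIOUS_SLIDE_TOTAL + sram_total + comms_total
```

The 7254 figure is therefore the total for the whole design, and the 64-byte Write-In SRAM alone accounts for about half of it.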

[Figure: Floorplan. Labeled blocks: Write-In SRAM, Choice SRAM, User ID SRAM, Encryption Key SRAM, Comm Register, MUX, User Input, User ID FSM, COMMS, Shift In, Shift Out, Selection FSM, Confirmation FSM, Machine Init FSM]

Questions?


Thank you!