Nov 21, 2013




Survey on Hardware Implementations of Cryptographic Systems

Elliptic Curve Cryptography and others.


Goodman & Chandrakasan Implementation (Domain-Specific Reconfigurable Cryptographic Processor, DSRCP)


Capable of performing an entire suite of cryptographic
primitives over the integers modulo N, binary Galois fields and
non-supersingular elliptic curves. This includes RSA and ECC.



Fully programmable parameters for many cryptographic
systems. Data size can vary from 8 to 1024 bits.



Reconfigurability is limited to the subset of functions
(the domain) required for public-key cryptography as defined
in IEEE 1363. Only a small set of configurations is needed
to perform all the operations required for RSA, ECC, etc.

Instruction Set Architecture


As defined by the IEEE 1363 Public-Key Cryptography
standard.




24 instructions, broken up into 6 types of operations:
conventional arithmetic, modular integer arithmetic,
$GF(2^n)$ arithmetic, elliptic curve arithmetic, register
manipulation and processor configuration.

SET_LENGTH length

Sets the width of the processor to length+1 bits

REG_CLEAR rd, rs0

Clears the registers selected by the mask R<7:0>
formed by (rd, rs0)

REG_MOVE rd, rs0

rd = rs0

REG_LOAD rd

rd is loaded from I/O interface

REG_UNLOAD rs1

rs1 is unloaded to the I/O interface

COMP rs0, rs1

Sets the “gt” and “eq” flags according to the result of comparing rs0 and rs1


ADD/SUB rd, rs0, rs1, rs2

rd = rs0 + rs1 + rs2<0>            (rs2<2:1> = 00)

rd = (rs0 + rs1 + rs2<0>) >> 1     (rs2<2:1> = 01)

rd = rs0 - rs1                     (rs2<2:1> = 10)

rd = (rs0 - rs1) >> 1              (rs2<2:1> = 11)

MOD_ADD rd, rs0, rs1, rs2

rd = (rs0 + rs1 + rs2<0>) mod N

MOD_SUB rd, rs0, rs1

rd = (rs0 - rs1) mod N

MONTRED_A

$(Pc, Ps) = A \cdot 2^{-n} \bmod N$

MONTMULT

$(Pc, Ps) = A \cdot B \cdot 2^{-n} \bmod N$

MONTRED

$(Pc, Ps) = (Pc, Ps) \cdot 2^{-n} \bmod N$
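To make the mode encoding of ADD/SUB concrete, here is a minimal Python model of the instruction as listed above. It is a sketch under assumptions: operands are unsigned integers, `width` stands for the processor width set by SET_LENGTH (length + 1), the halving modes keep the carry-out before shifting, and results wrap to `width` bits. The function name `add_sub` is hypothetical, not part of the DSRCP documentation.

```python
def add_sub(rs0: int, rs1: int, rs2: int, width: int) -> int:
    """Hypothetical model of DSRCP ADD/SUB: rs2<2:1> selects the mode, rs2<0> is the carry-in."""
    mask = (1 << width) - 1       # assumption: results wrap to `width` bits
    mode = (rs2 >> 1) & 0b11      # rs2<2:1>
    carry_in = rs2 & 1            # rs2<0>
    if mode == 0b00:
        result = rs0 + rs1 + carry_in
    elif mode == 0b01:
        result = (rs0 + rs1 + carry_in) >> 1  # the carry-out is shifted back into the top bit
    elif mode == 0b10:
        result = rs0 - rs1
    else:
        result = (rs0 - rs1) >> 1
    return result & mask


print(add_sub(200, 100, 0b000, 8))  # 300 wraps to 44
print(add_sub(200, 100, 0b010, 8))  # (200 + 100) >> 1 = 150
```

The halving modes are exactly the "add, then divide by 2" step of bit-serial Montgomery multiplication, which is presumably why they exist. That connection is an inference, not something the ISA listing states.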

DSRCP ISA (cont.)

MOD rd, rs0, rs1, rs2

$rd = (rs1 \cdot 2^{n} + rs0) \bmod N$, correction factor of $2^{2n} \bmod N$ stored in rs2

MOD_MULT rd, rs0, rs1, rs2

$rd = (rs0 \cdot rs1) \bmod N$, correction factor of $2^{2n} \bmod N$ stored in rs2

MOD_INV rd, rs0

rd = (1 / rs0) mod N

MOD_EXP rd, rs0, rs2, length

$rd = rs0^{Exp} \bmod N$, Exp has (length + 1) bits,
correction factor of $2^{2n} \bmod N$ stored in rs2

GF_ADD rd, rs0, rs1

$rd = rs0 + rs1$ over $GF(2^n)$ (equivalent to rs0 XOR rs1)

GF_MULT

Pc = A . B

GF_INV

A = 1 / Pc

GF_INVMULT

A = B / Pc

GF_EXP rd, rs0, length

$rd = rs0^{Exp} \bmod N$, Exp has (length + 1) bits

EC_ADD rd, rs0, rs1, rs2

(rd, rd+1) = (rs0, rs0+1) + (rs1,rs1+1), over curve
defined by parameters in (rs2, N).

EC_DOUBLE rd, rs0, rs2

(rd, rd+1) = 2.(rs0, rs0+1), over curve defined by
parameters in (rs2, N)

EC_MULT length

(R4, R5) = Exp · (R2, R3), Exp has (length + 1) bits,
over the curve defined by the parameters in (R6, N)
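The correction factor $2^{2n} \bmod N$ that MOD, MOD_MULT and MOD_EXP keep in rs2 is what turns Montgomery results back into ordinary residues. MONTMULT multiplies by $2^{-n}$, so a second MONTMULT with $2^{2n} \bmod N$ cancels the stray factor. The Python sketch below checks that reasoning for MOD_MULT and MOD. It assumes N is odd and computes $2^{-n} \bmod N$ with Python's built-in `pow` (Python 3.8+) rather than with a bit-serial datapath. The names `montmult`, `mod_mult` and `mod_reduce` are hypothetical.

```python
def montmult(a: int, b: int, n_mod: int, n_bits: int) -> int:
    """Reference Montgomery product a * b * 2^(-n) mod N (N must be odd)."""
    return a * b * pow(2, -n_bits, n_mod) % n_mod  # pow(2, -n, N) is 2^(-n) mod N


def mod_mult(rs0: int, rs1: int, n_mod: int, n_bits: int) -> int:
    """Hypothetical MOD_MULT: two Montgomery products, the second by the correction factor."""
    correction = pow(2, 2 * n_bits, n_mod)          # 2^(2n) mod N, the value kept in rs2
    t = montmult(rs0, rs1, n_mod, n_bits)           # rs0 * rs1 * 2^(-n)
    return montmult(t, correction, n_mod, n_bits)   # * 2^(2n) * 2^(-n) -> rs0 * rs1 mod N


def mod_reduce(rs1: int, rs0: int, n_mod: int, n_bits: int) -> int:
    """Hypothetical MOD: reduce the 2n-bit value rs1 * 2^n + rs0 modulo N."""
    correction = pow(2, 2 * n_bits, n_mod)
    t = montmult(rs1 * (1 << n_bits) + rs0, 1, n_mod, n_bits)  # Montgomery reduction: x * 2^(-n)
    return montmult(t, correction, n_mod, n_bits)              # restore the missing 2^n


print(mod_mult(45, 67, 97, 7) == 45 * 67 % 97)  # True
```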

Top-Level System Architecture

[Figure: block diagram of the reconfigurable datapath (32 x 32 bits), shutdown controller, global controller, µ-code ROMs and I/O interface, with 32-bit data and instruction paths]

Reconfigurable Datapath



Adder: adds or subtracts two n-bit operands ($8 \le n \le 1024$)
in 3 cycles.



Comparator: single-cycle magnitude comparison between two
n-bit operands; also computes the XOR of the two operands.



Local registers in the Reconfigurable Logic Unit (Pc, Ps, A, B,
Exp and N) serve special-purpose operations and eliminate
the need to access the register file every cycle.



Two operand buses (rs0 and rs1) and one write-back bus.


Modular arithmetic


Complex operations (multiplication, reduction, inversion,
exponentiation) use microcoded instructions








Simple operations (addition, subtraction, comparisons) are
implemented directly in hardware



Modular arithmetic (cont.)


Multiplication: Montgomery multiplication

MONTMULT(A, B, N) = $A \cdot B \cdot 2^{-n} \bmod N$
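Montgomery multiplication avoids trial division by N. The product is built one bit of A at a time, and whenever the running sum is odd, N (which is odd) is added so that the division by 2 is exact. The sketch below is the textbook radix-2 version in Python. It uses an ordinary integer accumulator, whereas the (Pc, Ps) pair in the ISA suggests the hardware keeps the sum in carry-save form. Treat it as an illustration of the algorithm rather than of the DSRCP datapath. `montmult_bitserial` is a hypothetical name.

```python
def montmult_bitserial(a: int, b: int, n_mod: int, n_bits: int) -> int:
    """Radix-2 Montgomery product a * b * 2^(-n) mod N, for odd N < 2^n and a, b < N."""
    p = 0
    for i in range(n_bits):
        if (a >> i) & 1:      # add B when bit i of A is set (A scanned LSB first)
            p += b
        if p & 1:             # N is odd, so adding it makes p even
            p += n_mod
        p >>= 1               # exact division by 2
    if p >= n_mod:            # p < 2N at this point, so one subtraction suffices
        p -= n_mod
    return p


print(montmult_bitserial(45, 67, 97, 7) == 45 * 67 * pow(2, -7, 97) % 97)  # True
```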




Modular inversion: extended binary Euclidean algorithm.
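The binary extended Euclidean algorithm needs only shifts, additions, subtractions and comparisons, which is why it suits an adder-plus-comparator datapath. Below is one common formulation for an odd modulus (the binary inversion algorithm found in standard ECC references), written in Python. Whether the DSRCP microcode follows exactly these steps is an assumption, and `mod_inv_binary` is a hypothetical name.

```python
from math import gcd


def mod_inv_binary(a: int, n_mod: int) -> int:
    """Inverse of a modulo an odd n_mod using only halving, addition and subtraction."""
    if n_mod % 2 == 0 or gcd(a, n_mod) != 1:
        raise ValueError("requires an odd modulus and gcd(a, N) == 1")
    u, v = a % n_mod, n_mod
    x1, x2 = 1, 0  # invariants: x1 * a = u and x2 * a = v (mod N)
    while u != 1 and v != 1:
        while u % 2 == 0:
            u //= 2
            # halve x1 modulo N: add N first if x1 is odd so the division is exact
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + n_mod) // 2
        while v % 2 == 0:
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + n_mod) // 2
        if u >= v:
            u -= v
            x1 -= x2
        else:
            v -= u
            x2 -= x1
    return x1 % n_mod if u == 1 else x2 % n_mod


print(mod_inv_binary(45, 97) * 45 % 97)  # 1
```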




Modular exponentiation: square-and-multiply algorithm.

Precomputes and stores the values $\{2^n, rs0 \cdot 2^n, rs0^2 \cdot 2^n, rs0^3 \cdot 2^n\} \bmod N$
(the Montgomery forms of 1, rs0, rs0^2 and rs0^3)
in {R0, R1, R2, R3}.
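Storing the Montgomery forms of rs0^0 through rs0^3 suggests the exponent is consumed two bits at a time. Each step does two squarings, then one multiplication by the table entry selected by the next 2-bit digit. The Python sketch below follows that reading, which is an inference from the four precomputed values rather than something the slides state. `montmult` is a reference helper standing in for the hardware MONTMULT, and the correction factor $2^{2n} \bmod N$ is used to enter the Montgomery domain. All names are hypothetical.

```python
def montmult(a: int, b: int, n_mod: int, n_bits: int) -> int:
    """Reference Montgomery product a * b * 2^(-n) mod N (N odd)."""
    return a * b * pow(2, -n_bits, n_mod) % n_mod


def mod_exp_2bit(base: int, exp: int, exp_bits: int, n_mod: int, n_bits: int) -> int:
    """base^exp mod N, scanning exp two bits at a time (exp_bits = length + 1)."""
    def mm(x: int, y: int) -> int:
        return montmult(x, y, n_mod, n_bits)

    correction = pow(2, 2 * n_bits, n_mod)  # 2^(2n) mod N, as kept in rs2
    r0 = mm(1, correction)                  # 2^n mod N          (R0)
    r1 = mm(base, correction)               # base * 2^n mod N   (R1)
    r2 = mm(r1, r1)                         # base^2 * 2^n mod N (R2)
    r3 = mm(r2, r1)                         # base^3 * 2^n mod N (R3)
    table = [r0, r1, r2, r3]
    if exp_bits % 2:
        exp_bits += 1                       # pad to a whole number of 2-bit digits
    acc = r0                                # Montgomery form of 1
    for shift in range(exp_bits - 2, -1, -2):
        acc = mm(acc, acc)                  # square
        acc = mm(acc, acc)                  # square again: acc^4
        acc = mm(acc, table[(exp >> shift) & 0b11])  # multiply by base^digit
    return mm(acc, 1)                       # leave the Montgomery domain


print(mod_exp_2bit(45, 200, 8, 97, 7) == pow(45, 200, 97))  # True
```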



$GF(2^n)$ arithmetic


Addition: XOR function in the comparator unit.




Multiplication and inversion: implemented directly in
hardware using the reconfigurable datapath
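In a polynomial basis, $GF(2^n)$ addition is a bitwise XOR. Multiplication is a carry-less shift-and-XOR followed by reduction by the field polynomial. The Python sketch below shows the MSB-first (Horner) form. It assumes the field polynomial is given as an integer that includes its $x^n$ term; that the DSRCP keeps this polynomial in N is an assumption here, and `gf2n_mul` is a hypothetical name. The usage lines check the sketch against the worked examples in the AES specification, whose field is $GF(2^8)$ with polynomial 0x11B.

```python
def gf2n_mul(a: int, b: int, poly: int, n: int) -> int:
    """Multiply a and b in GF(2^n); poly is the irreducible field polynomial including x^n."""
    result = 0
    for i in range(n - 1, -1, -1):   # scan b from its most significant bit (Horner's rule)
        result <<= 1                 # result = result * x
        if (result >> n) & 1:        # degree reached n: subtract (XOR) the field polynomial
            result ^= poly
        if (b >> i) & 1:
            result ^= a              # addition in GF(2^n) is XOR
    return result


print(hex(0x57 ^ 0x83))                     # 0xd4, GF(2^8) addition
print(hex(gf2n_mul(0x57, 0x83, 0x11B, 8)))  # 0xc1
```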




Exponentiation: performed in the same manner as modular exponentiation,
with $\{1, rs0, rs0^2, rs0^3\}$ stored in {R0, R1, R2, R3}.
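Because $GF(2^n)$ multiplication carries no Montgomery factor, the table is simply {1, rs0, rs0^2, rs0^3}. The Python sketch below assumes that, as with modular exponentiation, the exponent is consumed two bits at a time; this is an inference from the four stored values. It uses the small AES field (polynomial 0x11B) only as a test case. The check relies on the identity $a^{2^n - 1} = 1$ for nonzero a, so $a^{2^n - 2}$ is the inverse of a. The function names are hypothetical.

```python
def gf2n_mul(a: int, b: int, poly: int, n: int) -> int:
    """Polynomial-basis multiplication in GF(2^n) (poly includes the x^n term)."""
    result = 0
    for i in range(n - 1, -1, -1):
        result <<= 1
        if (result >> n) & 1:
            result ^= poly
        if (b >> i) & 1:
            result ^= a
    return result


def gf2n_exp(base: int, exp: int, exp_bits: int, poly: int, n: int) -> int:
    """base^exp in GF(2^n), consuming exp two bits at a time with the table {1, b, b^2, b^3}."""
    b2 = gf2n_mul(base, base, poly, n)
    table = [1, base, b2, gf2n_mul(b2, base, poly, n)]  # contents of R0..R3
    if exp_bits % 2:
        exp_bits += 1                                   # pad to whole 2-bit digits
    acc = 1
    for shift in range(exp_bits - 2, -1, -2):
        acc = gf2n_mul(acc, acc, poly, n)               # square
        acc = gf2n_mul(acc, acc, poly, n)               # square again: acc^4
        acc = gf2n_mul(acc, table[(exp >> shift) & 0b11], poly, n)
    return acc


inv = gf2n_exp(0x53, 254, 8, 0x11B, 8)  # a^(2^8 - 2) = a^(-1)
print(gf2n_mul(inv, 0x53, 0x11B, 8))    # 1
```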



Elliptic Curve arithmetic


Point addition and doubling: implemented in microcode,
with curve points stored as register pairs $(R_i, R_{i+1}) = (x, y)$
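Over $GF(2^m)$, the non-supersingular curves the processor targets have the form $y^2 + xy = x^3 + a x^2 + b$. The microcoded point addition and doubling therefore reduce to a handful of field operations on the (x, y) register pair. The Python sketch below uses the standard affine formulas. The tiny field $GF(2^4)$ with $f(x) = x^4 + x + 1$, the use of `None` for the point at infinity, and inversion by exponentiation are illustrative assumptions, not details taken from the slides. All function names are hypothetical.

```python
# Illustrative field GF(2^4) with f(x) = x^4 + x + 1 (assumption; real curves use m >= 160).
M, POLY = 4, 0b10011


def gf_mul(x: int, y: int) -> int:
    """Polynomial-basis multiplication in GF(2^M)."""
    r = 0
    for i in range(M - 1, -1, -1):
        r <<= 1
        if (r >> M) & 1:
            r ^= POLY
        if (y >> i) & 1:
            r ^= x
    return r


def gf_inv(x: int) -> int:
    """x^(2^M - 2) = x^(-1) for nonzero x (right-to-left square-and-multiply)."""
    result, base, e = 1, x, (1 << M) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def ec_double(p, a):
    """2P on y^2 + xy = x^3 + a*x^2 + b; None is the point at infinity."""
    if p is None:
        return None
    x1, y1 = p
    if x1 == 0:                                  # (0, y) is its own negative, so 2P = O
        return None
    lam = x1 ^ gf_mul(y1, gf_inv(x1))            # lambda = x1 + y1/x1
    x3 = gf_mul(lam, lam) ^ lam ^ a
    y3 = gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3)    # y3 = x1^2 + (lambda + 1) * x3
    return (x3, y3)


def ec_add(p, q, a):
    """P + Q on the same curve (b enters only through the points themselves)."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None if y2 == (x1 ^ y1) else ec_double(p, a)  # Q = -P or Q = P
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))       # lambda = (y1 + y2) / (x1 + x2)
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ a
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)
```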




Point multiplication: performed using a repeated double-and-add algorithm.
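Double-and-add plays the same role for point multiplication that square-and-multiply plays for exponentiation. The scalar is scanned bit by bit, the accumulator is doubled at every step, and P is added when the bit is 1. The Python sketch below takes generic `add`/`double` callbacks (hypothetical names), so it can be checked with ordinary modular integer addition; plugging in real curve point operations yields kP. Reading EC_MULT's "(length + 1) bits" as the number of scalar bits scanned is an assumption about how the instruction is sequenced.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def double_and_add(k: int, p: T, add: Callable[[T, T], T],
                   double: Callable[[T], T], identity: T) -> T:
    """Left-to-right double-and-add: returns k * p for a non-negative scalar k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    acc = identity
    for bit in bin(k)[2:]:        # bits of k, most significant first
        acc = double(acc)         # acc = 2 * acc
        if bit == "1":
            acc = add(acc, p)     # acc = acc + p
    return acc


# Integers modulo 1009 under addition stand in for curve points.
print(double_and_add(123, 7, lambda x, y: (x + y) % 1009, lambda x: 2 * x % 1009, 0))  # 861
```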







Summary


Tier I: implemented directly in hardware.



Tier II: microcoded instructions composed of sequences of
first-tier instructions.



Tier III: microcoded instructions composed of sequences
of both first- and second-tier instructions.



Orlando and Paar Implementation


Elliptic curve coprocessor. Configurable for any field size
(number of bits in the key). Requires a host system.




Main features: optimized bit-parallel squarer, digit-serial
multiplier and two programmable processors.



The most dramatic example: squaring can be performed in one
clock cycle, whereas a general architecture usually requires
$m/2$ clock cycles (m is the field size in bits; $m \ge 160$ for reasonable security).
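A squarer can be this cheap because squaring in $GF(2^m)$ is a linear map. In a polynomial basis, $(\sum a_i x^i)^2 = \sum a_i x^{2i}$, since the cross terms appear twice and cancel. The result is the input bits spread apart with zeros and then reduced by a fixed polynomial: a pure XOR network with no partial products. The Python sketch below shows that computation; a bit-parallel squarer would evaluate it combinationally. The AES field (polynomial 0x11B) used in the checks is only a convenient test case, not the field Orlando and Paar used, and `gf2m_square` is a hypothetical name.

```python
def gf2m_square(a: int, poly: int, m: int) -> int:
    """Square a in GF(2^m) (polynomial basis): spread the bits, then reduce by poly."""
    spread = 0
    for i in range(m):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)          # a_i * x^i squares to a_i * x^(2i)
    for d in range(2 * m - 2, m - 1, -1):   # clear degrees 2m-2 down to m
        if (spread >> d) & 1:
            spread ^= poly << (d - m)       # subtract x^(d-m) * f(x)
    return spread


print(hex(gf2m_square(0x10, 0x11B, 8)))  # x^8 mod f = 0x1b
```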

Processor Architecture


MC: main controller. Controls the computation of kP and
interacts with the host system.

AUC: arithmetic unit controller. Controls the computation of
point additions, point doublings, ...

AU: arithmetic unit. Performs field additions, squarings,
multiplications and inversions under the control of the AUC.

Example: Point multiplication


1) The host loads k into the MC.

2) The host loads the coordinates of P into the AU.

3) The host commands the MC to start processing.

4) The MC initializes (a series of operations).

5) The MC commands the AUC to perform its initialization.

6) The computation is performed by the MC, AUC and AU.

Arithmetic Unit


Responsible for field arithmetic


Contains a register file, an LSD-first (least significant digit first) multiplier,
a squarer, an accumulator and a zero-test circuit.

Arithmetic Unit (cont.)



Field addition / subtraction: 2 clock cycles.



Field multiplication: $AB + C \bmod F$ takes $k_D$ clock cycles ($1 \le k_D \le \lceil m/D \rceil$)

$k_D$ represents the number of digits of B and $m$ the number
of bits of the field.
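An LSD-first digit-serial multiplier processes B one D-bit digit per clock cycle, least significant digit first. Every cycle, A is multiplied by $x^D$ and reduced. The loop therefore runs once per digit of B, which gives the $k_D \le \lceil m/D \rceil$ cycle count above. The Python sketch below models that behaviour and returns the cycle count alongside $AB + C \bmod F$. The function names, and the choice to reduce the accumulator only once at the end, are assumptions rather than details of the Orlando-Paar design.

```python
def poly_reduce(x: int, poly: int, m: int) -> int:
    """Reduce a binary polynomial x modulo the degree-m field polynomial poly."""
    for d in range(x.bit_length() - 1, m - 1, -1):
        if (x >> d) & 1:
            x ^= poly << (d - m)
    return x


def lsd_multiply(a: int, b: int, c: int, poly: int, m: int, digit: int):
    """Return (A*B + C mod F, cycles) for an LSD-first digit-serial multiplier with D = digit."""
    acc = c                          # accumulator starts with C
    a_shifted = a                    # A * x^(D*i) mod F for the current digit i
    cycles = 0
    while True:
        b_digit = b & ((1 << digit) - 1)       # next D-bit digit of B
        for j in range(digit):                 # carry-less product a_shifted * b_digit
            if (b_digit >> j) & 1:
                acc ^= a_shifted << j
        a_shifted = poly_reduce(a_shifted << digit, poly, m)  # A * x^D for the next digit
        b >>= digit
        cycles += 1
        if b == 0:                             # stop after the last nonzero digit of B
            break
    return poly_reduce(acc, poly, m), cycles


print(lsd_multiply(0x57, 0x83, 0, 0x11B, 8, 4))  # (193, 2): 0xC1 after 2 cycles
```

With the AES field polynomial 0x11B as a stand-in, D = 4 finishes 0x57 · 0x83 in two cycles, while D = 1 would need eight.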

Prototypes


16-bit MC processor with 256 words of program memory.

24-bit AUC processor with 512 words of program memory.

128 registers of 167 bits each.

32-bit I/O interface to the host system.

4-, 8- and 16-bit multipliers (3 prototypes).

Number of cycles to compute curve operations (D: size
of the multiplier; 167 is the size of the field)


Software ECC Profiling (unoptimized)

%time   seconds   #calls    msec/call   Name
50.9    2.74      3360427   0.00081     rot_right
29.6    1.59      10700     0.1486      opt_mul
13.0    0.70      1712760   0.0004      copy
2.4     0.13      130658    0.0010      rot_left

Profile of ONB ECCDSA with 158-bit key
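In a normal basis {β, β^2, β^4, ..., β^(2^(m-1))}, squaring moves each coefficient to the next basis element. Squaring is linear over GF(2), $(\beta^{2^i})^2 = \beta^{2^{i+1}}$, and $\beta^{2^m} = \beta$, so a square is a cyclic rotation of the coefficient bits. That is a plausible reason for the heavy use of rot_right in the ONB profile, although the slides do not say what the routine is used for. The Python sketch below shows the rotation. The bit ordering (bit i holds the coefficient of $\beta^{2^i}$) and the name `onb_square` are assumptions; with the opposite ordering the rotation goes the other way.

```python
def onb_square(a: int, m: int) -> int:
    """Square an element of GF(2^m) in normal-basis coordinates (bit i <-> beta^(2^i))."""
    mask = (1 << m) - 1
    return ((a << 1) | (a >> (m - 1))) & mask   # rotate left by one: coefficient i moves to i+1


x = 0b10110
y = x
for _ in range(5):      # squaring m times gives a^(2^m) = a
    y = onb_square(y, 5)
print(y == x)           # True
```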



ECC Profiling (optimized)

%time   seconds   #calls    msec/call   Name
90+     1.0       10661     0.0938      opt_mul
4.8     0.05      820       0.06        opt_inv

Profile of ONB ECCDSA with 158-bit key



ECC Profiling (unoptimized)

Profile of polynomial ECCDSA with 111-bit key

%time   seconds   #calls    msec/call   Name
67.5    3.78      3996909   0.0009      mul_shift
12.1    0.68      33321     0.0204      poly_mul_partial
6.1     0.34      333882    0.0010      div_shift
3.8     0.21      36544     0.0057      poly_div

ECC Profiling (optimized)

Profile of polynomial ECCDSA with 111-bit key

%time   seconds   #calls    msec/call   Name
69.1    0.76      567       1.34        poly_inv
22.7    0.25      36544     0.0068      poly_div
5.5     0.06      2134      0.028       poly_mul


Software performance remains highly constrained by memory access,
large integer data and complex arithmetic operations.


Conclusions


Hardware for cryptosystems relies on building a complete
processor or coprocessor.



Large register files are required.



Reconfigurability is suitable for using different elliptic
curves within the same hardware.



Small additions to the ISA are not being exploited (would they
be useful?).



