
Introduction to Cryptography


CMPS 122, Spring 2004

What is cryptology?

Greek: “krypto” = hide
Cryptology: the science of hiding ⇒ cryptography + cryptanalysis + steganography
Cryptography: secret writing
Cryptanalysis: analyzing (breaking) secrets
◆ Deciphering (decryption) is what we do
◆ Cryptanalysis is what they do

Steganography

“Covered” messages
Technical steganography
◆ Invisible ink, shaved heads, microdots
Linguistic steganography
◆ “Open code”: secret message appears innocent
  – “East wind rain” = war with USA
  – Broken dolls in WWII
◆ Hide a message in the low-order bits of a GIF
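The low-order-bit idea from the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on a raw byte buffer standing in for image samples and does not parse the GIF format (where bytes are palette indices), and the names `hide_bits` and `recover_bits` are hypothetical.

```python
def hide_bits(cover: bytes, message: bytes) -> bytes:
    """Store each message bit in the low-order bit of one cover byte."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)


def recover_bits(stego: bytes, length: int) -> bytes:
    """Read back `length` message bytes from the low-order bits."""
    out = bytearray()
    for i in range(length):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(200, 256)) * 2  # stand-in for 112 bytes of sample data
stego = hide_bits(cover, b"hi")
```

Each cover byte changes by at most 1, which is why the altered data looks almost identical to the original.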

Cryptology vs. security

Cryptology is a branch of mathematics
◆ Lots of formal representation
◆ Proofs about encryption are possible
Security is a system issue
◆ Easiest way to violate security is through people!
◆ Security uses cryptology and other tools

Terminology

[Diagram: Alice → Plaintext → Encrypt → Ciphertext over an insecure channel (Eve listening) → Decrypt → Plaintext → Bob]
C = E(P)
P = D(C)
E must be invertible

Kerckhoffs’s Principle

Cryptography always involves two things
◆ Transformation
◆ Secret
Security should depend only on the secrecy of the key
◆ Assume the enemy can get the algorithm
  – Can capture machines (or people), disassemble programs, etc.
  – Very expensive and difficult to invent a new algorithm if the old one might have been compromised
◆ Security through obscurity isn’t security
  – Look at history of examples
  – Better to have scrutiny by open experts
“The enemy knows the system being used.” (Claude Shannon)

Alice and Bob

[Diagram: Alice → Plaintext → Encrypt (key $K_E$) → Ciphertext → Decrypt (key $K_D$) → Plaintext → Bob]
$C = E(K_E, P) = E_{K_E}(P)$
$P = D(K_D, C) = D_{K_D}(C)$
$K_E = K_D$ ⇒ symmetric encryption
$K_E \neq K_D$ ⇒ asymmetric encryption

Overview of modern cryptography

Three basic types of algorithms
◆ Symmetric (shared) key encryption
◆ Asymmetric (public key) encryption
◆ Secure hash functions
For each type of algorithm, many choices
◆ Symmetric key: DES, AES, Blowfish, RC5, RC6
◆ Asymmetric key: RSA, ElGamal, elliptic curve
◆ Secure hash function: MD4, MD5, SHA-1, RIPEMD
Different implementations within a type of algorithm share many characteristics
◆ Goal, approach are similar
◆ Specific implementation details may differ
Good books on algorithms include Applied Cryptography (somewhat dated) and Practical Cryptography

Symmetric key encryption

Encryption key and decryption key are identical
Strength of algorithm is usually proportional to $2^{\text{key length}}$
◆ Assumes a truly random key!
Algorithm is usually fast
◆ Around 20 cycles per byte for many algorithms
◆ Upwards of 100 MB/s possible on today’s CPUs
◆ Straightforward to build hardware to run the algorithm
Decryption may be the same algorithm as encryption, but isn’t always
[Diagram: Alice → Plaintext → Encrypt (key $K_S$) → Ciphertext → Decrypt (key $K_S$) → Plaintext → Bob]

Asymmetric key encryption

Keys come in pairs: <KU, KR> ($K_U$ is public, $K_R$ is private)
◆ Designation of which is public and which is private is arbitrary
◆ Knowing one key of a pair won’t help you figure out the other one
Encryption and decryption are typically the same algorithm
◆ May be applied in either order (public or private encrypt first)
◆ $D_{KR}(E_{KU}(m)) = D_{KU}(E_{KR}(m)) = m$
Usually much slower than symmetric key encryption
◆ Speed much less than 1 MB/s
[Diagram: Alice → Plaintext → Encrypt (key KU) → Ciphertext → Decrypt (key KR) → Plaintext → Bob]
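The "either order" property can be checked with a toy RSA key pair. This is a sketch for intuition only: the tiny primes, the exponent 17, and the helper name `apply_key` are illustrative assumptions, and a key this small is trivially breakable.

```python
# Toy RSA key pair built from tiny primes (illustration only).
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, KU = (e, n)
d = pow(e, -1, phi)            # private exponent, KR = (d, n); needs Python 3.8+


def apply_key(exponent: int, value: int) -> int:
    """Encryption and decryption are the same operation: modular exponentiation."""
    return pow(value, exponent, n)


m = 65
public_then_private = apply_key(d, apply_key(e, m))   # D_KR(E_KU(m))
private_then_public = apply_key(e, apply_key(d, m))   # D_KU(E_KR(m))
```

Both round trips return the original `m`, which mirrors the equation on the slide.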

Secure hash functions

Variable-length input produces fixed-size output
◆ Similar to encryption, but without a key and output blocks collapsed together
Secure: “difficult” to construct fake plaintexts
◆ Weak collision resistance: difficult to find a plaintext with the same hash value as a given, randomly chosen plaintext
◆ Strong collision resistance: difficult to find any pair of plaintexts with the same hash value
Useful because the hash value can serve as a stand-in for the plaintext in various other functions…
[Diagram: Alice → Plaintext → Secure hash → Hash value → Bob]
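The fixed-size-output property is easy to observe with Python's standard `hashlib` module. SHA-256 is used here as an assumption of convenience; it is not one of the functions listed in these slides.

```python
import hashlib

short = hashlib.sha256(b"a").hexdigest()
long = hashlib.sha256(b"a" * 1_000_000).hexdigest()   # a megabyte of input
nearby = hashlib.sha256(b"b").hexdigest()             # one-letter change

print(len(short), len(long))   # same digest length for both inputs
```

A one-byte input and a one-megabyte input both yield a digest of the same length, and a one-letter change gives an unrelated-looking value.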

Simple cipher: Substitution Cipher

$C = E_K(p)$
$C_i = K[p_i]$
Key is alphabet mapping
◆ a → J, b → L, ...
Suppose the attacker knows the algorithm but not the key: how many keys to try?
◆ Answer: 26! (26 factorial), about $4 \times 10^{26}$
◆ If every person on earth tried one key per second, it would take billions of years
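A minimal sketch of the substitution cipher and the size of its key space follows. The helper names, the sample message, and the world-population figure are assumptions made for illustration.

```python
import math
import random
import string


def make_key(rng: random.Random) -> dict:
    """Pick one of the 26! possible alphabet mappings."""
    shuffled = list(string.ascii_uppercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def encrypt(plaintext: str, key: dict) -> str:
    """C_i = K[p_i]; characters outside a-z pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext)


key = make_key(random.Random(2004))
inverse = {c: p for p, c in key.items()}
ciphertext = encrypt("attack at dawn", key)
recovered = "".join(inverse.get(ch, ch) for ch in ciphertext)

keyspace = math.factorial(26)
PEOPLE = 6.4e9                          # assumed world population, circa 2004
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_to_try_all = keyspace / PEOPLE / SECONDS_PER_YEAR
```

Under these assumptions `years_to_try_all` comes out in the billions, so brute force is hopeless even though the cipher itself is weak.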

Monoalphabetic Cipher

“XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF.”

We know:
This is English text.
It uses a monoalphabetic cipher.

Frequency Analysis

“XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF.”

Ciphertext counts    “Normal” English
W: 20                e  12%
C: 11                t   9%
F: 11                a   8%
G: 11
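The letter counts above can be reproduced with a short script. This is a sketch using Python's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

CIPHERTEXT = (
    "XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD "
    "DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI "
    "FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF."
)

# Count letters only; spaces and punctuation carry no key information here.
counts = Counter(ch for ch in CIPHERTEXT if ch.isalpha())
for letter, count in counts.most_common(4):
    print(f"{letter}: {count}")
```

The single most frequent ciphertext letter is the natural first guess for “e”.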

Pattern Analysis

Most common trigrams in English:
the = 6.4%
and = 3.4%
XBe = “the”?

“XBe HGQe XS ACFPSUeG FePGeXF CF AeeKZV CDQGJCDeA CD BHYJD DJXHGe; eUeD XBe ZeJFX PHGCSHF YCDA CF GSHFeA LV XBe KGSYCFe SI FBJGCDQ RDSOZeAQe OCXBBeZA IGSY SXBeGF.”

Guessing

“the HGQe tS ACFPSUeG FePGetF CF AeeKZV CDQGJCDeA CD hHYJD DJtHGe; eUeD the ZeJFt PHGCSHF YCDA CF GSHFeA LV the KGSYCFe SI FhJGCDQ RDSOZeAQe OCthheZA IGSY StheGF.”

tS = “to” ➞ S = “o”

Guessing

“the HGQe to ACFPoUeG FePGetF CF AeeKZV CDQGJCDeA CD hHYJD DJtHGe; eUeD the ZeJFt PHGCoHF YCDA CF GoHFeA LV the KGoYCFe oI FhJGCDQ RDoOZeAQe OCthheZA IGoY otheGF.”

F appears at the end of many words ➞ likely a consonant
CF is a common two-letter word ➞ C is likely a vowel
F = “s” and C = “i”
otheGs = “others” ➞ G = “r”

Guessing

“the HrQe to AisPoUer sePrets is AeeKZV iDQrJiDeA iD hHYJD DJtHre; eUeD the ZeJst PHrioHs YiDA is roHseA LV the KroYise oI shJriDQ RDoOZeAQe OithheZA IroY others.”

sePrets = “secrets” ➞ P = “c”
AiscoUer = “discover” ➞ A = “d”, U = “v”
iD = “if” or “in”, but “D” ends two words (unlikely to be “f”)
oI = “on” or “of” (not “or”: “r” is already deciphered)
D = “n” and I = “f”

Guessing

“the HrQe to discover secrets is deeKZV inQrJined in hHYJn nJtHre; even the ZeJst cHrioHs Yind is roHsed LV the KroYise of shJrinQ RnoOZedQe OithheZd froY others.”

At this point, start completing individual words.
Yind = “mind” & froY = “from” ➞ Y = “m”
Kromise = “promise” ➞ K = “p”
cHrioHs = “curious” ➞ H = “u”
And so on…

Monoalphabetic Cipher

“The urge to discover secrets is deeply ingrained in human nature; even the least curious mind is roused by the promise of sharing knowledge withheld from others.”
- John Chadwick, The Decipherment of Linear B
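The guessing steps on the preceding slides can be replayed in code. In this sketch, solved letters are shown in lowercase and unsolved ones stay uppercase, as on the slides. The function name `partial_decrypt` is hypothetical, and the last seven mappings are not spelled out on the slides; they are filled in here from the final plaintext.

```python
CIPHERTEXT = (
    "XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD "
    "DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI "
    "FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF."
)

# Ciphertext letter -> plaintext letter, in the order the slides guess them.
GUESSES = [
    ("W", "e"), ("X", "t"), ("B", "h"),              # frequency + trigram "the"
    ("S", "o"), ("F", "s"), ("C", "i"), ("G", "r"),  # short words and endings
    ("P", "c"), ("A", "d"), ("U", "v"), ("D", "n"), ("I", "f"),
    ("Y", "m"), ("K", "p"), ("H", "u"),              # completing whole words
    ("Q", "g"), ("Z", "l"), ("V", "y"), ("J", "a"),  # inferred from the plaintext
    ("L", "b"), ("R", "k"), ("O", "w"),
]


def partial_decrypt(text: str, known: dict) -> str:
    """Replace solved letters with lowercase plaintext; leave the rest as-is."""
    return "".join(known.get(ch, ch) for ch in text)


after_twelve = partial_decrypt(CIPHERTEXT, dict(GUESSES[:12]))
solved = partial_decrypt(CIPHERTEXT, dict(GUESSES))
print(after_twelve)
print(solved)
```

Printing the result after each new guess is a quick way to see which words are close to complete.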

Why was it so easy?

Doesn’t hide statistical properties of plaintext
◆ Common letters in plaintext will result in common symbols in ciphertext
Doesn’t hide relationships in plaintext
◆ A doubled ciphertext letter such as EE cannot match two different plaintext letters such as dg
English (like all natural languages) is very redundant
◆ About 1.3 bits of information per letter
  – Many combinations of letters simply don’t exist or aren’t common
◆ Compressing ASCII English (e.g., with gzip) can reduce its size by up to a factor of about 6
  – 8 bits/letter / 1.3 bits of information per letter ≈ 6

How can we make it tougher?

Cosmetic: use different symbols
Hide statistical properties:
◆ Encrypt “e” with 12 different symbols, “t” with 9 different symbols, etc.
◆ Add nulls, remove spaces
Polyalphabetic cipher
◆ Use different substitutions
Transposition
◆ Scramble order of letters

Types of attacks

Ciphertext-only: how much ciphertext is needed?
Known plaintext: often “guessed plaintext”
Chosen plaintext (get ciphertext)
◆ Not as uncommon as it sounds!
Chosen ciphertext (get plaintext)
Leave these to the professionals:
◆ Dumpster diving
◆ Social engineering
◆ “Rubber-hose cryptanalysis” (actually an advanced form of social engineering)
  – Use threats, blackmail, torture, and bribery to get the key.

Really brief history: first 4000 years

Timeline of cryptographers (code makers) and cryptanalysts (code breakers):
◆ 3000 BC: monoalphabetic ciphers (cryptographers)
◆ 900: al-Kindi, frequency analysis (cryptanalysts)
◆ 1460: Alberti, first polyalphabetic cipher; later Vigenère (cryptographers)
◆ 1854: Babbage breaks Vigenère; Kasiski (1863) publishes (cryptanalysts)

Really brief history: last 100 years

Timeline marks: 1854, 1918, 1939, 1945, 1973, ?
Cryptographers
◆ Mauborgne: one-time pad
◆ Mechanical ciphers: Enigma
◆ Enigma adds rotors, stops repeated key
◆ Feistel block cipher, DES
◆ Public-key
◆ Quantum crypto
Cryptanalysts
◆ Rejewski: repeated message-key attack
◆ Turing’s loop attacks, Colossus
◆ Linear and differential cryptanalysis

How does cryptology advance?

Arms race between cryptographers and cryptanalysts
◆ Often a disconnect between the two (e.g., Mary, Queen of Scots used a monoalphabetic cipher long after it was known to be breakable)
Multi-disciplinary field
◆ Linguists, classicists, mathematicians, computer scientists, physicists
Secrecy often means advances are rediscovered and miscredited
◆ Public-key cryptography first done by British security agency, rediscovered by Diffie & Hellman
Dominated by needs of government: war is the great catalyst
Cryptanalysis advances led by most threatened countries:
◆ France (1800s), Poland (1930s), England/US (WWII), Israel? (Today)

Security vs. Pragmatics

Trade-off between security and effort
◆ One-time pad: perfect security, but requires distribution and secrecy of long key
◆ DES: short key, fast algorithm, but breakable
◆ Quantum cryptography: perfect security, guaranteed secrecy of key, slow, requires expensive hardware
Don’t spend $10M to protect $1M
Don’t protect $1B with encryption that can be broken for $1M

Unbreakable cipher: one-time pad

Mauborgne/Vernam [1917]
XOR (⊕):
◆ 0 ⊕ 0 = 0    1 ⊕ 0 = 1
◆ 0 ⊕ 1 = 1    1 ⊕ 1 = 0
◆ a ⊕ a = 0
◆ a ⊕ 0 = a
◆ a ⊕ b ⊕ b = a
E(P, K) = P ⊕ K
D(C, K) = C ⊕ K = (P ⊕ K) ⊕ K = P
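The XOR identities above translate directly into code. This is a minimal sketch, assuming byte strings as messages and Python's `secrets` module as the source of key material; the function name `xor_bytes` and the sample message are illustrative.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """E(P, K) = P xor K and D(C, K) = C xor K are the same operation."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"attack at dawn"
key = secrets.token_bytes(len(plaintext))   # fresh random key, used once
ciphertext = xor_bytes(plaintext, key)
recovered = xor_bytes(ciphertext, key)      # (P xor K) xor K = P
```

The length check enforces the defining constraint of the scheme: the pad must cover the whole message.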

Why perfectly secure?

For any given ciphertext, all plaintexts are equally possible.
Ciphertext: 0100111110101
◆ Key1: 1100000100110
◆ Plaintext1: 1000111010011 = “CS”
◆ Key2: 1100010100110
◆ Plaintext2: 1000101010011 = “BS”
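The example above can be checked mechanically: for any plaintext an attacker might hope for, there is a key that produces it from the same ciphertext. A small sketch using the slide's bit strings, with `key_for` as a hypothetical helper name:

```python
ciphertext = 0b0100111110101


def key_for(ciphertext: int, wanted_plaintext: int) -> int:
    """The key that would decrypt `ciphertext` to `wanted_plaintext`."""
    return ciphertext ^ wanted_plaintext


key1 = key_for(ciphertext, 0b1000111010011)   # the slide's encoding of "CS"
key2 = key_for(ciphertext, 0b1000101010011)   # the slide's encoding of "BS"
print(f"{key1:013b}", f"{key2:013b}")
```

Because every candidate plaintext has a matching key, the ciphertext alone gives no reason to prefer one over another.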

Perfect security ⇒ our job is done?

Can’t reuse K
◆ What if an eavesdropper has C1 = P1 ⊕ K and C2 = P2 ⊕ K?
  – C1 ⊕ C2 = P1 ⊕ K ⊕ P2 ⊕ K = P1 ⊕ P2
Need to generate a truly random bit sequence as long as all messages combined
Need to securely distribute keys
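The key-reuse problem is easy to demonstrate: XORing two ciphertexts made with the same pad removes the key entirely. In this sketch the pad bytes and the two messages are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes.fromhex("5a17c3e90b64d2")       # made-up 7-byte pad, reused by mistake
p1, p2 = b"meet me", b"at noon"
c1, c2 = xor_bytes(p1, key), xor_bytes(p2, key)

leak = xor_bytes(c1, c2)                    # the key cancels: leak = P1 xor P2
```

Anyone who knows or guesses one plaintext can then recover the other with `xor_bytes(leak, p1)`.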

Vigenère

Invented by Blaise de Vigenère, ~1550
Considered unbreakable for 300 years
Broken by Charles Babbage but kept secret to help British in Crimean War (circa 1854)
Attack discovered independently by Friedrich Kasiski, 1863

Vigenère Encryption

Key is an N-letter string
Alphabet has Z symbols
$E_K(P) = C$ where $C_i = (P_i + K_{i \bmod N}) \bmod Z$
$E_{\text{KEY}}(\text{test}) = \text{DIQD}$
◆ $C_0$ = (‘t’ + ‘K’) mod 26 = ‘D’
◆ $C_1$ = (‘e’ + ‘E’) mod 26 = ‘I’
◆ $C_2$ = (‘s’ + ‘Y’) mod 26 = ‘Q’
◆ $C_3$ = (‘t’ + ‘K’) mod 26 = ‘D’
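The encryption rule can be written out as a short function. This sketch assumes messages and keys made only of the letters A to Z (anything else raises an error), numbers the letters from A = 0, and uses hypothetical function names.

```python
import string

ALPHABET = string.ascii_uppercase   # Z = 26 symbols, A = 0 ... Z = 25


def vigenere_encrypt(plaintext: str, key: str) -> str:
    """C_i = (P_i + K_{i mod N}) mod Z over the letters A-Z."""
    out = []
    for i, ch in enumerate(plaintext.upper()):
        p = ALPHABET.index(ch)
        k = ALPHABET.index(key[i % len(key)].upper())
        out.append(ALPHABET[(p + k) % len(ALPHABET)])
    return "".join(out)


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """P_i = (C_i - K_{i mod N}) mod Z."""
    out = []
    for i, ch in enumerate(ciphertext.upper()):
        c = ALPHABET.index(ch)
        k = ALPHABET.index(key[i % len(key)].upper())
        out.append(ALPHABET[(c - k) % len(ALPHABET)])
    return "".join(out)


print(vigenere_encrypt("test", "KEY"))
```

The fourth letter reuses the first key letter because the key index wraps around modulo N.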

Babbage’s Attack

Use repetition to guess key length:
◆ Suppose sequence XFO appears at positions 65, 71, 122, 176
◆ Calculate distances between occurrences
  – (71 - 65) = 6 = 3 * 2
  – (122 - 65) = 57 = 3 * 19
  – (176 - 122) = 54 = 3 * 18
◆ Key is probably 3 letters long
This approach isn’t foolproof
◆ XFO could correspond to different plaintext sequences at different locations
◆ Use lots of different trigrams (or longer!) to find the key length
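The repetition test can be automated: collect the positions of repeated trigrams, then take the greatest common divisor of the gaps. This is a sketch with hypothetical helper names; it uses gaps between consecutive occurrences, which yields the same divisor as the distances listed above.

```python
from functools import reduce
from math import gcd


def trigram_positions(text: str) -> dict:
    """Map every trigram that occurs more than once to its start positions."""
    seen = {}
    for i in range(len(text) - 2):
        seen.setdefault(text[i:i + 3], []).append(i)
    return {tri: pos for tri, pos in seen.items() if len(pos) > 1}


def likely_key_length(positions: list) -> int:
    """GCD of the gaps between consecutive occurrences of one trigram."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return reduce(gcd, gaps)


guess = likely_key_length([65, 71, 122, 176])   # positions of XFO from the slide
```

In practice one would combine the results for many repeated trigrams, since a single accidental repeat can pull the divisor down to 1.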

Index of coincidence

Calculate index of coincidence by
◆ Taking two strings and pairing their letters by position
◆ Computing the fraction of paired letters that are the same
For English, index of coincidence is
◆ About 3.8% for randomly chosen letters (= 1/26)
◆ About 6.6% for real English text
◆ Reason: some letters (and sequences) are more common than others in English
Index of coincidence is unaffected by simple substitution ciphers (assuming both strings encrypted with the same key)!
◆ Take the encrypted text and compare it with itself shifted (horizontally) by N positions (do this for values of N from 1 to the maximum key length)
◆ If N is a multiple of the key length, the index of coincidence will jump to a higher value
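The shift-and-compare procedure above might be sketched as follows. The function names are hypothetical, and the code assumes the ciphertext is longer than the largest shift tried.

```python
def coincidence_rate(text: str, shift: int) -> float:
    """Fraction of positions where text[i] == text[i + shift]."""
    pairs = list(zip(text, text[shift:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def rates_by_shift(ciphertext: str, max_key_length: int) -> dict:
    """Coincidence rate of the ciphertext against itself for each shift N."""
    letters = "".join(ch for ch in ciphertext if ch.isalpha())
    return {n: coincidence_rate(letters, n) for n in range(1, max_key_length + 1)}
```

Shifts whose rate stands out near the 6.6% figure, rather than near 3.8%, are the candidate key lengths.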

Key length and frequency

Once you think you know the key length
◆ Slice the ciphertext
◆ Use the frequency methods we looked at earlier
Example:
PAMP DOKW SCAO PBSJ VFSV HRGE ASEX BRQR AGMR KOPZ
HBOI KIZH LFSV HRGE ASEM UHQV LGFI KWZE UMAJ AVQW
LODI HGAJ YSEI HFOL PTKS BFDI ZSMV JVSS HZEQ HHOL
AVAW LCRT YCVI JHEJ VFIL PQTM OOHI LLFI YBMP ZIBT
VFFM TOKF LONP LHAW BDBS YHKS BOEE YSEI HFOL HGEM
ZHMR A
◆ Key length = 4
◆ For first letter, H=9, L=7 & A=6 are most common => guess a, e, t
◆ Keep going like this…
Even if each position in key is fully scrambled (not just shifted), this mechanism works
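Slicing the example ciphertext by key position takes only a few lines. This is a sketch; the helper name `slices` is an assumption, and the key length of 4 is taken from the slide.

```python
from collections import Counter

CIPHERTEXT = (
    "PAMP DOKW SCAO PBSJ VFSV HRGE ASEX BRQR AGMR KOPZ "
    "HBOI KIZH LFSV HRGE ASEM UHQV LGFI KWZE UMAJ AVQW "
    "LODI HGAJ YSEI HFOL PTKS BFDI ZSMV JVSS HZEQ HHOL "
    "AVAW LCRT YCVI JHEJ VFIL PQTM OOHI LLFI YBMP ZIBT "
    "VFFM TOKF LONP LHAW BDBS YHKS BOEE YSEI HFOL HGEM "
    "ZHMR A"
)


def slices(ciphertext: str, key_length: int) -> list:
    """Slice i holds every letter encrypted with key position i."""
    letters = ciphertext.replace(" ", "")
    return [letters[i::key_length] for i in range(key_length)]


first = Counter(slices(CIPHERTEXT, 4)[0])
print(first.most_common(3))
```

Each slice is a plain monoalphabetic cipher, so the earlier frequency analysis applies to it unchanged.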

Vigenère simplification

Use binary alphabet:
◆ $C_i = (P_i + K_{i \bmod N}) \bmod 2$
◆ $C_i = P_i \oplus K_{i \bmod N}$
Use a key as long as P:
◆ $C_i = P_i \oplus K_i$
One-time pad: perfect cipher!

How do you know the cipher’s good?

“I tried really hard to break my cipher, but couldn’t. I’m a genius, so I’m sure no one else can break it either.”
“Lots of really smart people tried to break it, and couldn’t.”
Mathematical arguments
◆ Key size (dangerous!)
◆ Statistical properties of ciphertext
◆ Depends on some provably (or believed) hard problem
Invulnerability to known cryptanalysis techniques (but what about undiscovered techniques?)
Show that ciphertext could match multiple reasonable plaintexts without knowing key
◆ Simple monoalphabetic secure for about 10 letters of English:
  – XBCF CF FWPHGW
  – This is secure
  – Spat at troner

Real world standard

Attacker almost certainly has details of algorithm
Attacker has access to
◆ Limited (maybe) amount of ciphertext
◆ Known plaintext (sometimes)
◆ Chosen plaintext (occasionally)
Breaking a cipher means the attacker can read a secret message
◆ May mean the attacker can read many secret messages if the key is reused (think PGP…)

“Academic” standard

Harsher than real-world standard (but not always)
Assume the attacker has
◆ Full details of the algorithm
◆ An unlimited number of chosen plaintext/ciphertext pairs
Assume attacker can perform a very large number of computations
◆ Up to, but not including, $2^n$, where n is the key size in bits
  – This means that the attacker can’t mount a brute force attack, but can get close
Ciphers that meet this standard may be stronger than those designed for the “real world”
◆ Example: ENIGMA (more on this later) relied upon secrecy of the algorithm as well as the key

Showing a cipher is imperfect

Two (easy?) ways to show a cipher is imperfect
◆ Find a ciphertext that is more likely to be one message than another
◆ Show that there are more messages than keys
  – Can be easy if message is longer than key…
  – Implies that there is some message more likely to be a given ciphertext, even if you can’t find it
Since most ciphers have more messages than keys, they’re imperfect
◆ One-time pad is an exception!

Entropy & rate

The entropy (H) of a message M is the amount of information in the message
◆ $H(M) = \log_2 n$, where n is the number of possible (equally likely) meanings
◆ Example: H(month of year) = $\log_2 12 \approx 3.6$ (need 4 bits to encode a month)
◆ Rounding up can give misleading results
  – Encoding three independent months requires $\log_2 12^3 \approx 10.8$ bits
  – Using 4 bits per month would require 12 bits…
Absolute rate: how much information can be encoded
◆ $R = \log_2 Z$, where Z is the size of the alphabet
◆ $R_{\text{English}} = \log_2 26 \approx 4.7$ bits/letter
Actual rate of a language: $r = H(M) / N$, where M is an N-letter message.
◆ r of months spelled out using ASCII: $r = \log_2 12$ / (8 letters * 8 bits/letter) ≈ 0.06
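The figures on this slide follow from a few logarithms, which can be checked directly. A short sketch with illustrative variable names:

```python
import math

h_month = math.log2(12)                 # entropy of one equally likely month
h_three_months = math.log2(12 ** 3)     # three independent months
r_absolute_english = math.log2(26)      # R for a 26-letter alphabet
r_months_ascii = h_month / (8 * 8)      # 8 letters * 8 bits/letter, as on the slide

print(round(h_month, 1), round(h_three_months, 1))
print(round(r_absolute_english, 1), round(r_months_ascii, 2))
```

Encoding the three months jointly needs about 10.8 bits, while rounding each month up to 4 bits costs 12.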

Rate of English

$r_{\text{English}}$ is about 1.3 bits/letter (.28 letters/letter).
◆ Many letter combinations don’t occur (or don’t occur frequently) in English (qz, xg, cfn)
◆ Many words don’t occur together often (“educated car”)
◆ This ratio can be derived by compressing English text and looking at the compression ratios (8/1.3 ≈ 6)
How many meaningful 20-letter messages are there in English?
◆ $r = H(M) / N$
◆ $1.3 = H(M) / 20$
◆ $H(M) = 26 = \log_2 n$
◆ $n = 2^{26} \approx 67$ million (of $26^{20} \approx 2 \times 10^{28}$ possible)
◆ About one out of every $3 \times 10^{20}$ randomly selected 20-letter groups
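The question can be answered numerically from the rate of English. This sketch takes the 1.3 bits/letter figure from the slide as given; the variable names are illustrative.

```python
RATE_ENGLISH = 1.3        # bits per letter, from the slide
LENGTH = 20               # letters per message

entropy_bits = RATE_ENGLISH * LENGTH     # H(M) = r * N
meaningful = 2 ** entropy_bits           # n = 2^H(M)
possible = 26 ** LENGTH                  # all 20-letter strings
one_in = possible / meaningful           # random strings per meaningful one

print(f"{meaningful:.3g} meaningful of {possible:.3g} possible")
```

The estimate is only as good as the 1.3 bits/letter figure, but it shows how sparse meaningful text is among all possible letter strings.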

Redundancy & unicity

Redundancy (D) is defined as:
◆ D = R - r
Redundancy in English:
◆ $D_{\text{English}}$ = 4.7 - 1.3 = 3.4 bits/letter
◆ Each letter is 1.3 bits of content and 3.4 bits of redundancy (~72%).
English encoded as ASCII: 1 byte per letter
◆ D = 8 - 1.3 = 6.7
◆ 84% redundancy, 16% information
Unicity
◆ Theoretical and probabilistic measure of how much ciphertext is needed to determine a unique plaintext
◆ Does not indicate how much ciphertext is needed for cryptanalysis
◆ U = H(K) / D
  – Minimum amount of ciphertext needed for a brute-force attack to succeed.

Unicity Examples

One-Time Pad
◆ H(K) = infinite
◆ U = H(K)/D = infinite
Monoalphabetic Substitution
◆ H(K) = $\log_2 26! \approx 88$
◆ D = 3.4 (redundancy in English)
◆ U = H(K)/D ≈ 26
  – Intuition: if you have about 26 letters, the ciphertext probably matches only one possible plaintext.
Random bit stream (message)
◆ D = 0
◆ U = H(K)/D = infinite
◆ No amount of text will be enough!
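The unicity distance for the monoalphabetic case can be computed directly. This sketch uses the redundancy figure from the earlier slide; the function name `unicity_distance` is hypothetical.

```python
import math


def unicity_distance(key_entropy_bits: float, redundancy_bits: float) -> float:
    """U = H(K) / D; infinite when the plaintext has no redundancy."""
    if redundancy_bits == 0:
        return math.inf
    return key_entropy_bits / redundancy_bits


h_key = math.log2(math.factorial(26))      # monoalphabetic substitution key
u_english = unicity_distance(h_key, 3.4)   # English plaintext
u_random = unicity_distance(h_key, 0)      # random bit stream as plaintext

print(round(h_key, 1), round(u_english, 1), u_random)
```

The zero-redundancy case returns infinity, matching the random bit stream example: with nothing to distinguish a right guess from a wrong one, no amount of ciphertext suffices.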
