Introduction to Cryptography
CMPS 122, Spring 2004
What is cryptology?
Greek: “krypto” = hide
Cryptology – science of hiding
⇒ cryptography + cryptanalysis + steganography
Cryptography – secret writing
Cryptanalysis – analyzing (breaking) secrets
◆ Deciphering (decryption) is what we do
◆ Cryptanalysis is what they do
Steganography
“Covered” messages
Technical steganography
◆ Invisible ink, shaved heads, microdots
Linguistic steganography
◆ “Open code” – secret message appears innocent
– “East wind rain” = war with USA
– Broken dolls in WWII
◆ Hide message in low-order bits in GIF
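The low-order-bit idea can be sketched on a plain byte array standing in for pixel values. This is a minimal illustration, not a GIF encoder: a real image would first have to be parsed into its pixel or palette data, and the function names `hide_bits` and `reveal_bits` are made up for this example.

```python
def hide_bits(cover: bytes, message: bytes) -> bytes:
    """Overwrite the lowest bit of each cover byte with one message bit."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # clear low bit, then set it
    return bytes(stego)


def reveal_bits(stego: bytes, length: int) -> bytes:
    """Read `length` message bytes back out of the low-order bits."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for byte in stego[start:start + 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(64))            # hypothetical stand-in for 64 pixel values
stego = hide_bits(cover, b"hi")
recovered = reveal_bits(stego, 2)
```

Each cover byte changes by at most 1, which is why the hidden message is hard to see in an image.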
Cryptology vs. security
Cryptology is a branch of mathematics
◆ Lots of formal representation
◆ Proofs about encryption are possible
Security is a system issue
◆ Easiest way to violate security is through people!
◆ Security uses cryptology and other tools
Terminology
[Diagram: Alice encrypts plaintext P into ciphertext C = E(P) and sends it over an insecure channel, where Eve sits; Bob decrypts it back to plaintext with P = D(C).]
E must be invertible
Kerckhoffs’s Principle
Cryptography always involves two things
◆ Transformation
◆ Secret
Security should depend only on the secrecy of the key
◆ Assume the enemy can get the algorithm
– Can capture machines (or people), disassemble programs, etc.
– Very expensive and difficult to invent a new algorithm if the old one might have been compromised
◆ Security through obscurity isn’t
– Look at history of examples
– Better to have scrutiny by open experts
“The enemy knows the system being used.” (Claude Shannon)
Alice and Bob
$C = E(K_E, P) = E_{K_E}(P)$
$P = D(K_D, C) = D_{K_D}(C)$
$K_E = K_D$ ⇒ symmetric encryption
$K_E \neq K_D$ ⇒ asymmetric encryption
[Diagram: Alice encrypts the plaintext with $K_E$; the ciphertext goes to Bob, who decrypts it with $K_D$.]
Overview of modern cryptography
Three basic types of algorithms
◆ Symmetric (shared) key encryption
◆ Asymmetric (public key) encryption
◆ Secure hash functions
For each type of algorithm, many choices
◆ Symmetric key: DES, AES, Blowfish, RC5, RC6
◆ Asymmetric key: RSA, ElGamal, elliptic curve
◆ Secure hash function: MD4, MD5, SHA1, RIPEMD
Different implementations within a type of algorithm share many characteristics
◆ Goal, approach are similar
◆ Specific implementation details may differ
Good books on algorithms include Applied Cryptography (somewhat dated) and Practical Cryptography
Symmetric key encryption
Encryption key and decryption key are identical
Strength of algorithm is usually proportional to $2^{\text{key length}}$
◆ Assumes a truly random key!
Algorithm is usually fast
◆ Around 20 cycles per byte for many algorithms
◆ Upwards of 100 MB/s possible on today’s CPUs
◆ Straightforward to build hardware to run the algorithm
Decryption may be the same algorithm as encryption, but isn’t always
[Diagram: Alice encrypts the plaintext with $K_S$; the ciphertext goes to Bob, who decrypts it with the same $K_S$.]
Asymmetric key encryption
Keys come in pairs: <$K_U$, $K_R$> ($K_U$ is public, $K_R$ is private)
◆ Designation of which is public and which is private is arbitrary
◆ Knowing one key of a pair won’t help you figure out the other one
Encryption and decryption are typically the same algorithm
◆ May be applied in either order (public or private encrypt first)
◆ $D_{K_R}(E_{K_U}(m)) = D_{K_U}(E_{K_R}(m)) = m$
Usually much slower than symmetric key encryption
◆ Speed much less than 1 MB/s
[Diagram: Alice encrypts the plaintext with $K_U$; the ciphertext goes to Bob, who decrypts it with $K_R$.]
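The "either order" property can be checked with a toy version of textbook RSA (one of the asymmetric algorithms listed earlier). This is only a sketch under loud assumptions: the primes are tiny, there is no padding, and the helper name `apply_key` is invented for the example, so nothing here is secure.

```python
# Toy textbook RSA with tiny primes; illustrative only, never secure.
p, q = 61, 53
n = p * q        # modulus shared by both keys (3233)
e = 17           # public exponent, plays the role of K_U
d = 2753         # private exponent, plays the role of K_R (e * d = 1 mod 780)


def apply_key(exponent: int, value: int) -> int:
    """One modular exponentiation serves as both E and D."""
    return pow(value, exponent, n)


m = 65
public_then_private = apply_key(d, apply_key(e, m))   # D_KR(E_KU(m))
private_then_public = apply_key(e, apply_key(d, m))   # D_KU(E_KR(m))
```

Both orders return the original message because the two exponents undo each other modulo `n`.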
Secure hash functions
Variable-length input produces fixed-size output
◆ Similar to encryption, but without a key and output blocks collapsed together
Secure: “difficult” to construct fake plaintexts
◆ Weak collision resistance: difficult to find a plaintext with the same hash value as a given, randomly chosen plaintext
◆ Strong collision resistance: difficult to find any pair of plaintexts with the same hash value
Useful because the secure hash value can serve as a stand-in for the plaintext for various other functions…
[Diagram: plaintext → secure hash → hash value, between Alice and Bob.]
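The fixed-size-output property is easy to observe with Python's standard `hashlib`. SHA-256 is used here as an assumption of convenience; it is not one of the hash functions named on the earlier slide, and the inputs are arbitrary.

```python
import hashlib

# Inputs of very different lengths produce digests of the same size.
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 1_000_000).hexdigest()

# A different input gives a different digest.
changed_digest = hashlib.sha256(b"b").hexdigest()
```

Both digests are 64 hex characters (256 bits), whether the input is one byte or a million.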
Simple cipher: Substitution Cipher
$C = E_K(p)$
$C_i = K[p_i]$
Key is alphabet mapping
◆ a → J, b → L, ...
Suppose attacker knows algorithm but not key, how many keys to try?
◆ Answer: 26! (26 factorial), about 4 × 10^26
◆ If every person on earth tried one per second, it would take billions of years (about 2 billion years with 6 billion people)
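A substitution cipher and the size of its key space can be sketched as follows. The helper names and the seed are hypothetical, the 6 billion population figure is an assumption, and `random` is used only to build a demonstration key (it is not a cryptographic random source).

```python
import math
import random
import string


def make_key(seed: int) -> dict[str, str]:
    """Build a random alphabet mapping (the key K) from a seed."""
    shuffled = list(string.ascii_uppercase)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def encrypt(key: dict[str, str], plaintext: str) -> str:
    """C_i = K[p_i]; characters outside a-z pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext)


def decrypt(key: dict[str, str], ciphertext: str) -> str:
    inverse = {c: p for p, c in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)


key = make_key(seed=122)
ciphertext = encrypt(key, "attack at dawn")
recovered = decrypt(key, ciphertext)

key_count = math.factorial(26)              # number of possible keys
seconds_per_year = 365.25 * 24 * 3600
years = key_count / 6e9 / seconds_per_year  # 6 billion people, one key per second each
```

The brute-force estimate comes out in the billions of years, which is why the attack on the following slides goes after letter statistics instead of the key space.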
Monoalphabetic Cipher
“XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF.”
We know:
This is English text.
It uses a monoalphabetic cipher
Frequency Analysis
“XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV CDQGJCDWA CD BHYJD DJXHGW; WUWD XBW ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW KGSYCFW SI FBJGCDQ RDSOZWAQW OCXBBWZA IGSY SXBWGF.”
Ciphertext counts    “Normal” English
W: 20                e  12%
C: 11                t   9%
F: 11                a   8%
G: 11
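The letter counts above can be reproduced with a few lines of Python. This sketch uses the slide's ciphertext and `collections.Counter`; the variable names are only for illustration.

```python
from collections import Counter

CIPHERTEXT = (
    "XBW HGQW XS ACFPSUWG FWPGWXF CF AWWKZV "
    "CDQGJCDWA CD BHYJD DJXHGW; WUWD XBW "
    "ZWJFX PHGCSHF YCDA CF GSHFWA LV XBW "
    "KGSYCFW SI FBJGCDQ RDSOZWAQW OCXBBWZA "
    "IGSY SXBWGF."
)

# Count letters only; spaces and punctuation carry no key information here.
counts = Counter(ch for ch in CIPHERTEXT if ch.isalpha())
most_common_letter, most_common_count = counts.most_common(1)[0]
```

The most frequent symbol, W, is the natural first guess for plaintext "e".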
Pattern Analysis
Most common trigrams in English:
the = 6.4%
and = 3.4%
XBe = “the”?
“XBe HGQe XS ACFPSUeG FePGeXF CF AeeKZV CDQGJCDeA CD BHYJD DJXHGe; eUeD XBe ZeJFX PHGCSHF YCDA CF GSHFeA LV XBe KGSYCFe SI FBJGCDQ RDSOZeAQe OCXBBeZA IGSY SXBeGF.”
Guessing
“the HGQe tS ACFPSUeG FePGetF CF AeeKZV CDQGJCDeA CD hHYJD DJtHGe; eUeD the ZeJFt PHGCSHF YCDA CF GSHFeA LV the KGSYCFe SI FhJGCDQ RDSOZeAQe OCthheZA IGSY StheGF.”
tS = to ➞ S = “o”
Guessing
“the HGQe to ACFPoUeG FePGetF CF AeeKZV CDQGJCDeA CD hHYJD DJtHGe; eUeD the ZeJFt PHGCoHF YCDA CF GoHFeA LV the KGoYCFe oI FhJGCDQ RDoOZeAQe OCthheZA IGoY otheGF.”
F appears at the end of many words ➞ likely a consonant
CF is a common two-letter word ➞ C likely a vowel
F = “s” and C = “i”
otheGs = others ➞ G = “r”
Guessing
“the HrQe to AisPoUer sePrets is AeeKZV iDQrJiDeA iD hHYJD DJtHre; eUeD the ZeJst PHrioHs YiDA is roHseA LV the KroYise oI shJriDQ RDoOZeAQe OithheZA IroY others.”
sePrets = “secrets” ➞ P = “c”
AiscoUer = “discover” ➞ A = “d”, U = “v”
iD = “if” or “in”, but “D” ends two words (unlikely to be “f”)
oI = “on” or “of” (“or” is ruled out: “r” is already deciphered)
D = “n” and I = “f”
Guessing
“the HrQe to discover secrets is deeKZV inQrJined in hHYJn nJtHre; even the ZeJst cHrioHs Yind is roHsed LV the KroYise of shJrinQ RnoOZedQe OithheZd froY others.”
At this point, start completing individual words.
Yind = “mind” & froY = “from” ➞ Y = “m”
Kromise = “promise” ➞ K = “p”
cHrioHs = “curious” ➞ H = “u”
And so on…
Monoalphabetic Cipher
“The urge to discover secrets is deeply ingrained in human nature; even the least curious mind is roused by the promise of sharing knowledge withheld from others.”
– John Chadwick, The Decipherment of Linear B
Why was it so easy?
Doesn’t hide statistical properties of plaintext
◆ Common letters in plaintext will result in common symbols in ciphertext
Doesn’t hide relationships in plaintext
◆ EE cannot match dg
English (like all natural languages) is very redundant
◆ About 1.3 bits of information per letter
– Many combinations of letters simply don’t exist or aren’t common
◆ Running English through gzip reduces size by a factor of 6
– 8 bits/letter / 1.3 bits of information per letter ≈ 6
How can we make it tougher?
Cosmetic: use different symbols
Hide statistical properties:
◆ Encrypt “e” with 12 different symbols, “t” with 9 different symbols, etc.
◆ Add nulls, remove spaces
Polyalphabetic cipher
◆ Use different substitutions
Transposition
◆ Scramble order of letters
Types of attacks
Ciphertext-only: how much ciphertext is needed?
Known plaintext: often “guessed plaintext”
Chosen plaintext (get ciphertext)
◆ Not as uncommon as it sounds!
Chosen ciphertext (get plaintext)
Leave these to the professionals:
◆ Dumpster diving
◆ Social engineering
◆ “Rubber-hose cryptanalysis” (actually an advanced form of social engineering)
– Use threats, blackmail, torture, and bribery to get the key.
Really brief history: ﬁrst 4000 years
[Timeline figure; axis marks: 3000 BC, 900, 1460, 1854]
Cryptographers
◆ Monoalphabetics
◆ Alberti – first polyalphabetic cipher
◆ Vigenère
Cryptanalysts
◆ al-Kindi – frequency analysis
◆ Babbage breaks Vigenère; Kasiski (1863) publishes
Really brief history: last 100 years
[Timeline figure; axis marks: 1854, 1918, 1939, 1945, 1973]
Cryptographers
◆ Mauborgne – one-time pad
◆ Mechanical ciphers – Enigma
◆ Enigma adds rotors, stops repeated key
◆ Feistel block cipher, DES
◆ Public-key
◆ Quantum crypto
Cryptanalysts
◆ Rejewski – repeated message-key attack
◆ Turing’s loop attacks, Colossus
◆ Linear, differential cryptanalysis
◆ ?
How does cryptology advance?
Arms race between cryptographers and cryptanalysts
◆ Often a disconnect between the two (e.g., Mary, Queen of Scots used a monoalphabetic cipher long after it was known to be breakable)
Multidisciplinary field
◆ Linguists, classicists, mathematicians, computer scientists, physicists
Secrecy often means advances are rediscovered and miscredited
◆ Public-key cryptography first done by British security agency, rediscovered by Diffie & Hellman
Dominated by needs of government: war is the great catalyst
Cryptanalysis advances led by most threatened countries:
◆ France (1800s), Poland (1930s), England/US (WWII), Israel? (Today)
Security vs. Pragmatics
Tradeoff between security and effort
◆ One-time pad: perfect security, but requires distribution and secrecy of long key
◆ DES: short key, fast algorithm, but breakable
◆ Quantum cryptography: perfect security, guaranteed secrecy of key, slow, requires expensive hardware
Don’t spend $10M to protect $1M
Don’t protect $1B with encryption that can be broken for $1M
Unbreakable cipher: one-time pad
Mauborgne/Vernam [1917]
XOR (⊕):
◆ 0 ⊕ 0 = 0    1 ⊕ 0 = 1
◆ 0 ⊕ 1 = 1    1 ⊕ 1 = 0
◆ a ⊕ a = 0
◆ a ⊕ 0 = a
◆ a ⊕ b ⊕ b = a
E(P, K) = P ⊕ K
D(C, K) = C ⊕ K = (P ⊕ K) ⊕ K = P
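A minimal sketch of E(P, K) = P ⊕ K and D(C, K) = C ⊕ K on bytes. The function name `xor_bytes` and the sample message are assumptions for illustration; `secrets.token_bytes` supplies the random key.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length; used for both E and D."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"attack at dawn"
key = secrets.token_bytes(len(plaintext))   # fresh random pad, used once
ciphertext = xor_bytes(plaintext, key)      # E(P, K) = P xor K
recovered = xor_bytes(ciphertext, key)      # D(C, K) = C xor K = P
```

Encryption and decryption are the same operation, which follows from a ⊕ b ⊕ b = a.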
Why perfectly secure?
For any given ciphertext, all plaintexts are equally possible.
Ciphertext: 0100111110101
◆ Key1: 1100000100110
◆ Plaintext1: 1000111010011 = “CS”
◆ Key2: 1100010100110
◆ Plaintext2: 1000101010011 = “BS”
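The two decryptions above can be checked bit by bit, and the same trick shows why every plaintext is possible: for any candidate plaintext there is a key that produces it. The helper `xor_bits` and the all-zero candidate are illustrative choices.

```python
def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length bit strings such as '0101'."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


ciphertext = "0100111110101"
key1 = "1100000100110"
key2 = "1100010100110"

plaintext1 = xor_bits(ciphertext, key1)
plaintext2 = xor_bits(ciphertext, key2)

# Pick any 13-bit plaintext; the key that "explains" it is ciphertext XOR plaintext.
wanted = "0000000000000"
key_for_wanted = xor_bits(ciphertext, wanted)
decrypts_to_wanted = xor_bits(ciphertext, key_for_wanted)
```

Without knowing which key was used, the ciphertext gives no reason to prefer one plaintext over another.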
Perfect security => our job is done?
Can’t reuse K
◆ What if receiver has C1 = P1 ⊕ K and C2 = P2 ⊕ K
C1 ⊕ C2 = P1 ⊕ K ⊕ P2 ⊕ K = P1 ⊕ P2
Need to generate truly random bit sequence as long as all messages
Need to securely distribute keys
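The key-reuse problem can be demonstrated directly: XORing two ciphertexts made with the same pad cancels the key. The pad bytes and messages below are hypothetical values chosen for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = bytes([0x5A, 0xC3, 0x17, 0x88, 0x2E, 0x91])   # hypothetical pad, reused by mistake
p1 = b"attack"
p2 = b"defend"

c1 = xor_bytes(p1, key)
c2 = xor_bytes(p2, key)

leak = xor_bytes(c1, c2)            # equals P1 xor P2; the key has dropped out
guess_p2 = xor_bytes(leak, p1)      # knowing (or guessing) P1 reveals P2
```

The attacker never learns K, yet one known or guessed plaintext exposes the other.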
Vigenère
Invented by Blaise de Vigenère, ~1550
Considered unbreakable for 300 years
Broken by Charles Babbage but kept secret to help British in Crimean War (circa 1854)
Attack discovered independently by Friedrich Kasiski, 1863
Vigenère Encryption
Key is an N-letter string
Alphabet has Z symbols
$E_K(P) = C$ where $C_i = (P_i + K_{i \bmod N}) \bmod Z$
$E_{\text{“KEY”}}(\text{“test”}) = \text{DIQD}$
◆ $C_0$ = (‘t’ + ‘K’) mod 26 = ‘D’
◆ $C_1$ = (‘e’ + ‘E’) mod 26 = ‘I’
◆ $C_2$ = (‘s’ + ‘Y’) mod 26 = ‘Q’
◆ $C_3$ = (‘t’ + ‘K’) mod 26 = ‘D’
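The encryption rule can be sketched in a few lines, with letters numbered a = 0 through z = 25. This version assumes the text and key contain letters only; the function names are illustrative.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """C_i = (P_i + K_(i mod N)) mod 26, with a=0 ... z=25."""
    out = []
    for i, ch in enumerate(plaintext.lower()):
        p = ord(ch) - ord("a")
        k = ord(key[i % len(key)].upper()) - ord("A")
        out.append(chr((p + k) % 26 + ord("A")))
    return "".join(out)


def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """P_i = (C_i - K_(i mod N)) mod 26."""
    out = []
    for i, ch in enumerate(ciphertext.upper()):
        c = ord(ch) - ord("A")
        k = ord(key[i % len(key)].upper()) - ord("A")
        out.append(chr((c - k) % 26 + ord("a")))
    return "".join(out)


ciphertext = vigenere_encrypt("test", "KEY")
recovered = vigenere_decrypt(ciphertext, "KEY")
```

Note that the two "t"s both encrypt to "D" here only because they sit three positions apart, which is exactly the key length.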
Babbage’s Attack
Use repetition to guess key length:
◆ Suppose sequence XFO appears at 65, 71, 122, 176
◆ Calculate distances between occurrences
– 71 - 65 = 6 = 3 * 2
– 122 - 65 = 57 = 3 * 19
– 176 - 122 = 54 = 3 * 18
◆ Key is probably 3 letters long
This approach isn’t foolproof
◆ XFO could correspond to different sequences at different locations
◆ Use lots of different trigrams (or longer!) to find the key length
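The distance calculation can be automated by taking the greatest common divisor of the gaps between repeats. The sketch below uses the slide's positions and considers all pairs of occurrences, which is a slight generalization of the three distances listed.

```python
from functools import reduce
from itertools import combinations
from math import gcd

positions = [65, 71, 122, 176]    # where the repeated trigram XFO appears

# Distances between every pair of occurrences.
distances = [later - earlier for earlier, later in combinations(positions, 2)]

# A shared factor of the distances is a candidate key length.
likely_key_length = reduce(gcd, distances)
```

In practice a single accidental repeat can spoil the gcd, so the factor counts from many repeated sequences would be compared rather than trusting one.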
Index of coincidence
Calculate index of coincidence by
◆ Taking two strings and pairing their letters by position
◆ Computing the fraction of paired letters that are the same
For English, index of coincidence is
◆ About 3.8% for randomly chosen letters (= 1/26)
◆ About 6.6% for real English text
◆ Reason: some letters (and sequences) are more common than others in English
Index of coincidence is unaffected by simple substitution ciphers (assuming both strings encrypted with the same key)!
◆ Take the encrypted text and compare it with itself shifted (horizontally) by N positions (do this for N from 1 to the maximum key length)
◆ If N is a multiple of the key length, the index of coincidence will jump to a higher value
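The shifted self-comparison can be sketched as below. The function names are assumptions, `shift` must be smaller than the text length, and the toy string is chosen so the answer is obvious; real ciphertext gives noisier values near 3.8% and 6.6%.

```python
def coincidence_rate(text: str, shift: int) -> float:
    """Fraction of positions i where text[i] == text[i + shift]."""
    pairs = list(zip(text, text[shift:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def rank_shifts(ciphertext: str, max_key_length: int) -> list[tuple[int, float]]:
    """Return (shift, rate) pairs, highest coincidence rate first."""
    rates = [(n, coincidence_rate(ciphertext, n)) for n in range(1, max_key_length + 1)]
    return sorted(rates, key=lambda item: -item[1])


toy = "ABCABCABC"                  # period-3 toy text, not real ciphertext
best_shift, best_rate = rank_shifts(toy, 4)[0]
```

Shifts that are multiples of the key length line up letters encrypted with the same key letter, so their rate behaves like ordinary English.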
Key length and frequency
PAMP DOKW SCAO PBSJ VFSV HRGE ASEX BRQR AGMR KOPZ
HBOI KIZH LFSV HRGE ASEM UHQV LGFI KWZE UMAJ AVQW
LODI HGAJ YSEI HFOL PTKS BFDI ZSMV JVSS HZEQ HHOL
AVAW LCRT YCVI JHEJ VFIL PQTM OOHI LLFI YBMP ZIBT
VFFM TOKF LONP LHAW BDBS YHKS BOEE YSEI HFOL HGEM
ZHMR A
Once you think you know the key length
◆ Slice the ciphertext
◆ Use the frequency methods we looked at earlier
Example:
◆ Key length = 4
◆ For first letter, H=9, L=7 & A=6 are most common ⇒ guess they are a, e, t
◆ Keep going like this…
Even if each position in key is fully scrambled (not just shifted), this mechanism works
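Slicing by key position is a one-liner with Python's stride syntax. The sketch below uses the ciphertext from this slide and an assumed key length of 4; the variable names are illustrative.

```python
from collections import Counter

CIPHERTEXT = "".join("""
PAMP DOKW SCAO PBSJ VFSV HRGE ASEX BRQR AGMR KOPZ
HBOI KIZH LFSV HRGE ASEM UHQV LGFI KWZE UMAJ AVQW
LODI HGAJ YSEI HFOL PTKS BFDI ZSMV JVSS HZEQ HHOL
AVAW LCRT YCVI JHEJ VFIL PQTM OOHI LLFI YBMP ZIBT
VFFM TOKF LONP LHAW BDBS YHKS BOEE YSEI HFOL HGEM
ZHMR A
""".split())

KEY_LENGTH = 4

# Slice k holds every letter encrypted with key letter k.
slices = [CIPHERTEXT[offset::KEY_LENGTH] for offset in range(KEY_LENGTH)]
first_slice_counts = Counter(slices[0])
```

Each slice is a monoalphabetic cipher on its own, so the earlier frequency methods apply slice by slice.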
Vigenère simplification
Use binary alphabet:
◆ $C_i = (P_i + K_{i \bmod N}) \bmod 2$
◆ $C_i = P_i \oplus K_{i \bmod N}$
Use a key as long as P:
◆ $C_i = P_i \oplus K_i$
One-time pad: perfect cipher!
How do you know the cipher’s good?
“I tried really hard to break my cipher, but couldn’t. I’m a genius, so I’m sure no one else can break it either.”
“Lots of really smart people tried to break it, and couldn’t.”
Mathematical arguments
◆ Key size (dangerous!)
◆ Statistical properties of ciphertext
◆ Depends on some provably (or believed) hard problem
Invulnerability to known cryptanalysis techniques (but what about undiscovered techniques?)
Show that ciphertext could match multiple reasonable plaintexts without knowing key
◆ Simple monoalphabetic secure for about 10 letters of English:
– XBCF CF FWPHGW
– This is secure
– Spat at troner
Real-world standard
Attacker almost certainly has details of algorithm
Attacker has access to
◆ Limited (maybe) amount of ciphertext
◆ Known plaintext (sometimes)
◆ Chosen plaintext (occasionally)
Breaking a cipher means the attacker can read a secret message
◆ May mean the attacker can read many secret messages if the key is reused (think PGP…)
“Academic” standard
Harsher than real-world standard (but not always)
Assume the attacker has
◆ Full details of the algorithm
◆ An unlimited number of chosen plaintext/ciphertext pairs
Assume attacker can perform a very large number of computations
◆ Up to, but not including, $2^n$, where n is the key size in bits
– This means that the attacker can’t mount a brute force attack, but can get close
Ciphers that meet this standard may be stronger than those designed for the “real world”
◆ Example: ENIGMA (more on this later) relied upon secrecy of the algorithm as well as the key
Showing a cipher is imperfect
Two (easy?) ways to show a cipher is imperfect
◆ Find a ciphertext that is more likely to be one message than another
◆ Show that there are more messages than keys
– Can be easy if message is longer than key…
– Implies that there is some message more likely to be a given ciphertext, even if you can’t find it
Since most ciphers have more messages than keys, they’re imperfect
◆ One-time pad is an exception!
Entropy & rate
The entropy (H) of a message M is the amount of information in the message
◆ $H(M) = \log_2 n$ where n is the number of possible (equally likely) meanings
◆ Example: $H(\text{month of year}) = \log_2 12 \approx 3.6$ (need 4 bits to encode a month)
◆ Rounding up can give misleading results
– Encoding three (independent) months requires $\log_2 12^3 \approx 10.8$ bits
– Using 4 bits per month would require 12 bits…
Absolute rate: how much information can be encoded
◆ $R = \log_2 Z$, where Z is the size of the alphabet
◆ $R_{\text{English}} = \log_2 26 \approx 4.7$ bits/letter
Actual rate of a language: $r = H(M) / N$, where M is an N-letter message.
◆ r of months spelled out using ASCII: $r = \log_2 12 / (8 \text{ letters} \times 8 \text{ bits/letter}) \approx 0.06$
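The numbers on this slide can be reproduced with `math.log2`. The variable names are illustrative, and the entropy formula assumes all outcomes are equally likely.

```python
import math

h_month = math.log2(12)                  # entropy of one month
h_three_months = math.log2(12 ** 3)      # three independent months together
absolute_rate_english = math.log2(26)    # R for a 26-letter alphabet

# Months spelled out as 8 ASCII letters of 8 bits each.
rate_ascii_month = h_month / (8 * 8)
```

Encoding the three months jointly needs 11 whole bits, one fewer than the 12 bits used by rounding each month up to 4 bits.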
Rate of English
$r_{\text{English}}$ is about 1.3 bits/letter (about 28% of the 4.7 bits/letter absolute rate).
◆ Many letter combinations don’t occur (or don’t occur frequently) in English (qz, xg, cfn)
◆ Many words don’t occur together often (“educated car”)
◆ This ratio can be derived by compressing English text and looking at the compression ratios (8/1.3 ≈ 6)
How many meaningful 20-letter messages are there in English?
◆ $r = H(M) / N$
◆ $1.3 = H(M) / 20$
◆ $H(M) = 26 = \log_2 n$
◆ $n = 2^{26} \approx 67$ million (of $26^{20} \approx 2 \times 10^{28}$ possible)
◆ About one out of every $3 \times 10^{20}$ randomly selected 20-letter groups
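The count of meaningful 20-letter messages follows from the rate of 1.3 bits/letter; Python's exact integers make the comparison with all possible 20-letter strings easy to check. The variable names are illustrative.

```python
letters = 20
rate_english = 1.3                                     # bits of information per letter

entropy_bits = round(rate_english * letters)           # H(M) = r * N = 26 bits
meaningful_messages = 2 ** entropy_bits                # n = 2^H(M)
possible_strings = 26 ** letters                       # every 20-letter group

one_in = possible_strings // meaningful_messages       # odds a random group is meaningful
```

Only a vanishingly small fraction of random letter groups read as English, which is what lets a cryptanalyst recognize a correct decryption.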
Redundancy & unicity
Redundancy (D) is defined as:
◆ $D = R - r$
Redundancy in English:
◆ $D_{\text{English}} = 4.7 - 1.3 = 3.4$ bits/letter
◆ Each letter is 1.3 bits of content, and 3.4 bits of redundancy. (~72%)
English encoded as ASCII: 1 byte per letter
◆ $D = 8 - 1.3 = 6.7$
◆ 84% redundancy, 16% information
Unicity
◆ Theoretical and probabilistic measure of how much ciphertext is needed to determine a unique plaintext
◆ Does not indicate how much ciphertext is needed for cryptanalysis
◆ $U = H(K) / D$
– Minimum amount of ciphertext needed for brute-force attack to succeed.
Unicity Examples
One-Time Pad
◆ H(K) = infinite
◆ U = H(K)/D = infinite
Monoalphabetic Substitution
◆ $H(K) = \log_2 26! \approx 88$
◆ D = 3.4 (redundancy in English)
◆ U = H(K)/D ≈ 26
– Intuition: if you have about 26 letters, the ciphertext probably matches only one possible plaintext.
Random bit stream (message)
◆ D = 0
◆ U = H(K)/D = infinite
◆ No amount of text will be enough!
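The monoalphabetic unicity figure can be recomputed from the definitions U = H(K)/D and D = R - r. This is a small sketch using the slide's value of 1.3 bits/letter for English; the variable names are illustrative.

```python
import math

key_entropy = math.log2(math.factorial(26))   # H(K) for a random alphabet mapping
absolute_rate = math.log2(26)                 # R, bits per letter
rate_english = 1.3                            # r, bits per letter
redundancy = absolute_rate - rate_english     # D = R - r

unicity = key_entropy / redundancy            # letters of ciphertext needed
```

The result is a little under 26 letters, consistent with the earlier observation that a monoalphabetic cipher is safe only for very short messages.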