Linear Algebra in the Foundations of Cryptography


Nov 21, 2013 (3 years and 10 months ago)


Reed Palmer

Linear Algebra with Applications (MATH 547), Spring 2013

Dr. Jeremy Marzuola

The University of North Carolina at Chapel Hill


One field in linear algebra’s broad range of applications is cryptography, the study of securely transmitting data over insecure channels. Instances of basic cryptography are evident throughout recorded history, seen in Da Vinci’s notebooks and used heavily for secure communications in war. As long as people have been able to write, there has been a need for communicating sensitive information. With advances in mathematics and computing power, both the need for and the capability of cryptography are constantly increasing. Today, cryptography plays an important part in online services such as bank transactions, online currencies, and any other service where secure transmission is necessary. In this paper, I present fundamentals of the field of cryptography that rely heavily on tools defined by linear algebra.



To illustrate some of the foundations of cryptography, say person A wanted to send person B a private message of the text “LINEAR ALGEBRA”. This unencrypted message is called the plaintext. A logical first step to encrypting plaintext is to replace every possible character in the message with an integer value. Since computers fundamentally deal only with numbers, a map from characters to integer values makes it easier for the computer to manipulate each character algebraically. This map is usually called a substitution cipher. A computer scientist searching for a map may immediately think of Unicode, the industry standard encoding of characters to integers. For computer encryption, it makes sense to use the Unicode value of each character as the integer value, because of Unicode’s universality. For the purpose of illustrating the basics of cryptography, I simply assign each uppercase character A-Z to an integer 1-26. This map is shown below in Figure 1.


Figure 1: Substitution Cipher

A   B   C   D   E   F   G   H   I   J   K   L   M
1   2   3   4   5   6   7   8   9   10  11  12  13

N   O   P   Q   R   S   T   U   V   W   X   Y   Z
14  15  16  17  18  19  20  21  22  23  24  25  26
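
The map in Figure 1 can be sketched in a few lines of Python. This is only an illustration: the names `CHAR_TO_INT`, `INT_TO_CHAR`, `encode`, and `decode` are hypothetical, and skipping spaces is an assumption made so that “LINEAR ALGEBRA” yields the 13 integers used in this paper.

```python
import string

# Figure 1 as a pair of lookup tables: 'A' -> 1, ..., 'Z' -> 26.
CHAR_TO_INT = {ch: i for i, ch in enumerate(string.ascii_uppercase, start=1)}
INT_TO_CHAR = {i: ch for ch, i in CHAR_TO_INT.items()}


def encode(text):
    """Map each letter to its integer; spaces and other characters are skipped."""
    return [CHAR_TO_INT[ch] for ch in text.upper() if ch in CHAR_TO_INT]


def decode(ints):
    """Map integers 1-26 back to letters."""
    return "".join(INT_TO_CHAR[i] for i in ints)


print(encode("LINEAR ALGEBRA"))
# -> [12, 9, 14, 5, 1, 18, 1, 12, 7, 5, 2, 18, 1]
```

Both parties need the same two tables, which is the shared knowledge the surrounding text refers to.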


Note that it is important that both persons know which character maps to which integer, and vice versa. If both parties are not aware of this map, the message will not be immediately decodable. After applying the substitution cipher, the enciphered message would now read “12 9 14 5 1 18 1 12 7 5 2 18 1”.


This translated message is called the ciphertext. Someone who is trying to figure out the specific function and substitution cipher of the encryption, so that they can decipher and read enciphered messages or send falsely enciphered messages, is called an attacker. The message above could just be sent as is, but it would be relatively trivial for an attacker without the cipher to decode it, defeating the ultimate goal of encrypting a message. To further secure the message, more computation needs to be done. One computational method would be to apply a linear function, or a key, to each character of the enciphered message. The recipient could apply the inverse of this function, followed by the inverse of the substitution cipher, to each character in the ciphertext message and get the original message back. This method adds an extra layer of encryption to the message, but changing one letter of the plaintext changes only one letter of the ciphertext. This makes it easier to identify the effects the cipher has on text, and as a result makes it easier to figure out what the key of the encryption is.
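
The weakness of a character-by-character function can be seen in a short Python sketch. The function f(x) = (5x + 8) mod 26 and its coefficients are hypothetical choices made only for illustration (the paper does not specify a particular function), and writing a residue of 0 as 26 is an assumption that keeps values in the 1-26 range of Figure 1.

```python
A_COEF, B_COEF = 5, 8  # hypothetical key; A_COEF must be relatively prime to 26


def to_range(v):
    """Reduce modulo 26 into 1..26 (0 is written as 26, i.e. 'Z')."""
    return v % 26 or 26


def encrypt(ints):
    return [to_range(A_COEF * x + B_COEF) for x in ints]


def decrypt(ints):
    a_inv = pow(A_COEF, -1, 26)  # modular inverse, available in Python 3.8+
    return [to_range(a_inv * (y - B_COEF)) for y in ints]


plain = [12, 9, 14, 5, 1, 18]    # "LINEAR"
changed = [12, 9, 14, 5, 1, 19]  # "LINEAS": only the last letter differs
c_plain, c_changed = encrypt(plain), encrypt(changed)
diff = [i for i in range(len(c_plain)) if c_plain[i] != c_changed[i]]
print(diff)  # -> [5]
```

Only the position that was edited in the plaintext differs in the ciphertext, which is the leak the paragraph above describes.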

Recall that the result of a matrix-vector multiplication depends on every element of the vector, which implies that changing one element of the vector can change every entry of the resultant vector. For this reason, using matrix operations is a more secure form of encryption than a character-by-character linear function. To perform this encryption, I first group the integers of the message into an $n \times m$ matrix, where $n$ is an arbitrary integer (usually less than the length of the message) and $m = (\text{message length})/n$. I will refer to this matrix from now on as $P$. Next, I compute the product of $P$ with an invertible $n \times n$ matrix, known as the key. I will refer to this matrix as $A$. The product $AP$ outputs an $n \times m$ matrix containing the integer values of the ciphertext message. I will refer to this product as $C$. There is no guarantee, however, that these integers will map to any character in the substitution cipher. To fix this, I operate modulo 26 on each of $C$’s entries, so that each integer in the encrypted message corresponds to an integer 1-26 (a result of 0 stands for 26), which is guaranteed to map to a letter in the substitution cipher. Once the resultant matrix $C$ is calculated, its entries can be arranged by column back into a string of integers, and then mapped to characters. Now the original message is much harder to decipher.
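
A minimal Python sketch of the procedure described above, assuming the message has already been converted to integers and its length is a multiple of the key size. The name `hill_encrypt` and the choice to write a residue of 0 as 26 are assumptions made for illustration; the sample key is the $2 \times 2$ key from this paper’s worked example.

```python
def hill_encrypt(ints, key):
    """Encrypt a list of integers with an n x n key, one column of n values at a time."""
    n = len(key)
    if len(ints) % n != 0:
        raise ValueError("message length must be a multiple of the key size")
    out = []
    for start in range(0, len(ints), n):
        column = ints[start:start + n]        # one column of P
        for row in key:                       # one entry of C per key row
            value = sum(a * p for a, p in zip(row, column)) % 26
            out.append(value or 26)           # assumption: write 0 as 26 ('Z')
    return out


KEY = [[15, 1], [4, 5]]                       # 2 x 2 key from the worked example
print(hill_encrypt([12, 9, 14, 5, 1, 18], KEY))  # integers for "LINEAR"
# -> [7, 15, 7, 3, 7, 16]
```

Reading the message in slices of length n and emitting the results in the same order is equivalent to filling the matrix by column and reading the product back out by column.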

To illustrate these properties with a simple example, I encipher the plaintext message “LINEAR ALGEBRA” into an integer string “12 9 14 5 1 18 1 12 7 5 2 18 1 2” defined as $e$. I then arrange the string into a $2 \times 7$ matrix defined as $P$ such that the first integer of $e$ is stored in $P_{11}$, the second stored in $P_{21}$, the third stored in $P_{12}$, and so forth. It is necessary to include the extra integer on the end of $e$ because $e.\text{length}$ (henceforth defined as the number of integers in $e$) is not divisible by 2. All of the entries of matrix $P$ must be defined, i.e. $e.\text{length}$ must be divisible by the size of the key matrix, or $n$. To accomplish that, an arbitrarily selected value from the integer subset of the substitution cipher is appended to $e$.

As a digression, there are different ways of calculating the extra values, ranging from random selection to a useful and precise function of the other elements of $e$. The method I chose for this encryption was to take the summation of all the integer values of $e$ and operate modulo 26 on that value. Using a function to calculate the extra value is beneficial. Besides introducing minimal interference with the original plaintext, such a function is a linear-time computation, so it does not add excessive time to the encryption. Also, if $e.\text{length}$ is already divisible by $n$, the writer of the function may choose to append the result of the function $n$ times onto the end of $e$. A side effect of this method is that it also allows the recipient to validate the message, or see if it was corrupted in any way in transit. If the recipient does the same summation and modulo function once they decrypt the message and their result is not equal to the appended value, they know the message has been altered in some way and is not valid. For the purpose of explaining the underlying mechanics, the verification of the message is not essential, and it will not be discussed further in this paper. If you are interested in learning about more advanced methods of verifying transmissions, I recommend researching more about credentials and validation within the field of cryptography.
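
The sum-modulo-26 padding and the validation check it enables might look like the following Python sketch. The names `pad_value`, `pad`, and `is_valid`, the `pad_count` parameter, and the choice to write a sum of 0 as 26 are all assumptions made for illustration; the paper does not say how the recipient learns how many values were appended.

```python
def pad_value(ints):
    """Sum of the message integers modulo 26 (0 written as 26: an assumption)."""
    return sum(ints) % 26 or 26


def pad(ints, n):
    """Append pad_value at least once, then until the length is a multiple of n."""
    value = pad_value(ints)
    padded = ints + [value]
    while len(padded) % n != 0:
        padded.append(value)
    return padded


def is_valid(padded, pad_count):
    """Recompute the check value from the message part and compare with the tail."""
    message, tail = padded[:-pad_count], padded[-pad_count:]
    return all(v == pad_value(message) for v in tail)


message = [5, 1, 19, 25, 20, 15, 2, 18, 5, 1, 11]   # "EASYTOBREAK"
padded = pad(message, 2)
print(padded[-1], is_valid(padded, 1))               # -> 18 True

tampered = padded.copy()
tampered[6] = 3                                      # B changed to C in transit
print(is_valid(tampered, 1))                         # -> False
```

A single pass over the message is enough, which matches the linear-time claim above.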

I then multiply $P$ by a $2 \times 2$ matrix $A$ with entries ranging from 1-26 and with $\det(A)$ not equal to 0 and relatively prime to 26. The determinant must not equal 0 because a nonzero determinant guarantees that the matrix is invertible, and it must be relatively prime to 26 so that the matrix remains invertible once the arithmetic is done modulo 26. This matrix is easy to construct: I simply start with the identity matrix $I_2$ and perform basic row operations, making sure that each entry is between 1 and 26, until a satisfactory matrix is constructed. If a number is prime, it is relatively prime to any number that is not one of its multiples, so any prime other than 2 and 13 is relatively prime to 26. To fulfill the preconditions, I calculate the determinant of the matrix after each row operation, and settle on a matrix when the determinant is equal to such a prime. This process guarantees that the matrix is invertible and that the determinant is relatively prime to the number of elements in the substitution cipher. Note that this process is not exclusive to $2 \times 2$ matrices, and will work for any size matrix, although fulfilling the preconditions becomes more difficult. In this example, $A$ is defined as

$$A = \begin{pmatrix} 15 & 1 \\ 4 & 5 \end{pmatrix} \bmod 26$$



Note that the determinant of $A$ is 71, a prime number. In this case, the message contains an odd number of characters, so we must add an arbitrarily selected character to the end of the message so the string has an even number of characters and can be equally distributed. To translate the message, compile every two integers from the enciphered message into a column of a $2 \times (e.\text{length}/2)$ matrix, defined as $P$. The final step of encryption is to calculate the matrix product $AP$ and store it as a new matrix $C$ with the same dimensions as $P$.
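
The preconditions on the key can be checked mechanically. The Python sketch below, with hypothetical helper names `det2` and `is_valid_key`, tests whether the determinant of a $2 \times 2$ key is relatively prime to 26.

```python
from math import gcd


def det2(key):
    """Determinant of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = key
    return a * d - b * c


def is_valid_key(key, modulus=26):
    """A key is usable when its determinant is relatively prime to the modulus."""
    return gcd(det2(key) % modulus, modulus) == 1


KEY = [[15, 1], [4, 5]]
print(det2(KEY), is_valid_key(KEY))        # -> 71 True
print(is_valid_key([[1, 0], [0, 13]]))     # determinant 13 is prime, but -> False
```

The second call shows why a prime determinant alone is not enough: 13 divides 26, so that matrix has no inverse modulo 26.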

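Sample calculations for the first few columns of $C$ are shown below.

$$\begin{pmatrix} 15 & 1 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 12 \\ 9 \end{pmatrix} \bmod 26 = \begin{pmatrix} 189 \\ 93 \end{pmatrix} \bmod 26 = \begin{pmatrix} 7 \\ 15 \end{pmatrix}$$

$$\begin{pmatrix} 15 & 1 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 14 \\ 5 \end{pmatrix} \bmod 26 = \begin{pmatrix} 215 \\ 81 \end{pmatrix} \bmod 26 = \begin{pmatrix} 7 \\ 3 \end{pmatrix}$$

$$\begin{pmatrix} 15 & 1 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 18 \end{pmatrix} \bmod 26 = \begin{pmatrix} 33 \\ 94 \end{pmatrix} \bmod 26 = \begin{pmatrix} 7 \\ 16 \end{pmatrix}$$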





Once all resulting elements of $C$ are calculated, I deconstruct the matrix from top to bottom by column, in the same manner as the columns of $P$ were constructed, and assemble the integers in that order. I calculated the encrypted message to be “7 15 7 3 7 16 1 12 6 1 22 20 17 14”, which maps to “GOGCGPALFAVTQN”.
Now that the message is translated and sent, how does person B extract the original message from the encrypted message? The product $AP$ can be seen as a linear transformation

$$T(P) = AP \bmod 26$$

that maps $2 \times 7$ matrices to $2 \times 7$ matrices, or more generically $n \times m$ matrices to $n \times m$ matrices, where $n = A.\text{size}$ and $m = e.\text{length}/n$. The determinant of the matrix is by construction not equal to 0 and relatively prime to 26, which implies that the transformation is invertible. The only way to get the original matrix $P$ back from the resultant matrix $C$ is to apply the inverse transformation to the resultant. It follows that the key must be an invertible matrix for person B to be able to decrypt the message. Both persons A and B must have access to the key to send, receive and read encrypted messages.

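The key matrix $A$ was well chosen and yields the inverse

$$A^{-1} = \left( \begin{pmatrix} 15 & 1 \\ 4 & 5 \end{pmatrix} \bmod 26 \right)^{-1} = (71 \bmod 26)^{-1} \begin{pmatrix} 5 & -1 \\ -4 & 15 \end{pmatrix} \bmod 26$$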

Inverting the determinant in the usual way, as the fraction 1/71, would fill this matrix with messy fractions and would not output integers, which are necessary for the substitution cipher to map the integers back. Instead, the determinant is inverted modulo 26, although the multiplicative inverse under a modulus is not exactly intuitive. In this case, $(71 \bmod 26)^{-1}$ is equal to any integer $x$ such that $71x \equiv 1 \pmod{26}$; since $71 \bmod 26 = 19$ and $19 \cdot 11 = 209 = 8 \cdot 26 + 1$, one such integer is $x = 11$. If the determinant of the matrix shares a common factor with 26, that is, if it is divisible by 2 or 13, then this equation has no solutions, and the matrix cannot be inverted. This is why it is necessary to check that the determinant of $A$ is relatively prime to the number of characters in the substitution cipher. After applying the modulo operator to the inverse matrix, the final result is shown below.


$$A^{-1} = 11 \begin{pmatrix} 5 & -1 \\ -4 & 15 \end{pmatrix} \bmod 26 = \begin{pmatrix} 3 & 15 \\ 8 & 9 \end{pmatrix}$$
 


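This inverse is then multiplied with the ciphertext matrix $C$ in the same way as the original plaintext matrix.

$$\begin{pmatrix} 3 & 15 \\ 8 & 9 \end{pmatrix} \begin{pmatrix} 7 \\ 15 \end{pmatrix} \bmod 26 = \begin{pmatrix} 246 \\ 191 \end{pmatrix} \bmod 26 = \begin{pmatrix} 12 \\ 9 \end{pmatrix}$$

$$\begin{pmatrix} 3 & 15 \\ 8 & 9 \end{pmatrix} \begin{pmatrix} 7 \\ 3 \end{pmatrix} \bmod 26 = \begin{pmatrix} 66 \\ 83 \end{pmatrix} \bmod 26 = \begin{pmatrix} 14 \\ 5 \end{pmatrix}$$

$$\begin{pmatrix} 3 & 15 \\ 8 & 9 \end{pmatrix} \begin{pmatrix} 7 \\ 16 \end{pmatrix} \bmod 26 = \begin{pmatrix} 261 \\ 200 \end{pmatrix} \bmod 26 = \begin{pmatrix} 1 \\ 18 \end{pmatrix}$$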
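
The inversion and decryption steps above can be sketched in Python for the $2 \times 2$ case. The names `inverse_key_2x2` and `apply_key` are hypothetical, and writing a residue of 0 as 26 is an assumption that keeps values in the range of Figure 1.

```python
def inverse_key_2x2(key, modulus=26):
    """Inverse of [[a, b], [c, d]] modulo `modulus`: det^-1 times the adjugate."""
    (a, b), (c, d) = key
    # pow(..., -1, modulus) raises ValueError if the determinant is not invertible.
    det_inv = pow((a * d - b * c) % modulus, -1, modulus)
    adjugate = [[d, -b], [-c, a]]
    return [[(det_inv * v) % modulus for v in row] for row in adjugate]


def apply_key(key, ints, modulus=26):
    """Multiply each column of n integers by the key, reducing modulo `modulus`."""
    n = len(key)
    out = []
    for start in range(0, len(ints), n):
        column = ints[start:start + n]
        for row in key:
            value = sum(k * v for k, v in zip(row, column)) % modulus
            out.append(value or modulus)     # assumption: write 0 as 26
    return out


KEY = [[15, 1], [4, 5]]
INV = inverse_key_2x2(KEY)
print(INV)                                    # -> [[3, 15], [8, 9]]
print(apply_key(INV, [7, 15, 7, 3, 7, 16]))   # -> [12, 9, 14, 5, 1, 18]
```

The three-argument `pow` with exponent -1 requires Python 3.8 or later.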





Verify the rest of the calculations if you wish, but it is clear that so far these values are mapping back to the original integer string $e$. By showing you this simple example, I hope to convey that this technique of encryption, known as the Hill cipher, is one of the most elegant ways to illustrate the roots of cryptography in terms of linear algebra, as well as an excellent way to illustrate an application of matrix algebra and its properties.

The tradeoff of this simplicity is very low security, even as $n$ increases. If I put the vectors of the plaintext substitution cipher into an $n \times m$ matrix $P$, where $n$ is the size of the key matrix $A$, and $m$ accommodates the length of the plaintext message divided by $n$, then the formula

$$AP = C$$

represents the Hill cipher transformation from plaintext to ciphertext. Say I am an attacker and I wish to find the key matrix. Ideally, I can inject a plaintext message matrix $P$ into the cipher, and I have access to the resulting ciphertext matrix $C$. Using these matrices, I can select $n$ columns of $P$, together with the corresponding $n$ columns of $C$, so that the selected plaintext columns form an $n \times n$ matrix that is invertible mod 26. Assuming that these submatrices can be found, I will refer to them as $P^*$ and $C^*$ respectively. Knowing nothing other than a plaintext string and its corresponding ciphertext string, I can find the key matrix $A$ by the following simple operation

$$AP^* = C^* \implies A = C^* (P^*)^{-1} \pmod{26}$$


Notice that the formula above has no reference to the size of the matrices. This method of breaking a cipher works for plaintext matrices, key matrices, and ciphertext matrices of all sizes, so long as an invertible submatrix can be assembled. So as long as an attacker has access to the plaintext and corresponding ciphertext, they can calculate the key and will be able to decrypt all future messages sent using the same key. This form of attack is known as a known-plaintext attack and is a powerful method of cryptanalysis, the practice of attempting to break the security of a cryptographic algorithm.
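
A sketch of this attack in Python, restricted to $2 \times 2$ keys to keep it short. The function name `recover_key_2x2` is hypothetical; the plaintext and ciphertext integers are the first three columns of this paper’s “LINEAR ALGEBRA” example.

```python
from itertools import combinations
from math import gcd


def recover_key_2x2(plain, cipher, modulus=26):
    """Recover a 2 x 2 Hill key from matching plaintext/ciphertext integer strings."""
    p_cols = [plain[i:i + 2] for i in range(0, len(plain), 2)]
    c_cols = [cipher[i:i + 2] for i in range(0, len(cipher), 2)]
    for i, j in combinations(range(len(p_cols)), 2):
        (a, c), (b, d) = p_cols[i], p_cols[j]      # P* = [[a, b], [c, d]]
        det = (a * d - b * c) % modulus
        if gcd(det, modulus) != 1:
            continue                                # P* not invertible, try other columns
        det_inv = pow(det, -1, modulus)
        p_inv = [[det_inv * d % modulus, -det_inv * b % modulus],
                 [-det_inv * c % modulus, det_inv * a % modulus]]
        c_star = [[c_cols[i][0], c_cols[j][0]],
                  [c_cols[i][1], c_cols[j][1]]]
        # A = C* (P*)^-1 mod 26
        return [[sum(c_star[r][k] * p_inv[k][col] for k in range(2)) % modulus
                 for col in range(2)] for r in range(2)]
    return None                                     # no invertible pair of columns found


known_plain = [12, 9, 14, 5, 1, 18]                 # "LINEAR"
known_cipher = [7, 15, 7, 3, 7, 16]                 # "GOGCGP"
print(recover_key_2x2(known_plain, known_cipher))   # -> [[15, 1], [4, 5]]
```

Here the first two columns are skipped because their determinant shares a factor with 26; the first and third columns form an invertible $P^*$, and six known letters are enough to expose the key.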

There are innumerable methods of increasing the security of an encryption algorithm. The method of encoding that the basic Hill cipher uses, such that each unique plaintext input corresponds to a unique ciphertext output, and each input is encrypted independently from all others, is known as the Electronic Code Book (ECB) method [3]. While this unique map is integral to the correctness and invertibility of this method of cryptology, it is also ultimately its downfall. Hill ciphers with $2 \times 2$ keys similar to the example above are computationally trivial to break. Applying the example cipher from above to the string “EASYTOBREAK” results in the ciphertext

XYXSCY[VT]XY[AD]

including the calculated appended character, while the slightly modified string “EASYTOCREAK” results in the ciphertext

XYXSCY[KX]XY[BI]

where we can see that changing one letter results in only four letters being changed between the two ciphertexts, as bracketed in each string. Clearly, the first change in the string is a direct result of changing the B to a C in the two strings. The other change is a result of the function that evaluates the appended character, which differs as the input differs.

Recall the observation that if the cipher is a character-by-character linear function, changing one element of the plaintext only changes the element at that same index in the ciphertext, ignoring the possible appended character functions. Observe that in the above example, changing one character results in a change of two characters in the same block. Using a matrix operation cipher involves grouping every $n$ characters of the plaintext into an integer column vector of a matrix. Applying the matrix-vector multiplication to this vector introduces a dependence between the grouped elements of the vector. An immediate observation is that as the size of the key matrix increases, the number of letters that change in the ciphertext per letter changed in the plaintext increases as well. Recall that the integer string is separated into vectors of the same dimension as the height of the key matrix. For any $n$ integers grouped into a vector, every plaintext character in that group corresponds to a ciphertext character in the output. Since the matrix product depends on the values of all $n$ plaintext integers to calculate each ciphertext integer, if you change one of the $n$ plaintext characters, all $n$ ciphertext characters can be affected. That shows that, ignoring the appended character function, up to $n$ characters change in the ciphertext for every one character that is changed in the plaintext.
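
The comparison of the two ciphertexts can be reproduced with a short Python sketch. The helper `encrypt_text` is hypothetical; it assumes uppercase input without spaces, the sum-modulo-26 appended value described earlier, and that a residue of 0 is written as 26.

```python
KEY = [[15, 1], [4, 5]]


def encrypt_text(text, key=KEY):
    """Hill-encrypt uppercase text with a 2 x 2 key, appending the sum mod 26."""
    ints = [ord(ch) - ord("A") + 1 for ch in text]
    ints.append(sum(ints) % 26 or 26)        # appended check value
    if len(ints) % 2:                        # keep the length even for a 2 x 2 key
        ints.append(ints[-1])
    out = []
    for i in range(0, len(ints), 2):
        x, y = ints[i], ints[i + 1]
        for a, b in key:
            out.append((a * x + b * y) % 26 or 26)
    return "".join(chr(v + ord("A") - 1) for v in out)


first = encrypt_text("EASYTOBREAK")
second = encrypt_text("EASYTOCREAK")
changed = [i for i, (u, v) in enumerate(zip(first, second)) if u != v]
print(first, second, changed)
# -> XYXSCYVTXYAD XYXSCYKXXYBI [6, 7, 10, 11]
```

Positions 6 and 7 are the block containing the edited letter, and positions 10 and 11 are the block containing the appended value.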

A way to increase the difficulty of breaking a cipher is to use Cipher Block Chaining (CBC) as a method of encryption instead of ECB. Instead of having a direct map from plaintext to ciphertext as in ECB, CBC saves the ciphertext value of the previous encryption, and applies a function involving that value to the plaintext before encrypting it with the main cipher. This method increases security in two ways: it adds a level of indirection to the encryption process, making it harder to crack the cipher, and it introduces a method of validation for all messages.

An example of an application where a similar chaining idea is useful is the decentralized online currency named Bitcoin. Bitcoin does not encrypt its transactions with CBC; instead, each block of transactions includes a cryptographic hash of the previous block, so every block depends on all of the blocks that came before it. An attacker might be interested in editing a previous transaction. To do this, they would have to redo all of the computation from that transaction up to the present time, which is practically impossible from a computational standpoint. This illustrates another advantage of chaining, which is that the more messages are added to the chain, the harder it is to edit the message history.

To illustrate CBC, I will extend the Hill cipher example used previously. The way a computer fundamentally deals with numbers is in binary notation, meaning each digit is either a 0 or a 1. For CBC, the function often used between the previous ciphertext and the current plaintext is a bitwise XOR. A bitwise XOR iterates through both binary strings, comparing the digits at each position, and returning 1 if there are an odd number of 1’s between the two, and 0 otherwise. Say I want to encrypt the plaintext string “NOTSOEASY”, referred to as p1, using the ciphertext string “XYXSCYVTXYAD” from earlier, referred to as c0, as the block. First, I append 3 characters onto the end of p1 so that the lengths of the two messages match. Next, I convert p1 to its integer form, which is “14 15 20 19 15 5 1 19 25 3 3 3”, and recall the integer form of c0, which is “24 25 24 19 3 25 22 20 24 25 1 4”. Because I am illustrating this concept by hand and not on a computer, I must now convert p1 and c0 to binary. For this cipher, the largest integer value we need to represent in binary is 26. Recall that each binary digit represents a power of 2. Representing a decimal number $n$ in binary involves a linear combination of $b$ powers of 2, where $2^b > n$. Solve this relation for $b$ and add a ceiling operator to establish the following equation for determining $b$, the number of bits needed to encode a set of $n$ values.

$$b = \lceil \log_2 n \rceil$$

Using this equation, it is clear that I only need 5 bits to represent 1-26 in binary. Determining each number’s representation is a simple greedy algorithm, which tries to subtract each power of 2, from largest to smallest, from the desired number. If the result of the subtraction is greater than or equal to 0, then perform it and set that bit equal to 1. Otherwise, do not perform the subtraction and set that bit equal to 0. I now calculate and compare each binary digit of c0 and p1.
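
The bit-count formula and the greedy conversion can be sketched in Python as follows. The names `bits_needed` and `to_binary` are hypothetical.

```python
from math import ceil, log2


def bits_needed(n):
    """b = ceil(log2(n)), the formula from the text."""
    return ceil(log2(n))


def to_binary(value, bits):
    """Greedy conversion: try each power of 2 from largest to smallest."""
    digits = []
    for power in range(bits - 1, -1, -1):
        if value - 2 ** power >= 0:
            value -= 2 ** power          # the subtraction is performed, bit is 1
            digits.append("1")
        else:
            digits.append("0")           # the subtraction is skipped, bit is 0
    return "".join(digits)


print(bits_needed(26), to_binary(24, 5), to_binary(14, 5))
# -> 5 11000 01110
```

The two sample values, 24 and 14, are the first integers of c0 and p1, and their 5-bit forms are the first five digits of each binary string in the comparison that follows.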



c0  = 110001100111000100110001111001101101010011000110010000100100
XOR
p1  = 011100111110100100110111100101000011001111001000110001100011

The result of this calculation will be referred to as p1’ and is shown below.

p1’ = 101101011001100000000110011100101110011100001110100001000111



I now convert this binary string back to decimal and get the integer string “22 22 12 0 12 28 23 7 1 26 2 7”. Then, I assemble a $2 \times 6$ plaintext matrix and multiply it by the key matrix. This gives me the ciphertext matrix, which I transform into the ciphertext integer string, referred to as c1, which is “14 16 24 22 0 6 14 23 15 4 11 17”.
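
The chaining step can be checked with a few lines of Python. This sketch uses Python’s built-in `^` operator, which XORs the binary representations of two integers directly, so no manual binary conversion is needed; the variable names are chosen to mirror the text, and residues are left in the raw range 0-25 as they appear in c1.

```python
KEY = [[15, 1], [4, 5]]
c0 = [24, 25, 24, 19, 3, 25, 22, 20, 24, 25, 1, 4]   # "XYXSCYVTXYAD"
p1 = [14, 15, 20, 19, 15, 5, 1, 19, 25, 3, 3, 3]     # "NOTSOEASY" plus 3 appended values

# Bitwise XOR of the previous ciphertext with the current plaintext.
p1_prime = [c ^ p for c, p in zip(c0, p1)]

# Hill encryption of the XORed values, two integers per column.
c1 = []
for i in range(0, len(p1_prime), 2):
    x, y = p1_prime[i], p1_prime[i + 1]
    for a, b in KEY:
        c1.append((a * x + b * y) % 26)

print(p1_prime)  # -> [22, 22, 12, 0, 12, 28, 23, 7, 1, 26, 2, 7]
print(c1)        # -> [14, 16, 24, 22, 0, 6, 14, 23, 15, 4, 11, 17]
```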

This sort of transformation seems as though it would be one way, and an inverse function to decrypt the text could not be found. However, the bitwise XOR function is actually its own inverse, meaning that applying the function with the same second operand twice will return the original string. The recipient first applies the inverse key matrix to the ciphertext c1 to get p1’ back. (The inverse key returns each value only modulo 26, so values of p1’ above 25, such as 28, are recovered exactly only if the matrix arithmetic uses a modulus of at least 32.) We then perform a bitwise XOR on c0 and p1’

c0  = 110001100111000100110001111001101101010011000110010000100100
p1’ = 101101011001100000000110011100101110011100001110100001000111

and get back

p1  = 011100111110100100110111100101000011001111001000110001100011


This is the exact bit string of p1 from earlier! Since the ciphertext string c0 is transmitted publicly over the insecure channel, every intended recipient knows the value they need to use for the bitwise operation. While only a marginal increase in security, CBC allows an encrypted storage of message history, which is computationally infeasible to edit.
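
The self-inverse property of XOR used in this step can be confirmed directly in Python. The sketch assumes the recipient already holds p1’ exactly and shows only the XOR round trip, not the matrix inversion.

```python
c0 = [24, 25, 24, 19, 3, 25, 22, 20, 24, 25, 1, 4]
p1 = [14, 15, 20, 19, 15, 5, 1, 19, 25, 3, 3, 3]

p1_prime = [c ^ p for c, p in zip(c0, p1)]          # sender: c0 XOR p1
recovered = [c ^ q for c, q in zip(c0, p1_prime)]   # recipient: c0 XOR p1'

# Render the recovered integers as one 5-bit-per-value binary string.
bits = "".join(format(v, "05b") for v in recovered)
print(recovered == p1)  # -> True
print(bits)
```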

Another security issue with the Hill cipher and all other symmetric encryption algorithms is that the key needs to be shared with both people in order for both to be able to read and write messages. If two people who wish to communicate have access to a secure channel over which they can share the key, then this is easy to do, but ultimately, they could just share their message over that channel. Also, in today’s age of global communication, there is almost no such thing as a secure channel. A major issue in cryptography is how to share a key over an insecure channel. An encompassing term for the method most commonly used to deal with this issue is public-key cryptography, in which every user has access to the public key, which encrypts the data. Typically, this key is a special function that is very easy to calculate one way, making it trivial to encrypt items, but whose inverse is computationally infeasible to calculate; instead, each user keeps a separate function to decrypt data private and secure. If you wish to know more about these functions, search for public-key cryptography, particularly the RSA algorithm and the Diffie-Hellman key exchange protocol.
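
As a small taste of the Diffie-Hellman key exchange mentioned above, here is a toy Python sketch. The modulus 23, base 5, and the two secret exponents are illustrative values chosen for this sketch only; real systems use numbers hundreds of digits long, and this is in no way a secure implementation.

```python
p, g = 23, 5             # toy public parameters, visible to everyone

a_secret = 6             # known only to person A
b_secret = 15            # known only to person B

# Each person publishes g^secret mod p over the insecure channel.
a_public = pow(g, a_secret, p)
b_public = pow(g, b_secret, p)

# Each person combines the other's public value with their own secret.
shared_a = pow(b_public, a_secret, p)
shared_b = pow(a_public, b_secret, p)

print(a_public, b_public, shared_a, shared_b)  # -> 8 19 2 2
```

Both people arrive at the same shared value without ever sending it, while an eavesdropper sees only 23, 5, 8, and 19. Computing the public values is easy; recovering a secret exponent from them is the hard inverse problem when the numbers are large.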

While matrix-based ciphers are not the most secure form of encryption, they are useful for illustrating the fundamental concepts of the field of cryptography, and are a common choice for doing so. If you are interested in further research, I recommend reading the works I have cited, or searching for cryptography involving applications of other mathematical topics.


References:


1. J.-S. Coron. “What is Cryptography?” IEEE Security & Privacy, vol. 4, no. 1, 2006, pp. 70-73.

2. M. Mokhtari and H. Naraghi. “Analysis and Design of Affine and Hill Cipher,” Journal Of Mathematics Research, vol. 4, no. 1, 2012, pp. 67-77.

3. A. McAndrew. “Using the Hill cipher to teach cryptographic principles,” International Journal Of Mathematical Education In Science & Technology, vol. 38, no. 7, 2008, pp. 967-979.

4. W. Diffie and M.E. Hellman. “New Directions in Cryptography,” IEEE Transactions on Information Theory, vol. 22, no. 6, 1976, pp. 644-654.