Jared Yost, John Ehring, Nikul Patel 12/10/2010 - Drexel University


MDSHA

A unique cryptographic hash function


Jared Yost, John Ehring, Nikul Patel

12/10/2010






A fundamental tenet of information assurance, confidentiality, can be secured via the implementation
of a new cryptographic hash function based on MD5 and SHA-1.

Introduction

One of the basic tenets of information assurance is the maintenance of confidentiality. Confidentiality
serves to prevent unauthorized people from accessing certain information. The confidentiality of
information is an integral part of any organization’s information assurance and security policies. This
aspect of information assurance can be satisfied via a number of methods: data separation, need-to-know
policies, compartmentalization, classification, and encryption all seek to accomplish this objective.
Of these methods, encryption is an essential tool in protecting the confidentiality of information
(Boyce and Jennings, 2002).

Specifically, encryption maintains confidentiality by enciphering text using a particular kind of
mathematical algorithm. Essentially, plain text is transformed via the encryption algorithm to
enciphered text. This enciphered text is then transported via some channel to a recipient. If the text was
intercepted, it would be meaningless without the algorithm’s encryption key. Once the recipient
receives the encrypted text, it is then decrypted, and the recipient successfully receives the original plain
text message.

But decrypting the message requires the successful and secure transmission of the encryption key.
Boyce and Jennings note the inherent difficulty of managing this encryption key:

“Managing the key is the tricky part. The key must be securely generated, securely transferred, securely
stored, securely updated, securely used/controlled, securely recovered, and, when no longer needed,
securely destroyed” (2002).

There are several ways to encrypt data or manage encryption. For example, public key infrastructure
(PKI) is very useful in ecommerce systems. Additionally, cryptographic hash functions can be used to
automatically generate keys to encrypt data, and they are often used in the generation of digital
signatures. In this paper, we will discuss our creation of a new type of cryptographic hash function.
This hash function uses both MD5 and SHA-1 to create a unique cryptographic hash function which can
be used in encryption.



Motivation

MD5 is an algorithm that was created by Professor Ronald L. Rivest of MIT in 1991 as an extension of
the MD4 message digest algorithm, which was not considered secure enough. MD5 is fast, although it is
not as fast as MD4. The MD5 message digest algorithm is fairly simple to implement in digital
signature applications. When transferring files and messages, security can be added quickly and simply
with the use of MD5: by computing a 128-bit message digest, data integrity can be verified. An MD5
digest has been described as being as unique to its input as a fingerprint is to a person, and it is
much more reliable for verifying data integrity than a simple checksum. However, MD5 is now considered
obsolete for security purposes because practical methods of finding collisions in the algorithm have
emerged.

SHA-1 (Secure Hash Algorithm 1) was designed by the National Security Agency (USA) and published in
1995 as a government standard for digital data protection. SHA-1 is also specified in RFC 3174, and
this document can help answer many questions about the algorithm. SHA-1 produces a 160-bit value for
any input message of fewer than 2^64 bits. The message should be considered to be a bit string, because
the length of the message is the number of bits in the message. The signature of the message can be
verified when the message digest is put into a signature algorithm. SHA-1 is the most commonly used
member of the SHA series of cryptographic hash functions. SHA-1 is slower than MD5, in part because it
produces a larger digest. However, the larger digest makes it stronger against brute force attacks.

MD5 and SHA-1 also share some common ground. They are both hash functions used to compute a
condensed representation of a message or data file. When data is condensed using these algorithms, it
is processed through their mathematical formulas and transformation steps. The condensed output is of a
fixed length in both cases and is known as a message digest; in practice, each digest is treated as
unique to its message.
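The fixed digest length can be checked directly with PHP's built-in `md5()` and `sha1()` functions. The following is a minimal sketch; the helper name `digest_bits` is hypothetical and is used here only for illustration.

```php
<?php
// Hypothetical helper: report the digest size, in bits, that MD5 and SHA-1
// produce for a given input. md5() and sha1() return hex strings, and each
// hex character encodes 4 bits.
function digest_bits(string $input): array
{
    return [
        'md5'  => strlen(md5($input)) * 4,
        'sha1' => strlen(sha1($input)) * 4,
    ];
}

// Inputs of very different sizes yield digests of the same size.
foreach (['a', 'hello world', str_repeat('x', 100000)] as $input) {
    $bits = digest_bits($input);
    printf("%6d bytes in -> MD5 %d bits, SHA-1 %d bits\n",
        strlen($input), $bits['md5'], $bits['sha1']);
}
```

Each line of output should report 128 bits for MD5 and 160 bits for SHA-1, regardless of the input size.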

MD5 and SHA-1 can be used to check that a received file is correct and unchanged by simply comparing
the hash of the received file with the hash of the original file. In both MD5 and SHA-1, the message
digest is irreversible: the original data cannot be recovered from the digest. These algorithms also
reduce the cost of signing, because the digest that is signed is much smaller than the actual message.
Allowing a signature to be linked to a message digest rather than to the full message improves
efficiency.

Computationally infeasible is a term applied to MD5 and SHA-1 because they were designed so that it is
not practical to find two different messages which produce the same message digest, no matter what
method is used. If a message is altered, the message digest will change and be different, causing the
verification to fail.
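The verification step described above can be sketched in PHP as follows. The helper name `digest_matches` and the sample messages are assumptions made for illustration; `hash()` and `hash_equals()` are standard PHP functions.

```php
<?php
// Hypothetical helper: recompute the digest of a received message and compare
// it with the digest that was published for the original message.
function digest_matches(string $received, string $expectedHex, string $algo = 'sha1'): bool
{
    // hash() accepts both 'md5' and 'sha1'; hash_equals() compares the two
    // hex strings in constant time.
    return hash_equals($expectedHex, hash($algo, $received));
}

$original  = 'Pay 100 dollars to Alice';
$published = sha1($original);

var_dump(digest_matches($original, $published));                   // bool(true)
var_dump(digest_matches('Pay 900 dollars to Alice', $published));  // bool(false)
```

Changing a single character of the message produces a different digest, so the second comparison fails.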

Table 1 shows that even though the key, key length, data, and data length are the same in both test
cases, the digest will not be the same. The digest produced differs depending on whether SHA-1 or MD5
is used (Cheng and Glenn, 1997).


Table 1: Test Cases for HMAC-MD5 and HMAC-SHA-1

2. Test Cases for HMAC-MD5

test_case     4
key           0x0102030405060708090a0b0c0d0e0f10111213141516171819
key_length    25
data          0xcd repeated 50 times
data_length   50
digest        0x697eaf0aca3a3aea3a75164746ffaa79

3. Test Cases for HMAC-SHA-1

test_case     4
key           0x0102030405060708090a0b0c0d0e0f10111213141516171819
key_length    25
data          0xcd repeated 50 times
data_length   50
digest        0x4c9007f4026250c6bc8414f9bf50c86c2d7235da
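The two test cases in Table 1 can be reproduced with PHP's built-in `hash_hmac()` function. This is a minimal sketch; the variable names are chosen here for illustration, and the key and data are the ones listed in the table.

```php
<?php
// Key and data from test case 4 in Table 1.
$key  = hex2bin('0102030405060708090a0b0c0d0e0f10111213141516171819'); // 25 bytes
$data = str_repeat("\xcd", 50);                                         // 50 bytes

// Same key and same data, but a different underlying hash function.
$hmacMd5  = hash_hmac('md5', $data, $key);
$hmacSha1 = hash_hmac('sha1', $data, $key);

echo "HMAC-MD5:   0x$hmacMd5\n";
echo "HMAC-SHA-1: 0x$hmacSha1\n";
```

The two printed values should match the digest rows of Table 1: 32 hex characters for HMAC-MD5 and 40 for HMAC-SHA-1.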




Related Work


Though we have used SHA-1 and MD5 in the creation of our unique cryptographic hash function, there
are many alternatives available today. SHA-1 is known to be weak against collision attacks. For this
reason, the SHA-2 line of hash functions, such as SHA-224 and SHA-256, was created to help mitigate
this weakness. However, due to SHA-2’s reliance on the SHA-1 architecture, it is suspected that SHA-2
hash functions may soon become significantly weak to collision attacks as well. As a result, the
publisher of SHA-1 and SHA-2, the National Institute of Standards and Technology (NIST), has begun a
competition to select the design for SHA-3, which is expected to be much more resistant to collision
attacks. Sasaki, Wang, and Aoki have demonstrated preimage attacks on reduced-step versions of SHA-256
and SHA-512 (2009). Again, this serves to illustrate that no hash function is completely invulnerable,
and new, better designed hash functions should always be investigated to maintain a standard level of
security.

Message-Digest algorithm 5 (MD5) has a number of related hash functions. MD5 was originally based on
the MD4 hash function. MD4 was released to much fanfare and was quickly expanded upon and made more
secure. However, the need for a successor arose when the security of the full MD4 hash function was
critically compromised by an attack that finds collisions in seconds (Dobbertin, 1998). While MD5 is
the current Message-Digest standard, work has begun on MD6. MD6 was submitted to the NIST competition
to find the best algorithm for SHA-3, but has so far been unsuccessful in being formally adopted.

Message-Digest and Secure Hash Algorithm are not the only cryptographic hash functions available.
Other hash functions, each with their own strengths and weaknesses, exist as alternatives to MD5 and
SHA. For example, Tiger is a hash function designed for 64-bit based systems. GOST, which uses the
GOST block cipher, is a 256-bit cryptographic hash function which was broken in 2008. In addition to
these specific hash functions, there are also certain categories of hash functions which can achieve a
particular goal or are useful for a specific purpose, such as symmetric cryptology (which can guarantee
privacy and authentication), asymmetric (public key) cryptology, and others (Preneel, 1994).





Idea and Design

The main idea behind MDSHA is to take two hash algorithms (MD5 and SHA-1) and combine them in order
to create an even more secure way to protect text. By themselves, MD5 and SHA-1 are both vulnerable
algorithms which have been around quite a while (MD5 since 1991, SHA-1 since 1995). However, combining
the outputs of two hash functions into a single output is territory which has not been explored very
thoroughly. One main reason that this has not been pursued is that MD5 produces a 128-bit digest while
SHA-1 produces a 160-bit digest, so if the two algorithms’ individual outputs were arranged randomly
each time within the combined output, it would be rather difficult to tell which part of the output
came from which hash function. However, if the function is designed to arrange the hash output in the
same way every time for a given text, this problem is eliminated, because the positions of the MD5 part
and the SHA-1 part within the output are always known. Our example algorithm is based on this theory,
but we would not recommend that our sample be implemented in a production environment since it is a
very straightforward way to combine the two algorithms. More random combination algorithms should be
further researched before this type of scheme is used in a production environment.

The main resources we would like our algorithm to be focused around are databases. The algorithm
would need to be refined much more before our group would recommend its use for protecting wide
network traffic, such as traffic flowing over the internet from major online retailers or banks. Another
reason we would like to focus on databases is that our sample code was written in PHP, which works well
with databases such as MySQL or PostgreSQL. By turning the focus towards databases, our algorithm can
be tested and verified until we believe that it is refined enough to use for internet traffic.

Overall, our design reflects a new way of thinking about encryption. Rather than focusing efforts on
trying to get one particular encryption algorithm to become more secure, or even trying to design an
entirely new algorithm, we believe that security can be enhanced further through already existing
methods used in a different way. Researchers should continue trying to create new algorithms, as
eventually every algorithm is cracked by some entity and is no longer secure. However, by
utilizing what already exists, research efforts for new, more secure algorithms can be fully fleshed out
before their release.


Performance Evaluation

In order to demonstrate the ease with which these algorithms can be combined, we decided to use PHP’s
built-in MD5 and SHA-1 functions. The code to hash the text with MD5 and SHA-1, then combine the
results, is as follows:

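<?php
if (isset($_POST['text'])) {
    $plain = $_POST['text'];                           // text submitted by the user
    $md5   = md5($plain);                              // 32 hex characters (128 bits)
    $sha1  = sha1($plain);                             // 40 hex characters (160 bits)
    $mdsha = substr($md5, 0, 16) . substr($sha1, 20);  // first half of MD5 + last half of SHA-1
}
?>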

The algorithm takes a string of text, hashes the text using MD5, hashes the text again using SHA-1, and
then takes the first half of the MD5 hash output and combines it with the last half of the SHA-1 hash
output. This produces a string which is half MD5, half SHA-1.
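As a worked example of this description, the sketch below wraps the combination in a function and applies it to the input "abc". The function name `mdsha` and the choice of input are assumptions made for illustration; the slice positions follow from the hex lengths of the two digests (32 characters for MD5, 40 for SHA-1).

```php
<?php
// Hypothetical wrapper around the combination described above:
// first half of the MD5 hex digest followed by the last half of the SHA-1 hex digest.
function mdsha(string $text): string
{
    $md5  = md5($text);   // 32 hex characters
    $sha1 = sha1($text);  // 40 hex characters
    return substr($md5, 0, 16) . substr($sha1, 20);
}

// md5("abc")  = 900150983cd24fb0 d6963f7d28e17f72
// sha1("abc") = a9993e364706816aba3e 25717850c26c9cd0d89d
echo mdsha('abc'), "\n";  // 900150983cd24fb025717850c26c9cd0d89d
```

The result is 16 + 20 = 36 hex characters long, which is 144 bits.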

In terms of performance, the algorithm works very well with plain strings of text (less than 1 second
to process and combine them), and works acceptably well with larger blocks of text (less than 1 minute
to find a hash output for a text file or database, but this processing time will increase depending on
the size of the text file or database).
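One way such timings could be collected is sketched below. This is not the measurement setup used for the figures above; the helper name `time_mdsha` and the input sizes are assumptions, and the measured times will vary with hardware and PHP version.

```php
<?php
// Hypothetical helper: time one MD5 + SHA-1 pass and the combination step
// over a string, returning the elapsed wall-clock time in seconds.
function time_mdsha(string $text): float
{
    $start = microtime(true);
    $md5   = md5($text);
    $sha1  = sha1($text);
    $mdsha = substr($md5, 0, 16) . substr($sha1, 20);
    return microtime(true) - $start;
}

// Compare a short string with progressively larger blocks of text.
foreach ([32, 1024 * 1024, 16 * 1024 * 1024] as $size) {
    $seconds = time_mdsha(str_repeat('a', $size));
    printf("%10d bytes: %.6f s\n", $size, $seconds);
}
```

Because both hash functions read every byte of the input, the time should grow roughly in proportion to the input size.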

We decided to implement the algorithm in PHP because our group was familiar with PHP, because some
research into the language showed that it had potential for encryption projects, and because we
realized the algorithm could be programmed efficiently using PHP. After testing several setups of the
code, we found that using the MD5 and SHA-1 functions directly worked the most efficiently. This also
enabled us to get the specific algorithm down to eight lines of code, which enabled it to run fast
with plain text, text files, and database text files.

Our sample algorithm shows that it is possible to draw out certain parts of a hash output and use them
in whichever way you want. Even though our sample only shows an even divide between the two
algorithms, any part of the MD5 string can be drawn out and combined with any part of the SHA-1 string
as well (by slightly modifying the sixth line of code to change the range taken from either hash
output). This provides the developer with a significant number of ways to create a combined hash
function, which means that the algorithm can be changed to create different types of encryption keys if
it is ever suspected that the current combination algorithm has been compromised by an outside entity.
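A configurable variant of the combination could look like the following sketch. The function name `mdsha_slices`, its parameters, and the range checks are assumptions made for illustration and are not part of our sample code.

```php
<?php
// Hypothetical generalization: take an arbitrary slice of the MD5 hex digest
// and an arbitrary slice of the SHA-1 hex digest, then concatenate them.
function mdsha_slices(string $text, int $md5Start, int $md5Len, int $sha1Start, int $sha1Len): string
{
    // MD5 yields 32 hex characters and SHA-1 yields 40; reject slices outside those ranges.
    if ($md5Start < 0 || $md5Len < 0 || $md5Start + $md5Len > 32) {
        throw new InvalidArgumentException('MD5 slice out of range');
    }
    if ($sha1Start < 0 || $sha1Len < 0 || $sha1Start + $sha1Len > 40) {
        throw new InvalidArgumentException('SHA-1 slice out of range');
    }
    return substr(md5($text), $md5Start, $md5Len) . substr(sha1($text), $sha1Start, $sha1Len);
}

// First half of MD5 with last half of SHA-1.
echo mdsha_slices('abc', 0, 16, 20, 20), "\n";  // 900150983cd24fb025717850c26c9cd0d89d

// Last half of MD5 with first half of SHA-1.
echo mdsha_slices('abc', 16, 16, 0, 20), "\n";  // d6963f7d28e17f72a9993e364706816aba3e
```

Both parties must use the same four slice parameters, since a digest produced with one arrangement will not match a digest produced with another.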

In order to actually test or use the algorithm, a PHP server needs to be configured along with an
interface (text analyzer program, web interface that allows a user to upload a text file for
encryption, etc.) that the user can utilize in order to run the algorithm. Though our sample algorithm
was created using PHP, there are potentially other languages that could be used to replicate our
algorithm (Python, Java, Ruby, etc.) as long as the language has a module that can create hash output
for both MD5 and SHA-1 from plain text. Server environments for these languages would need to be set up
and configured in order to run the algorithm as well, if a developer decides that they do not want to
use PHP.





Conclusion

To sum up, the MDSHA algorithm takes already existing technology and algorithms, then combines them
to produce a new type of encryption key, which is harder to decipher than individually using MD5 or
SHA-1. By utilizing the MD5 and SHA-1 modules of PHP, our group was able to produce a very efficient
algorithm for creating combined hash output from both methodologies. By modifying our sample
algorithm, different ways of arranging the newly combined hash output can be configured to create
even more random (and thus secure) hash outputs than by simply using the first half of MD5 and the
latter half of SHA-1. In this way, organizations and individuals can tailor our algorithm in secret in
order to create better encryption for themselves when using our algorithm to encrypt plain text, text
files, and text database files.

Though not without issue, we believe that our algorithm can be used in conjunction with other types of
encryption functions as well. Starting with MD5 and SHA-1 is just the beginning, as other types of
encryption methodologies could also be combined to create different hash function outputs. The extent
to which this algorithm can be taken is limited only by the number of encryption functions currently
available. Though we do not recommend that researchers discontinue creating new encryption
algorithms, because as technology becomes better and better current encryption algorithms become
easier and easier to break, we believe that our algorithm provides a great interim solution for when
researchers are trying to implement new, more secure (but mostly untested) encryption methodologies
for production environments and the internet.



References

Boyce, Joseph and Dan Jennings (2002). “Chapter 2 - Basic Security Concepts, Principles, and
Strategy”. Information Assurance: Managing Organizational IT Security Risks. Butterworth-Heinemann.
Retrieved from the Drexel University Library Online

Cheng, P., and R. Glenn. "RFC 2202 - Test Cases for HMAC-MD5 and HMAC-SHA-1 (RFC2202)." Internet
FAQ Archives - Online Education - Faqs.org. Sept. 1997. Web. 07 Dec. 2010.
<http://www.faqs.org/rfcs/rfc2202.html>.

Dobbertin, Hans (1998). “Cryptanalysis of MD4”. Journal of Cryptology. International Association for
Cryptologic Research. Retrieved from SpringerLink

Eastlake, D., and P. Jones. "RFC 3174." IETF Datatracker. Cisco Systems, Sept. 2001. Web. 07 Dec. 2010.
<http://datatracker.ietf.org/doc/rfc3174/?include_text=1>.

Preneel, Bart (1994). “Cryptographic Hash Functions”. European Transactions on Telecommunications.
Retrieved from Wiley Online Library

Rivest, R. "RFC 1321." IETF Datatracker. MIT Laboratory for Computer Science and RSA Data Security,
Inc., Apr. 1992. Web. 07 Dec. 2010. <http://datatracker.ietf.org/doc/rfc1321/>.

Sasaki, Yu, Lei Wang, and Kazumaro Aoki (2009). “Preimage Attacks on 41-Step SHA-256 and 46-Step
SHA-512”. Retrieved from http://eprint.iacr.org/2009/479.pdf

"What Is MD5?" AccuHash 2.0 - CRC32, MD5 and SHA1 Windows Utility to Verify Accuracy of Your Files.
06 Nov. 2008. Web. 06 Dec. 2010. <http://www.accuhash.com/index.html>.

"What Is MD5? - Definition from Whatis.com." Information Security: Covering Today's Security Topics.
04 Apr. 2002. Web. 06 Dec. 2010.
<http://searchsecurity.techtarget.com/sDefinition/0,,sid14_gci527453,00.html>.

"What Is SHA-1?" AccuHash 2.0 - CRC32, MD5 and SHA1 Windows Utility to Verify Accuracy of Your Files.
06 Nov. 2008. Web. 06 Dec. 2010. <http://www.accuhash.com/index.html>.