Hybrid Cloud Security: Replication and Direction of Sensitive Data Blocks



Glenn Michael Koch, Eric Drew
Advisor: Dr. XiaoFeng Wang; Mentor: Kehuan Zhang
School of Informatics and Computing, Indiana University, Bloomington, Indiana

INTRODUCTION

Processing of large-scale data sets in a cloud computing environment carries inherent security concerns (see Figure 2). Data sent out to public commodity servers is at greater risk of being compromised than data that is kept on local servers. A hybrid cloud solution involves separating sensitive data, which is confined to a private domain (private cloud), from public data (public cloud). This research involves one component of the hybrid cloud security solution: the replication and direction of sensitive data with changing replica values. The task was to create and modify Java source code within the Hadoop Distributed File System to implement alternative replication factors, and then to test that data was replicated to the proper domain based on its security tag.
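The verification step can be approximated with the standard HDFS client API by asking the namenode where each block of a tagged file actually landed and comparing those hosts against the private-node list. The sketch below is illustrative only: the file path and the privateHosts set are hypothetical stand-ins, and the actual project reads the private/public distinction from namenode metadata rather than a hard-coded list.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PlacementCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical list of hosts that belong to the private cloud.
            Set<String> privateHosts =
                new HashSet<>(Arrays.asList("priv-node-1", "priv-node-2"));

            FileSystem fs = FileSystem.get(new Configuration());
            Path sensitiveFile = new Path("/secure/records.csv");  // example path
            FileStatus status = fs.getFileStatus(sensitiveFile);

            // Ask the namenode where every block of the file was actually placed.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                for (String host : block.getHosts()) {
                    if (!privateHosts.contains(host)) {
                        System.out.println("Violation: replica on public node " + host);
                    }
                }
            }
        }
    }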

REFERENCES

1. http://atbrox.com/2010/02/17/hadoop/
2. Hadoop: The Definitive Guide, p. 2.
3. http://www.businessweek.com/magazine/content/07_52/b4064000281756.htm

FIGURE 2: Source: Awareness, Trust and Security to Shape Government Cloud Adoption, Lockheed Martin, LM Cyber Security Alliance and Market Connections, Inc., April 2010.


HADOOP AND CLOUD COMPUTING

Hadoop is a set of open-source technologies that supports reliable and cost-efficient ways of dealing with large amounts of data [1]. The exponential growth of individual data footprints, as well as the amount of data generated by machines [2], calls for a means to process this data. Hadoop is able to handle large amounts of data because it splits the data into subsets and sends them to multiple processors. Multiple processors tied together process the data at a much higher rate, and Hadoop then reassembles the pieces into a single result set. Complex data operations shifted to clusters of computers are known as clouds [3], and software such as Hadoop orchestrates the merging of these processes. Hadoop in its present form does not provide data security. Hadoop does provide data replication, for the purposes of performance enhancement and fault tolerance, but it does not distinguish private from sanitized data. Our work modifies data replication and control in a hybrid cloud structure.
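For context, stock HDFS already exposes per-file replication through its public Java client API, but only the replica count, not the placement domain. A minimal sketch using the standard org.apache.hadoop.fs.FileSystem API (the file path here is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationDemo {
        public static void main(String[] args) throws Exception {
            // Connect to the cluster described by the default Hadoop configuration.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Stock HDFS lets a client choose HOW MANY replicas a file gets,
            // but not WHERE they are placed (public vs. private nodes).
            Path file = new Path("/user/demo/report.txt");
            boolean accepted = fs.setReplication(file, (short) 3);
            System.out.println("Replication factor change accepted: " + accepted);
        }
    }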

[Diagram: client, namenode, and datanodes spanning the public and private clouds]

HYBRID CLOUD DATA REPLICATION

Data replication in a secure hybrid cloud environment involves (see the placement sketch after this list):

• Replicating data that is tagged sensitive only to private nodes, as identified in namenode metadata
• Replicating sanitized or public data to random nodes, either public or private, so as to provide optimum performance and fault tolerance
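The placement rule above can be illustrated with a short, self-contained sketch. This is not the project's actual HDFS patch; the NodeInfo and SensitiveBlockPlacement names and the isPrivate flag are hypothetical stand-ins for the namenode metadata that marks a datanode as private.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical stand-in for the namenode's view of a datanode
    // (not Hadoop's real DatanodeInfo class).
    class NodeInfo {
        final String host;
        final boolean isPrivate;   // true if the node belongs to the private cloud
        NodeInfo(String host, boolean isPrivate) {
            this.host = host;
            this.isPrivate = isPrivate;
        }
    }

    public class SensitiveBlockPlacement {

        // Choose replica targets for one block.
        // sensitive == true  -> only private-cloud nodes are eligible
        // sensitive == false -> any node is eligible (performance / fault tolerance)
        static List<NodeInfo> chooseTargets(List<NodeInfo> allNodes,
                                            boolean sensitive,
                                            int replicationFactor) {
            List<NodeInfo> eligible = new ArrayList<>();
            for (NodeInfo node : allNodes) {
                if (!sensitive || node.isPrivate) {
                    eligible.add(node);
                }
            }
            Collections.shuffle(eligible);   // spread replicas randomly over eligible nodes
            int n = Math.min(replicationFactor, eligible.size());
            return eligible.subList(0, n);
        }
    }

A sensitive block with replication factor 3 would thus be copied to at most three private-cloud nodes, while a public block may be placed on any node.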


[Diagram: client, namenode, and datanodes in a single cloud]

EDITED HADOOP JAVA CODE

• The original Hadoop system was designed to work on a single cloud (Figure 1).
• Thus, Hadoop is not designed to automatically detect sensitive data and ensure that such data remains secure and inaccessible from the public cloud.
• Here we modified the original Java code so that data can be distributed over the public and private clouds while data that is considered sensitive is kept on the private cloud (Figure 3).
• The code makes two distinct calls to the public and private clouds, distinguished by a true or false value (see the sketch below).
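The poster does not reproduce the modified source, but the boolean dispatch it describes would look roughly like the following self-contained sketch; HybridReplicator, replicateToPrivateCloud, and replicateToPublicCloud are hypothetical names, not identifiers from the Hadoop codebase.

    public class HybridReplicator {

        // Hypothetical stand-ins for the two replication paths; the real modified
        // HDFS code hands the block to the namenode's target-selection logic.
        private void replicateToPrivateCloud(String blockId) {
            System.out.println(blockId + " -> private-cloud datanodes only");
        }

        private void replicateToPublicCloud(String blockId) {
            System.out.println(blockId + " -> any datanode (public or private)");
        }

        // The single true/false value described on the poster drives the dispatch.
        public void replicateBlock(String blockId, boolean sensitive) {
            if (sensitive) {
                replicateToPrivateCloud(blockId);
            } else {
                replicateToPublicCloud(blockId);
            }
        }

        public static void main(String[] args) {
            HybridReplicator r = new HybridReplicator();
            r.replicateBlock("blk_001", true);   // sensitive: stays on the private cloud
            r.replicateBlock("blk_002", false);  // public: may be placed anywhere
        }
    }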


FIGURE 1: Hadoop original structure

FIGURE 3: Hybrid cloud structure


** This project is supported in part by NSF CNS-0716292.