NetApp_Presentationx - iitk.ac.in

2 Dec 2013

HBASE

An Introduction

ANURAG AWASTHI
AVANI NANDINI

IIT KANPUR

Overview

Hadoop:
- A brief introduction: the need
- Some basic elements & concepts

HBASE:
- Some essentials
- Cluster and data layout
- Architecture & HFile
- Memory hierarchy

Additional:
- Peek into the HBase file system
- Peek into the source code

Conclusion


HADOOP: Introduction

Why?
- RDBMS: scaling problem
- Advent of Big Data
  - Facebook: 15 TB of data per day!
- Scale out by increasing the number of machines, not the capacity of each

Hadoop: distributed systems simplified
- Write once, read many
- Batch processing
- Uses commodity hardware

HADOOP: Introduction

Inspired by HPC (High Performance Computing), but:
- Data locality is at the core
- Works on semi-structured and unstructured data


Some basic elements & Concepts

- Sub-projects: three-level hierarchy
- Some still under development
- Works on large logical blocks
  - Default size: 64 MB

Our Interest
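As a quick illustration of the large-block model, a sketch of how many blocks a file occupies (the 64 MB default is from the slide; the file sizes are made up):

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # Hadoop's default logical block size (64 MB)

def num_blocks(file_size_bytes: int) -> int:
    """Number of logical blocks needed to hold a file of the given size."""
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

print(num_blocks(1 * 1024**3))   # a 1 GiB file -> 16 blocks
print(num_blocks(100))           # even a tiny file occupies one block entry
```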

Some basic elements & Concepts

Core
- Set of components and interfaces for distributed file systems and general I/O (serialization, Java RPC, persistent data structures)
Some basic elements & Concepts

MapReduce
- Distributed data processing model and execution environment that runs on large clusters of commodity machines

Some basic elements & Concepts

HDFS (Hadoop Distributed File System)
- A distributed file system that runs on large clusters of commodity machines

Some basic elements & Concepts

ZooKeeper
- Distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building distributed applications

Some basic elements & Concepts

HBase
- A distributed, column-oriented database, using HDFS for its underlying storage

HDFS: Cluster Nodes

Namenode (Master)
- Maintains the file system tree
- All the metadata, including permissions
- Uses an edit log and a namespace image
  - Merged periodically by the secondary namenode

Datanode (Slave)
- Performs tasks for clients and the master

MAP-REDUCE

- A MapReduce job is a unit of work that the client wants performed
- Consists of input data, a MapReduce program, and configuration information
- The map program and reduce program are written by the programmer
- Hadoop divides the job into tasks over input splits
  - Map tasks and reduce tasks (number specified)
- Each task is performed independently
- Map tasks enjoy data locality; reduce tasks do not
- Split size -> Hadoop block size
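A minimal in-process sketch of the map/shuffle/reduce flow described above, using word count (the function names are chosen for illustration; real Hadoop distributes these phases across a cluster):

```python
from collections import defaultdict

def map_phase(split):
    """Map task: emit (word, 1) for every word in its input split."""
    for line in split:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce task: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

splits = [["hadoop stores data", "hadoop processes data"]]
pairs = [kv for split in splits for kv in map_phase(split)]
print(reduce_phase(shuffle(pairs)))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```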

MAP-REDUCE: Cluster Nodes


Jobtracker
- Similar to the master
- The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers
- If a task fails, the jobtracker can reschedule it on a different tasktracker

Tasktrackers
- Similar to slaves
- Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job


HBASE: Some Essentials

- Distributed, column-family-oriented database built on top of HDFS
- Hadoop application for real-time read/write random access to very large datasets
- Not relational; does not support SQL

Three major components of HBase
- One master server
- Many region servers
- Client library
  - HBaseAdmin, Scan, Put, Get, primarily

Nodes terminology
- "Master" <-> Namenode
- "RegionServer" <-> Datanode
HBASE: Some Essentials

-ROOT- and .META.
- Special catalog tables
- Maintain the current state and recent history
- -ROOT- holds the list of .META. table regions (known to the master)
- .META. holds the details of all user-space regions
- Based on this information, splitting of regions or redeployment after a crash is done

Load Balancing
- Each regionserver hosts {average-1, average, average+1} regions
- The master ensures this with periodic checks
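A sketch of the balancing invariant above (the {average-1, average, average+1} bound is from the slide; the check itself is illustrative, not HBase's actual balancer code):

```python
def is_balanced(regions_per_server: list) -> bool:
    """True if every server's region count is within 1 of the cluster average."""
    average = round(sum(regions_per_server) / len(regions_per_server))
    return all(average - 1 <= n <= average + 1 for n in regions_per_server)

print(is_balanced([10, 11, 10, 9]))   # True: all counts within 1 of the average (10)
print(is_balanced([20, 5, 10, 5]))    # False: the master would move regions around
```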

HBASE: Some Essentials

Split
- A reference is put first in the "splits" directory
- Once successful, it is passed up to the table directory and locked
- The values are atomically written to the daughter regions, and the info is passed to the regionserver, updating its .META.
- Finally passed to the master, which performs load balancing accordingly

Compaction - 2 types
- Minor: start from the older files and choose all < MinSize
- Major: every 24 hours, for files > the maximum for minor, or if a minor compaction would include all files
- Can be launched manually
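The minor-compaction selection rule above can be sketched as follows (the MIN_SIZE threshold and the file list are invented values; HBase's real selection policy has additional heuristics):

```python
MIN_SIZE = 8 * 1024 * 1024  # hypothetical threshold: files below 8 MB qualify

def select_minor_compaction(storefiles):
    """Pick storefiles for a minor compaction: walk from oldest to newest,
    taking every file smaller than MIN_SIZE."""
    return [name for name, size in storefiles if size < MIN_SIZE]

# (name, size-in-bytes) pairs, oldest first
files = [("sf1", 2 * 1024**2), ("sf2", 6 * 1024**2), ("sf3", 64 * 1024**2)]
print(select_minor_compaction(files))  # ['sf1', 'sf2'] -> merged into one new file
```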


HBASE: Cluster Layout

Role of Zookeeper

ZooKeeper performs tasks such as:
- Negotiating ownership, registering services, watching for updates

Region servers create ephemeral nodes in ZooKeeper, which:
- The master uses to discover available servers at startup
- Serve as a heartbeat keep-alive mechanism
- Are used to track server failures

Role of Zookeeper

- Ephemeral nodes are also bound to the session between ZooKeeper and the clients
  - Keep track of the active sessions
- HBase can have its own instance of ZooKeeper
- HBase also uses ZooKeeper to ensure:
  - There is only one master running for a given cluster
  - It stores the bootstrap location for region discovery (the node holding the -ROOT- table)

Client Operation Path

- ZooKeeper hosts several clusters
- The client follows these steps:
  - Gets the host address of -ROOT-
  - -ROOT- gives the regionserver hosting the .META. region
  - The range containing the requested key determines the region (the range is part of a region's identifier)
  - .META. gives the address of the actual data
  - The regionserver performs the request
- The path is cached and reused until a fault
  - This happens when a split, compaction, or load balancing occurs
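The lookup above boils down to a chain of sorted-range lookups: each catalog level maps a key range to the thing that serves it. A toy version of one such lookup (the table names and keys are invented; real HBase resolves these over RPC):

```python
import bisect

def locate(catalog, row_key):
    """Find the region whose start-key range contains row_key.
    `catalog` is a sorted list of (start_key, region_name) pairs."""
    starts = [start for start, _ in catalog]
    index = bisect.bisect_right(starts, row_key) - 1
    return catalog[index][1]

# toy .META.-like mapping: start key -> user region
meta = [("", "user-region-A"), ("m", "user-region-B")]
print(locate(meta, "apple"))   # user-region-A (falls in the ["", "m") range)
print(locate(meta, "zebra"))   # user-region-B (falls in the ["m", ...) range)
```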

Data Layout

"Column family oriented"
- Sparse table model
- Column families are fixed at declaration
- Columns can be added to each family, making the table sparse at convenience
- One row ID, similar to a primary key
- A timestamp is associated with each entry
  - The past 'k' (default = 3) snapshots are stored
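A toy in-memory sketch of this model: cells are keyed by (row, family:qualifier), and each cell keeps the last k timestamped versions (k = 3 as on the slide; the class and method names are invented, not HBase's client API):

```python
from collections import defaultdict

MAX_VERSIONS = 3  # the default 'k' from the slide

class SparseTable:
    def __init__(self):
        # (row_id, "family:qualifier") -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        """Add a new version of a cell; assumes timestamps arrive in order."""
        versions = self.cells[(row, column)]
        versions.insert(0, (timestamp, value))
        del versions[MAX_VERSIONS:]  # keep only the newest k versions

    def get(self, row, column):
        """Return the newest version of the cell, or None if absent (sparse)."""
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None

t = SparseTable()
for ts in range(1, 5):
    t.put("row-1", "cf1:count", ts, ts * 10)
print(t.get("row-1", "cf1:count"))           # 40 (the newest version)
print(len(t.cells[("row-1", "cf1:count")]))  # 3 (the oldest version was dropped)
```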

Data Layout

Row-ID (sorted) | Column Family 1 | Column Family 2 | Column Family 3
                | C11   C12       | C21   C22       | C31   C32   C33

- Tables can be large: even petabytes of data
- Data is stored sorted according to the primary key
Data Layout

Row-ID (sorted) | Column Family 1 | Column Family 2 | Column Family 3
                | C11   C12       | C21   C22       | C31   C32   C33
Region 1        |
Region 2        |
...             |
Region 'i'      |

- The table is divided row-wise into several regions
- A maximum size is used for splitting a region into daughter regions
Data Layout

- Regions are subdivided into "Stores", one per column family per region:

  Region 1: Store 11 | Store 21 | Store 31
  Region 2: Store 12 | Store 22 | Store 32
  Region i: Store 1i | Store 2i | Store 3i
Data Layout

- Stores are subdivided into "Storefiles"
- Storefiles are lightweight wrappers around HFile, the actual stored files

For Column Family 3 (C31, C32, C33), for example, Store 31 may hold:
  Storefile 11 (HFile 11), Storefile 21 (HFile 21), Storefile 31 (HFile 31),
  Storefile 12 (HFile 12), Storefile 22 (HFile 22)
HBASE: Data Layout


HBASE: Architecture

HFile

- The HFile is the actual file persisted in the file system
- Can be accessed outside HBase for retrieval of data
- Variable-length file with an upper bound on size
- Consists of blocks:
  - The only fixed-size blocks are the file info and trailer blocks
  - A minimum block size between 8 KB and 1 MB is recommended for general usage (default 64 KB)
  - A larger block size is preferred if files are primarily for sequential access
- As opposed to Hadoop blocks of 64 MB:
  - HBase blocks are taken by Hadoop and written into its blocks sequentially

HFile: Structure

- Divided into basic blocks
- Data blocks hold the data in the form of key/value pairs

HFile: Structure

Metadata example:
  blockIndexSize (as per heap size) = 208,
  compression=none,
  inMemory=false,
  firstKey=row-550/colfam1:50/1309813948188/Put,
  lastKey=row-699/colfam1:99/1309812292635/Put
HFile: Structure

Index
- Index blocks record the offsets of the data and meta blocks
- Use B-tree indexing

File Info
- Average key length, average value length, filter code, max sequence ID, time range, etc.

HFile: Structure

- The trailer is read first for every HFile
- It has the pointers to the other blocks
- Written at the end of persisting the data to the file, hence finalizing the now-immutable data store

Trailer example:
  fileinfoOffset=11408, dataIndexOffset=11664, dataIndexCount=1,
  metaIndexOffset=0, metaIndexCount=0, totalBytes=11408,
  entryCount=300, version=1
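A small sketch that parses a trailer dump like the one above into typed fields (the input format here is just the comma-separated key=value text shown on the slide, not HFile's binary trailer encoding):

```python
def parse_trailer(dump: str) -> dict:
    """Turn a 'key=value, key=value' trailer dump into a dict of ints."""
    fields = {}
    for pair in dump.split(","):
        key, _, value = pair.strip().partition("=")
        fields[key] = int(value)
    return fields

trailer = parse_trailer(
    "fileinfoOffset=11408, dataIndexOffset=11664, dataIndexCount=1, "
    "metaIndexOffset=0, metaIndexCount=0, totalBytes=11408, "
    "entryCount=300, version=1"
)
print(trailer["entryCount"], trailer["version"])  # 300 1
```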


Memory Hierarchy

- The data storage uses Log-Structured Merge (LSM) trees
- Uses the same approach as Google's "Bigtable: A Distributed Storage System for Structured Data" paper of 2006
- Advantage: converts random writes into sequential writes
- Useful when there are frequent updates

2 Component LSM Trees

- Two trees (typically B-trees): C0 resides in memory, C1 on disk
- WAL -> merge changes to MemStore -> flush to disk into C1

  Tree C0 (in memory)   |   Tree C1 (on disk)
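A toy version of that write path: every edit is appended to a log first, applied to an in-memory C0, and flushed to an on-disk C1 when C0 grows past a threshold (the threshold and variable names are invented for illustration):

```python
FLUSH_THRESHOLD = 3  # invented: flush C0 after this many entries

wal = []          # write-ahead log: appended to before anything else
c0 = {}           # in-memory component (the MemStore)
c1_flushes = []   # on-disk component: list of flushed, sorted snapshots

def write(key, value):
    wal.append((key, value))      # 1. durability first
    c0[key] = value               # 2. apply to the in-memory tree
    if len(c0) >= FLUSH_THRESHOLD:
        c1_flushes.append(sorted(c0.items()))  # 3. sequential flush to disk
        c0.clear()

for i in range(5):
    write("row-%d" % i, i)
print(len(wal), len(c1_flushes), len(c0))  # 5 1 2
```

Note how the random per-key updates become one sorted, sequential write at flush time, which is the LSM advantage the slide states.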

2 Component LSM Trees

- Blocks read from C1 are merged with C0, then flushed back to disk
- M: the expected number of C0 entries flushed into each leaf block of C1

  M = (S_block / S_entry) * (S_0leaf / (S_0leaf + S_1leaf))

- The S terms are the respective sizes, so M is the total entries in a page times the fraction of the combined leaf entries residing in C0
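Plugging sample numbers into the formula above (all sizes are invented for illustration):

```python
def merge_ratio(block_bytes, entry_bytes, c0_leaf_bytes, c1_leaf_bytes):
    """M: expected number of C0 entries merged into each C1 leaf block."""
    entries_per_block = block_bytes / entry_bytes
    c0_fraction = c0_leaf_bytes / (c0_leaf_bytes + c1_leaf_bytes)
    return entries_per_block * c0_fraction

# e.g. 64 KB blocks, 64-byte entries, C0 holding 1/16 of the combined leaf data
print(merge_ratio(64 * 1024, 64, 1, 15))  # 64.0
```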

2 Component LSM Trees

Problem
- Too small an M => S_0leaf too small relative to S_1leaf => a larger number of seeks for a given C0
- Too large an M => the memory cost increases

Solution
- Have multiple trees C0, C1, C2, ..., Ck
- Merging happens between C_(i-1) and C_i

Storage Hierarchy - LSM

- Each tree on disk denotes a StoreFile
  - A more recent version
- Maximum sizes for each tree are defined
- Some support for Bloom filters is provided


hadoop fs -lsr /hbase

/hbase

.logs
- Created by HLog (uses a Hadoop sequence file of HLogKey instances, which contain the data and key along with region and table info); contains subdirectories corresponding to each regionserver's log

.oldlogs
- A log file is no longer needed once its edits have been persisted into store files; it is then moved into the .oldlogs directory. This is done while the log file is rolled, based on the hbase.regionserver.logroll.period configuration property (set to 60 minutes by default)
- Deleted after 10 minutes by default; the HMaster checks every minute by default


hadoop fs -lsr /hbase

/hbase

splitlog
- Split log files, in case of region splitting

.corrupt
- Contains corrupt logs

/hbase/table

.tableinfo
- Includes the table and column family schemas; can be read, for example, by tools to gain insight into what the table looks like

.tmp
- Contains temporary data; used, for example, when the .tableinfo is updated

.oldlogs


hadoop fs -lsr /hbase

/hbase/table

colfamily/
- A separate directory for every column family the table schema contains; the names of these directories are the MD5-hash portion of a region name

.regioninfo
- Serialized information of the HRegionInfo instance for the given region

.tmp
- Created on demand; used to hold temporary files, for example the rewritten files from a compaction, which are usually moved out into the region directory once the process has completed


HBASE: Key Packages

org.apache.hadoop.hbase.catalog
- Mainly contains the files to manage the -ROOT- and .META. catalog tables, namely MetaReader.java, MetaEditor.java, and RootLocationEditor.java

org.apache.hadoop.hbase.client
- Contains the files that manage client-side actions. As mentioned previously, HBaseAdmin.java is part of this module; it also contains several other files such as Put.java, Get.java, MultiPut.java, Scan.java, Delete.java, and similar supporting files for these operations. HTable.java (also mentioned before) is also part of it. HConnection.java in this package manages all the connections of the client to ZooKeeper and also to the master and regionserver.

HBASE: Key Packages

org.apache.hadoop.hbase.io / org.apache.hadoop.hbase.io.hfile
- Manages most of the input/output in the database at the architectural level. The main member is HFile.java, the class containing the implementation details of the HFile and its writer, reader, etc. The HFile itself has been described earlier. Other important members include HFileScanner.java, which helps in scanning the HFile and also lets us place the cursor at any specific key/value pair in the HFile.

org.apache.hadoop.hbase.master
- HMaster.java, which starts the master on the given host, and HMasterCommandLine.java, through which the other components communicate with the master, are the main components. There are various other helping modules such as LoadBalance.java, etc.

HBASE: Key Packages

org.apache.hadoop.hbase.regionserver
- The primary module that deals with all the work relevant from the client's perspective. HRegionServer.java launches the region server on each machine, and HRegionServerCommandLine.java, as previously, is the interface through which the other modules communicate with the region server.


Conclusion

- Can place the Trailer of HFiles in Flash
  - Would need proper pointers for the same
  - Though skeptical: too small in size
  - Might be beneficial in a read-many scenario
- The storage hierarchy is also worth considering
  - Can put restrictions so that smaller store files are written to Flash
- Can the HLog be written to Flash?
  - But frequent writes and updates are also involved: log rolling, syncing, splitting
  - Performance should improve, considering the work done at Arizona university
    - Doubled speed by merely placing logs in Flash
- Need to build indexing on values other than Row-IDs

Thank you !

Backup

Other Components

- Avro: a data serialization system for efficient, cross-language Remote Procedure Call and persistent data storage. (Not much in use yet)

- Pig: a data flow language and execution environment for exploring very large datasets. Pig runs on HDFS and MapReduce clusters.

- Hive: a distributed data warehouse. Hive manages data stored in HDFS and provides a query language based on SQL (which is translated by the runtime engine to MapReduce jobs) for querying the data.

- Chukwa: a distributed data collection and analysis system. Chukwa runs collectors that store data in HDFS, and it uses MapReduce to produce reports. (At the time of this writing, Chukwa had only recently graduated from a "contrib" module in Core to its own subproject.)