Oracle Data Warehouse

Enabling new horizons for the data warehouse with Big Data

Alfred Schlaucher, Detlef Schroeder


Topics

- Big Data: buzzword, or a new dimension with new possibilities?
- Oracle's technology for storing unstructured and semi-structured mass data
- The Cloudera framework
- "Connectors" into the new world: Oracle Loader for Hadoop and HDFS
- Big Data Appliance
- Discovering new analysis horizons with Oracle R Enterprise
- Big Data analyses with Endeca

Hive

- Hive is an abstraction on top of MapReduce
  - Allows users to query data in the Hadoop cluster without knowing Java or MapReduce
- Uses the HiveQL language
  - Very similar to SQL
- The Hive interpreter runs on a client machine
  - Turns HiveQL queries into MapReduce jobs
  - Submits those jobs to the cluster
- Note: this does not turn the cluster into a relational database server!
  - It is still simply running MapReduce jobs
  - Those jobs are created by the Hive interpreter

Hive (cont'd)

Sample Hive query:

    SELECT stock.product, SUM(orders.purchases)
    FROM stock INNER JOIN orders
        ON (stock.id = orders.stock_id)
    WHERE orders.quarter = 'Q1'
    GROUP BY stock.product;
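For context, the query above assumes the stock and orders tables already exist in Hive. A minimal sketch of matching definitions; the column types, delimiter, and row format are assumptions, not part of the original example:

    -- Hypothetical DDL for the tables used in the sample query above
    CREATE TABLE stock (id INT, product STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

    CREATE TABLE orders (stock_id INT, purchases INT, quarter STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';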

Pig

- Pig is an alternative abstraction on top of MapReduce
- Uses a dataflow scripting language called PigLatin
- The Pig interpreter runs on the client machine
  - Takes the PigLatin script and turns it into a series of MapReduce jobs
  - Submits those jobs to the cluster
- As with Hive, nothing 'magical' happens on the cluster
  - It is still simply running MapReduce jobs

Pig (cont'd)

Sample Pig script:

    stock  = LOAD '/user/fred/stock' AS (id, item);
    orders = LOAD '/user/fred/orders' AS (id, cost);
    grpd   = GROUP orders BY id;
    totals = FOREACH grpd GENERATE group, SUM(orders.cost) AS t;
    result = JOIN stock BY id, totals BY group;
    DUMP result;
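A script like this is handed to the Pig interpreter on the client machine, which generates and submits the MapReduce jobs. A sketch of typical invocations; the script file name is hypothetical:

    pig totals.pig            # default MapReduce mode: jobs run on the cluster
    pig -x local totals.pig   # local mode: run against the local filesystem for testing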



Flume and Sqoop

- Flume provides a method to import data into HDFS as it is generated
  - Rather than batch-processing the data later
  - For example, log files from a Web server
- Sqoop provides a method to import data from tables in a relational database into HDFS (or Hive)
  - Does this very efficiently via a Map-only MapReduce job
  - Can also 'go the other way': populate database tables from files in HDFS
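As an illustration, typical Sqoop invocations might look like the following; the JDBC connection string, credentials, table names, and paths are hypothetical:

    # Map-only parallel import of a database table into HDFS
    sqoop import \
        --connect jdbc:mysql://dbserver/shop \
        --username fred -P \
        --table orders \
        --target-dir /user/fred/orders

    # 'Going the other way': populate a database table from HDFS files
    sqoop export \
        --connect jdbc:mysql://dbserver/shop \
        --username fred -P \
        --table order_totals \
        --export-dir /user/fred/totals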

Oozie

- Oozie allows developers to create a workflow of MapReduce jobs
  - Including dependencies between jobs
- The Oozie server submits the jobs to the cluster in the correct sequence
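An Oozie workflow is described in an XML file. A minimal sketch with a single MapReduce action; the workflow, action, and path names are hypothetical, and a dependent job would be chained in via the ok transition:

    <workflow-app name="order-totals" xmlns="uri:oozie:workflow:0.4">
        <start to="aggregate-orders"/>
        <action name="aggregate-orders">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/user/fred/orders</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/user/fred/totals</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>MapReduce job failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>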

HBase

- HBase is 'the Hadoop database': a 'NoSQL' datastore
- Can store massive amounts of data
  - Gigabytes, terabytes, and even petabytes of data in a table
- Scales to provide very high write throughput
  - Hundreds of thousands of inserts per second
- Copes well with sparse data
  - Tables can have many thousands of columns
  - Even if most columns are empty for any given row
- Has a very constrained access model
  - Insert a row, retrieve a row, do a full or partial table scan
  - Only one column (the 'row key') is indexed
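This constrained access model corresponds directly to the get/put/scan commands of the HBase shell. A minimal sketch; the table, column family, and values are hypothetical:

    create 'orders', 'cf'                      # table with one column family
    put 'orders', 'row1', 'cf:cost', '19.99'   # insert a cell under row key 'row1'
    get 'orders', 'row1'                       # retrieve a single row by row key
    scan 'orders'                              # full table scan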

HBase vs. Traditional RDBMSs

                                    RDBMS                          HBase
    Data layout                     Row-oriented                   Column-oriented
    Transactions                    Yes                            Single row only
    Query language                  SQL                            get/put/scan
    Security                        Authentication/Authorization   TBD
    Indexes                         On arbitrary columns           Row-key only
    Max data size                   TBs                            PB+
    Read/write throughput limits    1000s of queries/second        Millions of queries/second

Contact and further information

Become a member of the Oracle Data Warehouse Community: many free seminars and events.

Download server: www.ORACLEdwh.de

Next German-language Oracle DWH conference: March 19-20, 2013, in Kassel