
to open up a window with its details.





24. In the window that opened up, click on the Code tab.




In this window you will see exactly what code was run. If an error occurs, this information becomes quite useful in debugging your transformations.




25. To check the data that is now in the Oracle database, go back to the Designer tab by going to the upper left corner of the screen and clicking on Designer.




26. Then in the Models section at the bottom left of the screen, right-click on the table LOGS in the Oracle folder and select View Data.

On the left side of the screen a new window will pop up with all of the data inside that table of the Oracle database.





27. We can now go ahead and close all of the open windows on the right side of the screen to prepare for the next exercise.




5.6 Using ODI to Import a Hive Table into Hive

1. The last ETL process we are going to show involves moving data from a Hive table into another Hive table. Although this might sound a bit weird, there are a lot of circumstances where you might want to move data from one table to another while verifying the data for constraint violations or transforming the data.

Let's go ahead and create an interface for this type of transaction. In the Project tab right-click on Interfaces and select New Interface.




2. As before, let's give the interface a name. In the new tab that opened on the right side of the screen, type in the following information:

Name: Hive_To_Hive




3. Next, at the bottom of the screen, let's go to the Mapping tab.




4. We will first drag the Hive dividends table into the source window on the right.





5. Next we will drag the dividends2 table into the target window on the right.





6. You will be asked if you would like to perform auto mapping. Just click Yes.




7. All of the mappings auto-complete without a problem. We now need to specify the Integration Knowledge Module (IKM) which will be used to perform the integration. In ODI an IKM is the engine which has all of the code templates for the integration process; hence without an appropriate IKM the integration is not possible. In the previous section there was only one appropriate IKM, hence it was auto-chosen for use. In this case there are multiple possible IKMs, so we need to select one. In the upper left corner of the screen in the Designer window, right-click on Knowledge Modules and select Import Knowledge Modules.








8. A window will pop up which will allow you to import Knowledge Modules. First we need to specify the folder in which the Knowledge Modules are stored. Fill in the following information:

File import directory: /u01/ODI/oracledi/xml-reference

Then press Enter







9. A list of different Knowledge Modules should appear in the space below. Scroll down until you find the file(s) to import:

IKM Hive Control Append

Then press OK





10.

An import report will pop up. Just click
Close





11. Let's now add a constraint to the target table to see what happens during the data movement. In the bottom left part of the screen in the Models window, expand out the dividends2 store by pressing the + beside it.





12. In the subsections that appear under dividends2 you will see a section called Constraints. Right-click on it and select New Condition.






13. On the right side a new window will open allowing you to define the properties of this condition. We will set a check condition which will check if the dividend value is too low. Enter the following information:

Name: Low_Dividend
Type: Oracle Data Integrator Condition
Where: dividends2.dividend>0.01
Message: Dividend Too Low




14.

We now need to save
our constraint. In the top right corner of the screen click on the
Save

button





15. We are now ready to run the Interface. In the top right section of the screen, click back to our interface by clicking on the Hive_To_Hive tab.






16. Now at the top of the screen we can click the Play button to run the interface.





17. A new window will pop up saying you need to save all of the changes before the interface can be run. Just click Yes.




18.

A new window will pop up asking for the execution context, just click
OK





19. An informational pop-up will show up telling you the execution has started. Simply click OK.




20. It is now time to check our constraint. In the bottom left part of the screen (in the Models section), right-click on the dividends2 model, then go to the Control section and click on Check.




21. This check is its own job that must be run; hence a window will pop up asking you to select a context for the execution. The default options are good, so just click OK.




22. An informational window pops up telling you the execution has started. Just click OK.




23. We can now see all of the rows that failed our check. Again in the bottom left part of the screen (in the Models section), right-click on the dividends2 model, go to the Control menu and select Errors…




A new tab will pop up on the right side of the screen. You will see all of the rows which did not pass the constraint.




24.

This concludes the ODI section of the workshop. Go to the right upper corner of the screen and
click the
X

to close ODI.




25.

Then in the terminal
type
exit

to close it as well.






5.7 Summary

In this exercise you were introduced to Oracle's integration of ODI with Hadoop. It is worth noting that this integration with ODI is only available for the Oracle database and only available from Oracle. It is a custom extension for ODI, developed by Oracle, to allow users who already have ETL as part of their Data Warehousing methodologies to continue using the same tools and procedures with the new Hadoop technologies.


It is quite important to note that ODI is a very powerful ETL tool which can offer all of the functionality typically found in an enterprise-quality ETL. Although the examples given in this exercise are quite simple, this does not mean the integration of ODI and Hadoop is. All of the power and functionality of ODI is available when working with Hadoop. Workflow definition, complex transforms, flow control and multi-source integration are just a few of the ODI features that can inherently be used with Hadoop.


Through this exercise you were introduced to three Knowledge Modules of ODI: Reverse Integration for Hive, Integration into Hive, and Integration from Hive to Oracle. These are not the only knowledge modules available, and we encourage you to review the table in section 5.2 of this document to get a better idea of all the functionality currently available.



6. WORKING WITH EXTERNAL TABLES

6.1 Introduction to External Tables

Oracle Direct Connector runs on the system where Oracle Database runs. It provides read access to HDFS from Oracle Database by using external tables.

An external table is an Oracle Database object that identifies the location of data outside of the database. Oracle Database accesses the data by using the metadata provided when the external table was created. By querying the external tables, users can access data stored in HDFS as if that data were stored in tables in the database. External tables are often used to stage data to be transformed during a database load.

These are a few ways that you can use Oracle Direct Connector:

- Access any data stored in HDFS files
- Access CSV files and Data Pump files generated by Oracle Loader for Hadoop
- Load data extracted and transformed by Oracle Data Integrator

Oracle Direct Connector uses the ORACLE_LOADER access driver.

6.2 Overview of Hands on Exercise

This exercise will involve working with Oracle external tables. We will create 3 text files with some data in each. We will upload these files into HDFS and connect them to the Oracle database using external tables. The data within these files will then be accessible from within the Oracle database.

In this exercise you will:

1. Create and query external tables stored in HDFS

NOTE: During this exercise you will be asked to run several scripts. If you would like to see the content of these scripts, type cat scriptName and the contents of the script will be displayed in the terminal.

6.3 Configuring External Tables

1. All of the setup and execution for this exercise can be done from the terminal, hence open a terminal by double-clicking on the Terminal icon on the desktop.




2.

To get into the folder where the scripts for the
external tables

exercise are, type in the terminal:



cd /home/oracle/exercises/external

Then press Enter




3. The first step in this exercise is to create some random files. This is just so we have some data in Hadoop to load as an external file. We will create three files called sales1, sales2 and sales3, with a single row comprised of 3 numbers in each file. To create the files go to the terminal and type:

./createFiles.sh

Then press Enter




4. Next we will load these files into HDFS. We have a script for that process as well. Go to the terminal and type:

./loadFiles.sh

Then press Enter




5. Next we will need to create the external table in Oracle. As the SQL code is quite long, we have written a script with that code. This being quite important, let's look at what that code looks like. In the terminal type:

gedit createTable.sh

Then press Enter







Looking at the code for creating the table, you will notice very similar syntax to other types of external tables, except for two lines: the preprocessor and the type, highlighted in the image below.



6.

When you are done evaluating

the code you can close the window by clicking the
X
in the right
upper corner of the window





7.

Let’s go
ahead now and run that piece of code. In the terminal type:


./createTable.sh

Then press
Enter




8. Now that the table is created, we need to connect that table with the files we loaded into HDFS. To make this connection we must run a Hadoop job which calls the Oracle loader code. Go to the terminal and type:

./connectTable.sh

Then press Enter





9. You will be asked to enter a password so the code can log in to the database user. Enter the following information:

[Enter Database Password:]: tiger

Then press Enter

NOTE: No text will appear while you type




10. We can now use SQL from Oracle to read those files in HDFS. Let's experiment with that. First we connect to the database using SQL*Plus. Go to the terminal and type:

sqlplus scott/tiger

Then press Enter






11.

Now let’s query that data.
Go to the terminal and type:


select * from sales_hdfs_ext_tab;

Then press
Enter





The query returns the data that is in all three files.





12.

This concludes this exercise. You can now exit SQL*Plus. Go to the terminal and type:


exit;

Then press
Enter





13.

Then close the terminal. Go to the terminal and type:


exit

Then press
Enter





6.4 Summary

In this chapter we showed how data in HDFS can be queried using standard SQL right from the Oracle database. With the data stored in HDFS, all of the parallelism and striping that would naturally occur is taken full advantage of, while at the same time you can use all of the power and functionality of the Oracle Database.

When implementing this method, parallel processing is extremely important when working with large volumes of data. When using external tables, consider enabling parallel query with this SQL command:

ALTER SESSION ENABLE PARALLEL QUERY;

Before loading data into Oracle Database from the external files created by Oracle Direct Connector, enable parallel DDL:

ALTER SESSION ENABLE PARALLEL DDL;

Before inserting data into an existing database table, enable parallel DML with this SQL command:

ALTER SESSION ENABLE PARALLEL DML;

Hints such as APPEND and PQ_DISTRIBUTE also improve performance when inserting data.
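As a concrete illustration, the sketch below shows how these session settings might be applied from a Java client before copying rows out of the HDFS-backed external table. This is not part of the workshop scripts; the target table SALES and the connection details are assumptions for illustration only.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ParallelLoad {
    public static void main(String[] args) throws Exception {
        // Connection details are assumptions; adjust for your environment.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:orcl", "scott", "tiger");
        conn.setAutoCommit(false);
        try (Statement stmt = conn.createStatement()) {
            // Enable parallel DML for this session before the insert.
            stmt.execute("ALTER SESSION ENABLE PARALLEL DML");
            // Direct-path insert from the external table into a hypothetical
            // SALES table, using the APPEND hint mentioned above.
            stmt.executeUpdate(
                    "INSERT /*+ APPEND */ INTO sales "
                    + "SELECT * FROM sales_hdfs_ext_tab");
            conn.commit();
        }
        conn.close();
    }
}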




7. WORKING WITH MAHOUT

7.1 Introduction to Mahout

Apache Mahout is an Apache project to produce free implementations of distributed or otherwise scalable machine learning algorithms on the Hadoop platform. Mahout is a work in progress; the number of implemented algorithms has grown quickly, but there are still various algorithms missing.

While Mahout's core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm, it does not restrict contributions to Hadoop based implementations. Contributions that run on a single node or on a non-Hadoop cluster are also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project, and can run stand-alone without Hadoop.

Currently Mahout supports mainly four use cases: Recommendation mining takes users' behaviour and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent item set mining takes a set of item groups (terms in a query session, shopping cart content) and identifies which individual items usually appear together.

7.2 Overview of Hands on Exercise

In this exercise you will be using the K-means algorithm to cluster data using Mahout's implementation of K-means. To give a bit of background on K-means: it is an algorithm which clusters data, and despite its simplistic nature it can be quite powerful. The algorithm takes two inputs, a series of input values (v) and the number of groups those values need to be split into (k). The algorithm first picks k centres at random to represent the centre of each group, then continuously moves those centres so that the distance from the centre to every point in that group is as small as possible. Once the centres are at a point where any movement would just increase the distance to all of the points, the algorithm stops. This is a great algorithm for finding patterns in data where you have no information about what patterns are in the data. Given its power, the K-means algorithm is quite expensive computationally, hence using a massively distributed computation cluster such as Hadoop offers a great advantage when dealing with very large data sets. This is exactly what we will be experimenting with in this exercise.
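For reference, the quantity K-means minimizes can be written compactly. This is the standard textbook formulation, added here for clarity rather than taken from the Mahout documentation:

\[ \min_{C_1,\ldots,C_k} \; \sum_{j=1}^{k} \sum_{v \in C_j} \lVert v - \mu_j \rVert^2 \]

where C_j is the set of points assigned to group j and \mu_j is the centre of that group. Reassigning each point to its nearest centre and moving each centre to the mean of its points never increases this sum, which is why the iteration eventually stops.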

In this exercise you will:

1. Use Mahout to cluster a large data set
2. Use the graphics library in Java to visualize a Mahout k-means cluster

7.3 Clustering with K-means

1.

All of the setup and execution

for this
exercise can be done from the terminal, hence

open a

terminal by double clicking on the
Terminal icon

on the desktop.





2. To get into the folder where the scripts for the Mahout exercise are, type in the terminal:

cd /home/oracle/exercises/mahout

Then press Enter




3. To get an idea of what our data file looks like, let's look at the first row. In the terminal type:

head -n 1 synthetic_control.data

Then press Enter

As you can see on the screen, all there is in the file are random data points. It is within this data that we would like to find patterns.





4.

The first step in analyzing this data is loading it
into the HDFS. Let’s go ahead and do that. Go to
the terminal and type:


./loadData.sh

Then press
Enter




5. Now that the data is loaded we can run Mahout against the data. This is an example where the data is already in vector form and a distance function has already been compiled into the example. When clustering your own data, the command line for running the clustering should include the distance function written and compiled in Java. Go to the terminal and type:





This would be an excellent time to get a cup of coffee. The clustering is quite computationally intensive and should take a couple of minutes to execute.


6. Once you get the command prompt back, the clustering is done, but the results are stored in binary format inside Hadoop. We need to first bring all of the results out of Hadoop and then convert the data from binary format to text format. We have a script which will perform both tasks. To run that script, go to the terminal and type:

./extractData.sh

Then press Enter





7. We can now go ahead and look at the results of the clustering. We will look at the text output of the results. Go to the terminal and type:

gedit Clusters

Then press Enter





The output is not very user friendly, but there are several indicators to look for, as follows:

n = the number of clusters
c = the centers of each one of the clusters
r = the radius of the circle which defines the cluster
Points = the data points in each cluster





8.

Once you are done evaluating the results you can click the
X

in the right upper corner of the screen
to close the window.





9. Despite the highlighting, the data points are not very easy to interpret in text form. Mahout also has some graphing functions for simple data points. We will run a much simpler clustering, with points that can be displayed on an X,Y plane, to visually see the results. Go to the terminal and type:

./displayClusters

Then press Enter






A new window will pop up with a visual display of a K-means cluster. The black squares represent data points; the red circles define the clusters. The yellow and green lines represent the error margin for each cluster.


10.

Once you are done evaluating the image you can click the
X

in the right upper corner of the
window to close

it.





11.

This concludes our mahout exercise. You can now close the terminal window. Go to the terminal
and type:


exit

Then press
Enter




7.4 Summary

In this exercise you were introduced to the K-means clustering algorithm and how to run the algorithm using Mahout, and hence on a Hadoop cluster. It is important to note that Mahout does not only focus on K-means but also has many different algorithms in the categories of Clustering, Classification, Pattern Mining, Regression, Dimension Reduction, Evolutionary Algorithms, Recommendation/Collaboration Filtering and Vector Similarity. Most of these algorithms have special variants which are optimized to run on a massively distributed infrastructure (Hadoop) to allow for rapid results on very large data sets.


8. PROGRAMMING WITH R

8.1 Introduction to Enterprise R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

Oracle R Enterprise integrates the open-source R statistical environment and language with Oracle Database 11g, Exadata, Big Data Appliance, and Hadoop massively scalable computing. Oracle R Enterprise delivers enterprise-level advanced analytics based on the R environment.

Oracle R Enterprise allows analysts and statisticians to leverage existing R applications and use the R client directly against data stored in Oracle Database 11g, vastly increasing scalability, performance and security. The combination of Oracle Database 11g and R delivers an enterprise-ready, deeply integrated environment for advanced analytics. Data analysts can also take advantage of analytical sandboxes, where they can analyze data and develop R scripts for deployment while results stay managed inside Oracle Database.

As an embedded component of the RDBMS, Oracle R Enterprise eliminates R's memory constraints since it can work on data directly in the database. Oracle R Enterprise leverages Oracle's in-database analytics and scales R for high performance on Exadata and the Big Data Appliance. Being part of the Oracle ecosystem, ORE enables execution of R scripts in the database to support enterprise production applications and OBIEE dashboards, both for structured results and graphics. Since it's R, we're able to leverage the latest R algorithms and contributed packages.

Oracle R Enterprise users not only can build models using any of the data mining algorithms in the CRAN task view for machine learning, but also leverage in-database implementations for predictions (e.g., stepwise regression, GLM, SVM), attribute selection, clustering, feature extraction via non-negative matrix factorization, association rules, and anomaly detection.

8.2 Overview of Hands on Exercise

In this exercise you will be introduced to the R programming language, as well as the enhancements Oracle has brought to the programming language. Limitations where all data must be kept in system memory are now gone, as you can save and load data to and from both the Oracle database and HDFS. To exemplify the uses of R we will be doing K-means clustering again, as in Exercise 7, this time using the R programming language. If you would like a review of K-means please see the introduction to section 7.

In this exercise you will:

1. Generate a set of random data points
2. Save the data in both the Oracle database and HDFS


3. View the data in Oracle and HDFS
4. Load the data from Oracle back into R
5. Perform K-means clustering on the data points
6. View the results

8.3 Taking data from R and inserting it into the database

1.

All of the setup and execution

for this
exercise can be done from the terminal, hence

open

a

terminal by double clicking on the
Terminal icon

on the desktop.




2.

To get into the folder where the scripts for the
R

exercise
s

are, type in the terminal:


cd /home/oracle
/exercises/
R

Then press
Enter





3.

To
work with R you can write scripts for the

interpreter to execute or you can use the interactive
shell environment. To get a more hand
s on experience with

R we will use the interactive shell.
To
start the interactive shell go to the terminal and type:


R

Then press
Enter




4. During the login process many different libraries are loaded which extend the functionality of R. If a particular library is not loaded automatically, one can load it manually after login. We will need to load a library to interface with HDFS, so let's load that now. Go to the R shell and type:


library(ORHC)

Then press Enter





5. Now let's go ahead and generate some pseudo-random data points so we have some data to play with. We will generate 2D data points so we can easily visualize the data. Go to the R terminal and type:

myDataPoints=rbind(matrix(rnorm(100, mean=0, sd=0.3), ncol=2), matrix(rnorm(100, mean=1, sd=0.3), ncol=2))

Then press Enter





Now the variable myDataPoints will have some data points in it.


6. To be able to save data into the database or HDFS you need to have the data in columns (as we already do) and you also need to have each of the columns labeled. This is because column names are required within a database to be able to identify the columns. Let's go ahead and label the columns x and y. Go to the R terminal and type:

colnames(myDataPoints) <- c("x", "y")

Then press Enter







7. We can now create a data frame which will load the data into the Oracle Database. Go to the terminal and type:

ore.create(as.data.frame(myDataPoints, optional = TRUE), table="DATA_POINTS")

Then press Enter






8. If required, we can even load this data into HDFS. Let's go ahead and do that. Go to the R terminal and type:

hdfs.put(DATA_POINTS, dfs.name='data_points')

Then press Enter






9. Now that we have loaded the data into both the database and HDFS, let's exit R and look at that data. Go to the R shell and type:

q()

Then press Enter





10. You will be asked if you want to save the workspace image. Type:

n

Then press Enter

Note: when typing n, the information typed does not appear on the screen.





11. At this point all data and calculated results would be wiped from memory, and hence lost in classic R. With R Enterprise Edition we saved our data in the database, so let's go and query that data. Go to the terminal and type:

./queryDB.sh

Then press Enter





On the screen you will see the table displayed which contains our data points.





12.

We can also look at the data we stored inside HDFS. Go to the terminal and type:


./queryHDFS.sh

Then press
Enter





Again on the screen you will see all of the data points displayed.





As you can see, all of the work done in R can now be exported to the database or HDFS for further processing based on business needs.

8.4 Taking data from the database and using it in R for clustering

1. Data can not only be pushed out to the database, but it can also be retrieved from the database or HDFS to be used within R. Let's see how that is done. First let's go back into the R environment. Go to the terminal and type:

R

Then press Enter





2.

Let’s now go ahead and load the data from the Oracle database. Go to the R shell and type:


myData=ore.pull(DATA_POINTS)

Then press
Enter





3. Now that we have our data inside R, we can manipulate the data. Let's do k-means clustering on the data. Go to the R shell and type:

cl <- kmeans(myData, 2)

Then press Enter





4. The clustering is now done, but displaying the data in text format is not very interesting. Let's graph the data. Go to the R terminal and type:

plot(myData, col = cl$cluster)

Then press Enter






5. A new window pops up with the data. The two colors (red and black) differentiate the two clusters we asked the algorithm to find. We can even see where the cluster centers are. Go back to the R shell. The terminal might be hidden behind the graph; move the windows around until you find the terminal, then type:

points(cl$centers, col=1:2, pch = 8, cex=2)

Then press Enter





When you go back to the graph you will see the centers marked with a * and the points marked with circles: raw random data clustered using the K-means algorithm.





6.

When you are done evaluating the image you can click on the
X

in the right upper corner of the
window.






7.

You can also close the R terminal by going to the R shell and typing:


q()

Then press
Enter





8.

When asked if you want to save the workspace image, go to the terminal and type:


n

Then Press
Enter





9.

This concludes this exercise. You can now go ahead and close the terminal. Go to the terminal and
type:


exit

Then press Enter




8.5 Summary

In this exercise you were introduced to the R programming language and how to do clustering using the programming language. You also saw one of the advantages of Oracle R Enterprise Edition, where you can save your results into the Oracle database as well as extract data from the database for further calculations. Oracle R Enterprise Edition also has a small set of functions which can be run on data directly in the database. This enables the user to use very large data sets which would not fit into the normal memory of R.

Oracle R Enterprise provides these collections of functions:



- ore.corr
- ore.crosstab
- ore.extend
- ore.freq
- ore.rank
- ore.sort
- ore.summary
- ore.univariate





9. ORACLE NOSQL DATABASE

9.1 Introduction To NoSQL

Oracle NoSQL Database provides multi
-
terabyte distributed key/value pair storage that
offers scalable
throughput and performance. That is, it services network requests to store and retrieve data which is
organized into key
-
value pairs. Oracle NoSQL Database services these types of data requests with a
latency, throughput, and data consisten
cy that is predictable based on how the store is configured.

Oracle NoSQL Database offers full Create, Read, Update and Delete (CRUD) operations with adjustable
durability guarantees. Oracle NoSQL Database is designed to be highly available, with excellent

throughput
and latency, while requiring minimal administrative interaction.

Oracle NoSQL Database provides performance scalability. If you require better performance, you use more
hardware. If your performance requirements are not very steep, you can purc
hase and manage fewer
hardware resources.

Oracle NoSQL Database is meant for any application that requires network
-
accessible key
-
value data with
user
-
definable read/write performance levels. The typical application is a web application which is servicing
requests across the traditional three
-
tier architecture: web server, application server, and back
-
end
database. In this configuration, Oracle NoSQL Database is meant to be installed behind the application
server, causing it to either take the place of the
back
-
end database, or work alongside it. To make use of
Oracle NoSQL Database, code must be written (using Java) that runs on the application server.

9.2 Overview of Hands on Exercise

In this exercise you will be experimenting with the Oracle NoSQL database. Most of the exercises will have you look at pre-written Java code, then compile and run that code. Ensure you understand the code and all of its nuances, as it is what makes up the NoSQL database interface. If you would like to understand all of the functions that are available, there is a Javadoc available on the Oracle web site.

In this exercise you will:

1. Insert and retrieve a simple key-value pair from the NoSQL database
2. Experiment with the multiget functionality to retrieve multiple values at the same time
3. Integrate NoSQL with Hadoop code to do word count on data in the NoSQL database

9.3 Insert and retrieve Key-Value pairs

1.

All of the setup and execution

for this
exercise can be done from the te
rminal, hence

open a

terminal by double clicking on the
Terminal icon

on the desktop.





2. To get into the folder where the scripts for the NoSQL exercise are, type in the terminal:

cd /home/oracle/exercises/noSQL

Then press Enter





3. Before we do anything with the NoSQL database we must first start it. So let's go ahead and do that. Go to the terminal and type:

./startNoSQL.sh

Then press Enter




4. To check if the database is up and running, we can do a ping on the database. Let's do that. Go to the terminal and type:

./pingNoSQL.sh

Then press Enter

You will see Status: RUNNING displayed within the text. This shows the database is running.





5. Oracle NoSQL database uses a Java interface to interact with the data. This is a dedicated Java API which will let you insert, update, delete and query data in the Key-Value store that is the NoSQL database. Let's look at a very simple example of Java code where we insert a Key-Value pair into the database and then retrieve it. Go to the terminal and type:

gedit Hello.java

Then press Enter





A new window will pop up with the code. In this code there are a couple of things to be noted. We see the config variable, which holds our connection string, and the store variable, which is our connection factory to the database. They are the initialization variables for the Key-Value store and are highlighted in yellow. Next we define two variables of type Key and Value; they will serve as our payload to be inserted. These are highlighted in green. Next we have, highlighted in purple, the actual insert command. Highlighted in blue is the retrieve command for getting data out of the database.
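For readers following along without the VM, a minimal sketch of the same put/get pattern is shown below. It is not the workshop's Hello.java verbatim; the store name kvstore and helper host localhost:5000 are assumptions.

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class HelloSketch {
    public static void main(String[] args) {
        // Connect to the store; the name and host:port are assumptions.
        KVStoreConfig config = new KVStoreConfig("kvstore", "localhost:5000");
        KVStore store = KVStoreFactory.getStore(config);

        // Build the key and the value payload.
        Key key = Key.createKey("Hello");
        Value value = Value.createValue("Big Data World".getBytes());

        // Insert the pair, then read it back by key.
        store.put(key, value);
        ValueVersion vv = store.get(key);
        System.out.println(key.toString() + " "
                + new String(vv.getValue().getValue()));
        store.close();
    }
}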




6.

When you are done evaluating the code press the
X

in the right upper corner of the window to
close it.





7. Let's go ahead and compile that code. Go to the terminal and type:

javac Hello.java

Then press Enter
Enter






8. Now that the code is compiled, let's run it. Go to the terminal and type:

java Hello

Then press Enter
Enter





You will see printed on the screen Hello Big Data World, which is the key and the value we inserted into the database.




9. Oracle NoSQL database has the possibility of having a major and a minor component to the key. This feature can be very useful when trying to group and retrieve multiple items at the same time from the database. In the next code we have 2 major components to the key (Mike and Dave), and each major component has a minor component (Question and Answer). We will insert a value for each key, but we will use a multiget function to retrieve all of the values for Mike regardless of the minor component of the key, and completely ignore Dave. Let's see what that code looks like. Go to the terminal and type:

gedit Keys.java

Then press Enter





10. A new window will pop up with the code. If you scroll to the bottom you will remark the following piece of code. Highlighted in purple are the insertion calls which will add the data to the database. The retrieval of multiple records is highlighted in blue, and the green shows the display of the retrieved data. Do note there were 4 Key-Value pairs inserted into the database.
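As a rough sketch of what such code can look like (an illustration rather than the workshop's Keys.java; the payload strings are made up, and the store handle is assumed to come from the previous sketch):

import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import oracle.kv.KVStore;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class KeysSketch {
    // Takes an already-opened store handle (see the previous sketch).
    public static void demo(KVStore store) {
        // One value per (major, minor) key path.
        store.put(Key.createKey(Arrays.asList("Mike"), Arrays.asList("Question")),
                Value.createValue("question text".getBytes()));
        store.put(Key.createKey(Arrays.asList("Mike"), Arrays.asList("Answer")),
                Value.createValue("answer text".getBytes()));
        store.put(Key.createKey(Arrays.asList("Dave"), Arrays.asList("Question")),
                Value.createValue("question text".getBytes()));
        store.put(Key.createKey(Arrays.asList("Dave"), Arrays.asList("Answer")),
                Value.createValue("answer text".getBytes()));

        // multiGet keyed only on the major component returns every record
        // under "Mike", whatever its minor component; Dave is never touched.
        SortedMap<Key, ValueVersion> mikeRecords =
                store.multiGet(Key.createKey("Mike"), null, null);
        for (Map.Entry<Key, ValueVersion> e : mikeRecords.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + new String(e.getValue().getValue().getValue()));
        }
    }
}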





11.

When you are done evaluating the code press the
X

in the right upper corner of the window to
close it.





12. Let's go ahead and compile that code. Go to the terminal and type:

javac Keys.java

Then press Enter
Enter






13. Now that the code is compiled, let's run it. Go to the terminal and type:

java Keys

Then press Enter
Enter





You will see the 2 values that are stored under the Mike major key displayed on the screen, and no data points for the Dave major key.

Major and minor parts of the key can actually be composed of multiple strings, and further filtering can be done. This is left up to the participants to experiment with.





14. The potential of a Key-Value store grows significantly when integrated with the power of Hadoop and distributed computing. Oracle NoSQL database can be used as a source and target for the data used by and produced by Hadoop. Let's look at a modified example of word count run in Hadoop, only this time we will count the number of values under the major component of the key in the NoSQL database. To see the code go to the terminal and type:

gedit Hadoop.java

Then press Enter






The code see is very similar
to the Word Count seen in the first section of the workshops. The
re
are

only difference

2 differences. The first

(highlighted in yellow) is the retrieval of data from the
NoSQL database rather than a flat file.





The second can be seen if you scroll down into the run function: notice the InputFormatClass is now KVInputFormat.
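In outline, the relevant job setup looks something like the sketch below. KVInputFormat ships with Oracle NoSQL Database's Hadoop integration; the store name, helper host, and configuration property names here are assumptions for illustration, not taken from the workshop code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import oracle.kv.hadoop.KVInputFormat;

public class JobSetupSketch {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        // Tell the input format which KV store to read from
        // (property names are assumptions based on the KV Hadoop docs).
        conf.set("oracle.kv.kvstore", "kvstore");
        conf.set("oracle.kv.hosts", "localhost:5000");
        Job job = new Job(conf, "nosql word count");
        // Read key-value pairs from the store instead of from HDFS files.
        job.setInputFormatClass(KVInputFormat.class);
        return job;
    }
}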




15.

When you are done evaluating the code press the
X

in the right upper corner of the window to
close it.





16. Let's go ahead and run that code. We will need to go through the entire procedure of the first exercise, where we compile the code, create a jar, then execute it on the Hadoop cluster. We have written a script which will do all of that for us. Let's run that script; go to the terminal and type:

./runHadoop.sh

Then press Enter






17. You will see a Hadoop job being executed with all of the terminal display it comes with. Once the execution is done, it is time to see the results. We will just cat the results directly from HDFS. Go to the terminal and type:

./viewHadoop.sh

Then press Enter





You will see, displayed on the screen, a word count based on the major component of keys in the NoSQL database. In the previous exercises we inserted 2 pieces of data under each of the major keys Dave and Mike. We also inserted a Hello key in the first exercise. This is exactly the data the word count displays.





18.

That concludes our exercises on NoSQL database. It is time to shutdown our NoSQL database. Go
to the terminal and type:


./stopNoSQL

Then press
Enter





19.

We can now close our terminal

window. Go to the terminal and type:


exit

Then press
Enter







9.4 Summary

In this exercise you were introduced to Oracle's NoSQL database. You saw how to insert and retrieve key-value pairs, as well as the multiget function, where multiple values could be retrieved under the same major component of a key. The last example showed how a NoSQL database can be used as a source for a Hadoop job and how the two technologies can be integrated.

It is important to note here the differences between the NoSQL database and a traditional RDBMS. With relational data the queries performed are much more powerful and more complex, while NoSQL simply stores and retrieves values for a specific key. Given that simplicity in NoSQL's storage type, it has a significant performance and scaling advantage. A NoSQL database can store petabytes worth of information in a distributed cluster and still maintain very good performance on data interaction, at a much lower cost per megabyte of data. NoSQL has many uses and has been implemented successfully in many different circumstances, but at the same time it does not mimic or replace the use of a traditional RDBMS.



APPENDIX A

A.1 Setup of a Hive Data Store




1. Once we are connected to ODI, we need to set up our models: the logical and physical definitions of our data sources and targets. To start off, at the top of the screen click on Topology.




2.

Next in the left menu make sure you are on the
Physical Architecture

tab a
nd expand the
Technologies

list




3.

In the expanded list find the folder
Hive

and expand it





4.

In this folder we need to create a new Data Server. Right click on the
Hive

Technology and select
New Data Server




5. A new tab will open on the right side of the screen. Here you can define all of the properties of this data server. Enter the following details:

Name: Hive Server

Then click on the JDBC tab in the left menu





6.

On the right of the JDBC Driver field click on the
Magnifying Glass

to select the JDBC Driver




7. A new window will pop up which will allow you to select from a list of drivers. Click on the Down Arrow to see the list





8.

For the list that appears select
Apache Hive JDBC Driver
.




9.

Now click
OK

to close the window





10.

Back at the main window enter the following information


JDBC Url:
jdbc:hive://bigdatalite.us.oracle.com:10000/default
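Under the hood this is an ordinary JDBC connection, so the same URL can be exercised outside ODI. A minimal sketch follows, assuming the HiveServer1-era driver class org.apache.hadoop.hive.jdbc.HiveDriver is on the classpath (an assumption about this VM's Hive version, not something stated in the workshop):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcCheck {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (assumed HiveServer1 driver class).
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://bigdatalite.us.oracle.com:10000/default", "", "");
        try (Statement stmt = conn.createStatement()) {
            // List the tables in the default Hive database.
            ResultSet rs = stmt.executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
        conn.close();
    }
}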




11. We need to set some Hive-specific variables. On the menu on the left, go now to the Flexfields tab





12. In the Flexfields tab, uncheck the Default check box and write the following information:

Value: thrift://localhost:10000

Don't forget to press Enter when done typing to set the variable




13. It is now time to test to ensure we set everything up correctly. In the upper left corner of the right window, click on Test Connection






14.

A window will pop up asking if you would like to save your data before testing. Click OK




15.

An informational message will pop up asking to register a physical schema. We can ignore this
message as that will be our next step. Just click
OK




16.

You need to s
elect an agent to use for the test. Leave the default


Physical Agent:
Local(No Agent)


Then click
Test





17.

A window should pop up saying
Successful Connection
.
Click
OK




If any other message is displayed, please ask for assistance to debug. It is critical for the entirety of this exercise that this connection is fully functional.


18. Now in the menu on the left side of the screen, in the Hive folder, there should be a Physical server created called Hive Server. Right-click on it and select New Physical Schema.





19. A new tab will again open on the right side of the screen to enable you to define the details of the Physical Schema. Enter the following details:

Schema (Schema): default
Schema (Work Schema): default





20. Then click Save All in the upper left part of the screen




21.

A warning will appear about No Context specified. This again will be the next step we undertake.
Just click
OK




22. We now need to expand the Logical Architecture tab in the left menu. Toward the bottom left of the screen you will see the Logical Architecture tab; click on it.




23.

In the Logical Architecture tab you will need to again find the
Hive

folder and click on the
+

to
expand it.




24.

Now to create the logical store, right click on the Hive Folder and select
New Logical Schema
.





25. In the new window that opens on the right of the screen, enter the following information:

Name: Hive Store
Context: Global
Physical Schemas: Hive Server.default





26. This should set up the Hive data store to enable us to move data into and out of Hive with ODI. We now need to save all of the changes we made. In the upper left corner of the screen click on the Save All button.




27.

We can close all of the tabs we have opened on the right side of the screen. This will help in
reducing the clutter. Click on the
X

for all of the windows.





We would theoretically need to repeat steps 7 – 29 for each of the different types of data store. As the procedure is almost the same, a flat file source and an Oracle database target have already been set up for you. This is to reduce the number of steps in this exercise. For details on how to use flat files and an Oracle database with ODI, please see the excellent Oracle by Example tutorials found at http://www.oracle.com/technetwork/tutorials/index.html.


28.

We now need to go to the Designer Tab in the left menu to perform the rest of our exercise. Near
the top of the screen on the left side click on the
Designer

tab.




29. Near the bottom of the screen on the left side there is a Models tab; click on it.





30. You will notice there is already a File and an Oracle model created for you. These were pre-created as per the earlier note. Let's now create a model for the Hive data store we just created. In the middle of the screen in the right panel there is a folder icon next to the word Models. Click on the Folder icon and select New Model…






31. In the new tab that appears on the right side, enter the following information:

Name: Hive
Code: HIVE
Technology: Hive
Logical Schema: Hive Store





32. We can now go up to the upper left corner of the screen and save this Model by clicking on the Save All icon.