Benchmark Financial Brokers

clutteredreverandData Management

Oct 31, 2013 (3 years and 7 months ago)

150 views

UNIVERSITY OF HOUSTO
N
-
CLEAR LAKE

Ben
chmark Financial Brokers

A Case Study on Building a Data Warehouse


Prepared by:

Terry Lee

Tran Ngo

Sandeep Udhani

Navin Negi


Prepared for:

Dr. Rob

ISAM 5332


5/5/2010







University of Houston
-
Clear

Lake

ISAM 5332

Dr. Rob


Benchmark Financial Brokers: A Case Study on Building a Data Warehouse


Terry Lee

Tran Ngo

Sandeep Udhani

Navin Negi



Abstract


The purpose of this paper is to discuss how we
approached building a data warehouse for
Bench
mark Financial Brokers (Benchmark),
starting with taking the wrong approach and
ending up with a functioning project.


Introduction


Benchmark is involved in the securities trading
business and as such has to maintain multiple
transactional systems. These systems are
adequate to run the day to day operations of
the business, however they do not allow for
timely access to strategic info
rmation. This is
information that will help the owner and
managers maintain the long
-
term health of the
organization. The purpose of
a data warehouse
is to collect

desired data from

various
transactional systems, convert it

into a common
format,

cleanse it
,

and allow it to be queried for
useful

strategic

information in the future. While
the concept of a data warehouse sounds quite
simple, in reality it is almost completely
opposite. Data must be continually collected,
transformed, and cleansed from each of the
different transactional systems in order to k
eep
the warehouse properly maintained.








Case Study: Benchmark Financial Brokers


1
.

Business Scenario

Benchmark started with a single location in
Texas and has now expanded to forty five
locations throughout Texas, Louisiana, and
Florida. The busines
s owner plans on expanding
to include more locations throughout the
country. The problem that the owner and the
mana
gers of this business face

is that with their
continual expansion it is becoming increas
ingly
difficult for them to conduct timely visits to

each of the locations in order

to monitor
employee performance, track revenue, and
ensure sound financial advice is being offered to
the customer. Benchmark has operational
systems that allow the advisors to monitor the
constantly fluctuating stock market
s, as well as
transactional systems that allow them to buy or
sell various financial instruments for their
customers. These various systems prevent a
challenge when it comes to generating useful
reports that provide the information that
the
owner is lookin
g for. The owner is convinced
that if he can find a way to extract the data that
he needs from each of these systems he will be
able to maintain greater control over his
business.


2
.

Why a Data Warehouse?

Benchmark’s owner needs better strategic
informati
on about his business. This is what a
data warehouse i
s perfect for. The concept of a
data warehouse is

to be able to collect data
from multiple operational and transactional
systems, combine them in some manner, store
them in a central location, and provi
de Online
Analytical Processing abilities in the future. This
is exactly what the owner of Benchmark has
been looking for. He needs the ability to collect

and query this data in order to obtain

report
s
that will provide him with the strategic
information t
hat he needs. This can include
reports such as r
evenue for each location by
state, revenue
by investment type, customers
who have money in high risk investment, and
employees who may be purposely selling these
investments. The a
bility to q
uery this kind of

system online from his office is the perfect
solution to his problem.


3. Methodology

Beginning this proje
ct as a group, we did not
have a

very

clear picture of how we needed to
go about designing a data warehouse. We
began with designing what was essenti
ally a
transactional database. One of our primary
concerns when we started the process was how
to track the price of the securities
that
are being
sold, as the stock market is consta
ntly
fluctuating. Even though it made complete
sense to us, our initial ap
proach proved to be
wrong. A data warehouse is not meant to be a
transactional database; it is a database that
warehouses historical data about the
organization. This is important because the
stored data will be used strictly for designing
the reports that

management needs to help
them make strategic decisions about the future.
Approaching the project a second time we
realized that instead of trying to figure out how
to collect the data, we needed to figure out
how to present the data that we already had.

W
hen designing the data warehouse, probably
the best method to employ is to approach it
from a reporting aspect. Approaching the
project this way means that you must be
thinking about the information that
is
ultimately desired by the end user. Once we
reali
zed that we needed to look at the project
from this aspect, it led us to completely rethink
our strategy, and led us to completely redesign
our dimensional models.


4. Dimensional Modeling

and defining data
structure


Dimensional modeling
incorporate
s

the
business dimensions into the logical data
model.

(Ponniah 206)

After defining
require
ments from the business needs, data
structures are designed within the logical
model.
When
defining
requirements, users may
not be able to

precisely describe what the
y
want in a data warehouse, they can provide you
with very important insights into ho
w they think
about the business, as well as

tell us what
measurement un
its are important for them.
Managers think of the busin
ess in terms of how
they want to measure it.
These measurements
are the facts that indicate to the users how
their departments are doing

in fulfilling their
objectives. We can say business m
etrics or facts
are what

managers want to analyze in order to
know the current situation of their business
.
The
se are important when making decisions
that will affect the future of their organization.
When designing dimensions for a data
warehouse it is extremely important to pay
attention the hierarchies that are contained
within each of the dimensions. These
d
ime
nsional hierarchies are

the

various

levels of
detail contained within
a business dimension.
Managers can use the dimensional hierarchies
as the paths for drilling down or rolling up in
analysis.

Dimensional modeling is the technique
that is in designing a
data warehouse. Many
software
vendors have expanded their
modeling case tools to include dim
ensional
modeling. Modern software is very useful when
designing
fact tables, dimension tables, and
establish
ing

t
he relationships between them
.

When you have finis
hed modeling the
dimensions and establishing the relationships,
you end up with a database schema
.

The two
types of schemas that are generally used in a
data warehouse are the STAR schema and the
snowflake schema.

The STAR schema is a simple
database schem
a f
or data design using

a
dimensional model.

This schema consists of a
fact table in the center that is directly related to
the dimension tables that surround it.

Although
the STAR schema is a relational model
, it is not
a normalized model.
The snowflake m
ethod
normalizes
the dimension tables in a STAR
schema
.



As mentioned earlier, it is very
important to employ the correct approach
w
hen designing the schema for a

data
warehouse. This is where we made our first
mistake. As a result of this, w
e have two
di
fferent database schemas
.

The

first schema
that we designed was a complex snowflake
schema which is depicted in figure 1.1
.



Figure 1.1


When the dimensions in a STAR schema are
completely normalized the resulting structure
resembles a snowflake with
the fact table in the
middle
.
In the ca
se of Benchmark
, the
most
important fact to
analyze is the sales
commission which is the revenue for the
company. The t
ransaction

table

is
the fact table
which contains

commission as an attribute.
Transa
ctions are ana
lyzed base on

dimensions
such as Customer,

Employe
e, Investment, Time,
Date, and the Commission collected.
At first, we
approached this project from the wrong
direction, and because of that we
normalize
d

the
tables

in our

schema.
This was done
because we w
ere thinking in terms of a
transactional system where we would need to
track
changes

in the price of stocks, bonds, and
other marketable securities.

We store
d detail
information about

employee
s

such as position,

and

salary.
After we realized that we weren’
t
looking at project correctly
, we recognize
d that
this information was not
necessary
for
th
e data
warehouse. Although a data warehouse is
based on

a relat
ional database like one that
would be used in an operational system what
we were

building
was a decis
ion
-
support
system. The important component to the entire
project is the ability to track and analyze the
commission. We do not need to worry about

the price of investments or employee salary.
In
order to correct our
error, we redesign
ed

our
d
atabase schem
a and arrived at a Star schema
which is depicted in figure 1.2.

In this schema,
we have
the
Transaction as the fact table with
four dimension tables: Employee, Customer,
Investment and Date_Time.

We kept

thes
e
dimensions because we needed

to analyze
commi
ssion
by state,
by em
ployee, by
customer,
by investment type
, and by
investment risk class.


Figure 1.2



From
our new

St
ar schema
, we

were able to
define the data format that we need for the
warehouse.
We define
d

table name, attribute
name, data type of each attribute in each table,

and relationship among tables. From
Benchmark’s
operational system, we extract
ed
data into an excel file that is depicted in figure
1.3. Many of the fields that were extracted are
nece
ssary for an operational system
,

but were
not needed in the data warehouse. This is
where cleansing data becomes important. In
order to keep the warehouse efficient,

we used
Excel to remove
the extran
eous data before it
was

imported.

Afte
r cleansing the da
ta
, we had
attributes that were important to the structure
of our system. An example of the cleansed data
is found in figure 1.4. When we were satisfied
that are data was formatted and cleansed
correctly, we moved to our next step which was
to implement ou
r database schema in Microsoft
Access. This schema is displayed in figure 1.5.


We used Access to defin
e our relationships
and

make sure that the system functioned before
importing the database into SQL Server 2008.







Figure 1.3




Figur
e 1.4




Figure 1.5




5. Implementation in SQL Server 2008

In order to browse the data that is contained
within the data warehouse, you must design a
data structure called a cube.
Constructing the
cube is done in SQL Server Analysis Services.
A
cube is comprised of the fact table and all of the
data that is directly related to it. The cube
organizes the data into a format that can be

easily queried, rolled up,
drilled down
, and
sliced and diced

based on the measures and
hierarchies that are appli
cable to your particular
data set.
Importing a database from Access to
SQL Server is supposed to be an easy process,
but trust us it is anything but. Trying to figure
out how to get the program to accept your data
and process it turned out to be one of the

biggest challenges of the whole project.
According to Scott Cameron’s SQL Server 2008
Analysis Services: Step by Step, if you
have an
existing
relational database such as

Access,
T
eradata, Oracle, IBM DB2, as well as some
others
, you should be able to sel
ect the
appropriate driver and connect to your data
source without any difficulties.(Cameron 39)

If
only this were true. Due to not having sufficient
security clearance to upload our database onto
the University of Houston
-
Clear Lake (UHCL)
server, we dec
ided to use a personal laptop to
run the software. Operating system
compatibility was one of the first problems that
we encountered. The solution to this problem
was to download the applicable service pack
from Microsoft Update. Once the software was
insta
lled we attempted to import the database
from Access
. The next attempt to import the
database resulted in being able to import the
data, but this time we could not build or deploy
our cube. Do not get frustrated when you
encounter this problem. We have cho
sen not to
outline the steps that we took to get the
program to function correctly as they will be
different for each application. Once we had the
database imported and functioning properly,
we commenced building our cube. The ability to
roll up, and drill

down your data is based on the
hierarchies that are defined within your
dimensions. This very important step is
depicted in figure 1.6.



Figure 1.6


Without defining the hierarchies in each of your
dimensions, you will not have access to all of
the data
. When a manager is looking for
information, he may
want a very high level of
granularity, or a very low level of granularity
.
These types of details are very important when
deciding how to define the dimensions that are
contained within your data warehous
e. When
all of the hierarchies are defined, you must set
up the relationships that are contained within
the dimensions. An example of these
relationships can be seen in figure 1.7.



Figure 1.7


When all of

hierarchies and relationships are set
up, the c
ube can be launched. A fully
implemented cube will look something like
figure 1.8.



Figure 1.8


6. Browsing the Cube

Keeping in mind that the ultimate goal of the
data warehouse is to provide strategic
information to managers and business owners,
it is n
ow time to browse the cube that you have
created. This is the process where you are
actually designing the queries that will provide
the reports the end user is looking for. The
Benchmark project is concerned primarily with
commission that is collected fro
m each
transaction. In order to get a picture of the
business as a whole it is more reasonable to
query the data for commission from a particular
region, or in our case, by each state. In figure
1.9, we have shown commission by state as it is
presented in
the cube browser.



Figure 1.9



This is a very high level of detail. If you were to
add all of the hierarchies that are available to
this query, you can drill the data down to
provide commission for each employee in each
zip code, as it relates to each type and name of
investment. This
is shown in figure 1.10
.



Figure 1.10




7. Generating useful reports

Being able to browse the cube and design
queries is a very powerful and useful tool.
Unfortunately, to the end user of the system,
some of these queries are almost unreadable
within in the cube browser. Remember that the
final result of this project is to
provide strategic
information that will be useful to management
in making decisions that will affect the future
health of the organization. These reports are
not going to be provided to a member of the IT
staff who would be comfortable viewing the
format i
n the browser. Management will want a
report that can be read and interpreted easily.
Providing these kinds of reports is easily done
once you have a functional cube.
The cube that
was initially created within Analysis Services can
also be accessed with Re
ports Service which is
another very powerful tool that is included in
SQL Server 2008. By creating a Reports Services
project, we were able to generate reports from
the Benchmark warehouse that will be useful to
the owner and management. The same
informat
ion that is depicted in figure 1.9 is
again displayed in figure 1.11 in a much easier
to read format.



Figure 1.11


Another report that the Benchmark
management wanted was Customers with High
Risk Investments, figure 1.12, which would
allow them to find
customers who have money
in an investment that is now considered to be
high risk.



Figure 1.12


Even though this particular report does not
directly track commission, it is directly related
to the amount of commission that the company
collects. The goal
of Benchmark is help grow the
retirement funds of their customers, and if they
were to ignore these risky investments, they
would lose money, ruin the reputation they
have strived to build, and drive new and
existing customers away. When there are no
custo
mers, there is no commission to keep
track.


Conclusion

Entering into the process of constructing a data
warehouse with no prior kno
wledge of the
subject

proved to be quite a challenge
. It also
turned into an exceptional learning experience.
We learned to
carefully analyze the project that
has been presented before diving into it head
first. It is essential to do this so that you can be
sure that the correct approach is being taken in
regards to the end result. Starting with the
desired result and working b
ackwards turned
out to be the direction that we ultimately took
with this project, and is probably a viable
approach to take when designing a data
warehouse. A data warehouse is a report
centric system, so beginning with an
understanding of the desired out
put will lead to
a much more efficient design plan. We believe
that we have constructed a system that
Benchmark will be able to rely on for their
reporting needs for the foreseeable future.










Works Cited

Cameron, Scott. (2009)
Microsoft SQL Server 2008 Analysis Services Step by Step.

Redmund, Washington:


Microsoft Press.


Ponniah, Paulraj. (2001)
Data WarehousingFundamentals: A Comprehensive Guide for IT Professionals.


New York, New York: John Wiley & S
ons.