
Copyright © 2004, SAS Institute Inc. All rights reserved.


Wayne Embry


Technical Account Manager


March 17, 2005

Delivering Enterprise Value with SAS® 9 Architecture:


GRID COMPUTING and SAS


Agenda


Defining Grid


Why is Grid Computing Important?


Who’s Interested in Grid and Why?


SAS Technology Behind Grid


Packaging


Architecture


Supported Platforms


Summary


Defining Grid in the IT World…


According to Gartner, "a grid is a collection of
resources owned by multiple organizations that
is coordinated to allow them to solve a common
problem." Gartner (and Wayne) further define
three commonly recognized forms of grid:


Computing Grid: multiple computers to solve one application problem

Data Grid: multiple storage systems to host one very large data set

Collaboration Grid: multiple collaboration systems for collaborating on a common issue


Other:

Utility Grid: resources are chosen for you, as with ASPs (application service providers)


Why is Grid Computing Important to SAS?


SAS believes that 2005 will be the year customers begin to
view grid computing as a practical solution to their
business problems, so the timing is right for it to be an
important focus.


Our ability to speak to our Grid capabilities will further
position our solutions and toolsets as enterprise class
and substantially differentiate our offerings from those of
competitors. It will also help us build additional
enterprise credibility.


Proof:


A recent IDC report projected that the grid computing market
may exceed $12 billion by 2007.


Gartner reported that 56% of large IT customers had not been
contacted by a single vendor regarding Grid.


Oracle


Why is Grid Computing Important?


Grid computing leverages underutilized and untapped
computing resources to drastically reduce processing
times, which in turn saves money.


Grid computing allows organizations to further leverage
their current IT investment by harnessing the collective
processing power of existing computers to solve complex
problems more rapidly and to run increasingly
data-intensive applications.


IT spending continues to be substantially restricted while
demands on the IT department continue to increase. Grid
computing is a strategic alternative for resolving this
dilemma, providing one of the biggest “bangs for the buck”
in IT.


Reality Check: Who’s Interested and Why?…

Frugal Phyllis


Title: CIO of a business unit of a large corporation

Reports to: CEO of the business unit

Computing Skills: Advanced

Top ETL-related issues:

1. Faced with processing ever-increasing volumes of data

2. Challenged to provide usable results in ever-shorter time frames

3. Short on funds, especially for additional hardware








Reality Check: Who’s Interested and Why?…

Al the Architect


Title: Head Information Architect and “right hand” to the CIO

Reports to: CIO of a business unit of a large corporation

Computing Skills: Expert

Top ETL-related issues:

1. Charged with building fast and flexible architectures without spending much money

2. Needs to find ways to cope with more jobs, and larger jobs, all squeezed into the same batch window

3. Would like his solutions to the above to inspire the enterprise as a whole, or at least to integrate with its existing tools






Reality Check: Who’s Interested and Why?…

Silo Sandy (somewhat similar to Frugal Phyllis)


Title: CEO (or Director) of a business unit of a large corporation

Reports to: CEO of the enterprise

Computing Skills: Average

Top ETL-related issues:

1. Trying to build her own information organization because she is not satisfied with corporate IT

2. Needs to do so using only existing hardware resources

3. Needs solutions that run quickly and are reliable and maintainable






Reality Check: Who’s Interested and Why?…



And a user persona who influences the above buyers:

Forever Fred

Title: Business Analyst (a.k.a. Power User)

Reports to: Director or Sr. Manager of a business unit of a large corporation

Computing Skills: Power User

Top ETL-related issues:

1. Loading data for his jobs takes too long, so he misses batch windows

2. Constantly admonished for monopolizing system resources

3. “Beaten up” for not delivering reports fast enough


Types of Applications Suitable for Grid



Long-running jobs (batch window)

Many repetitive iterations of a fundamental task: simulation, BY-group processing

Parallelism:

Independent tasks against large data sources (scoring, risk analysis)

Pipeline parallelism (piping)

Both
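BY-group processing and independent tasks against large data sources share one property: each piece of work needs no data from any other piece, so the pieces can run on different machines at the same time. A minimal Python sketch of that idea follows, assuming hypothetical order rows keyed by a `region` BY variable; a thread pool stands in for grid nodes, and none of the names come from SAS.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order rows: (region, amount). "region" plays the BY variable.
ORDERS = [
    ("east", 120.0), ("west", 80.0), ("east", 30.0),
    ("north", 55.0), ("west", 20.0), ("north", 45.0),
]


def split_by_group(rows):
    """Partition rows into groups keyed by the BY variable."""
    groups = defaultdict(list)
    for key, amount in rows:
        groups[key].append(amount)
    return dict(groups)


def summarize(item):
    """Process one BY group; it needs no data from any other group."""
    key, amounts = item
    return key, sum(amounts), len(amounts)


def run_by_groups_in_parallel(rows, workers=3):
    """Submit each BY group as an independent task (a stand-in for a grid node)."""
    groups = split_by_group(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(summarize, sorted(groups.items()))
        return {key: (total, count) for key, total, count in results}


if __name__ == "__main__":
    print(run_by_groups_in_parallel(ORDERS))
    # {'east': (150.0, 2), 'north': (100.0, 2), 'west': (100.0, 2)}
```

For CPU-bound work in Python, `ProcessPoolExecutor` would be the closer analogue of separate grid machines; threads keep the sketch short.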





RFID Complexity

[Diagram components: four RFID data collectors; real-time links; SAP/R3, DB/2, Oracle, and Sybase systems]


SAS Technology Behind Grid


Today… Analytics Scenario

[Diagram components: SAS Connect client running %Distribute; n nodes, each running Base, Connect, …]
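%Distribute spreads repetitive iterations of a task across the n nodes in the diagram. The sketch below shows one way such a split could work, assuming the job is a count of independent iterations divided into contiguous, near-equal chunks; it is not the macro's actual algorithm (the %DISTRIBUTE paper listed under Collateral describes that). The Monte Carlo task and all function names are hypothetical, and threads stand in for SAS/CONNECT remote sessions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_iterations(total, nodes):
    """Divide iterations 1..total into at most `nodes` contiguous, near-equal chunks."""
    base, extra = divmod(total, nodes)
    chunks, start = [], 1
    for i in range(nodes):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more nodes than iterations: leave the extra nodes idle
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def simulate_chunk(chunk):
    """Hypothetical Monte Carlo task: count random points inside the unit circle.

    Each replication seeds its own generator from its replication number, so the
    total does not depend on how the iterations were split across nodes.
    """
    first, last = chunk
    hits = 0
    for rep in range(first, last + 1):
        rng = random.Random(rep)
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def distribute(total, nodes):
    """Run each chunk as an independent task, then combine the partial results."""
    chunks = split_iterations(total, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(simulate_chunk, chunks))


if __name__ == "__main__":
    print(split_iterations(10, 3))  # [(1, 4), (5, 7), (8, 10)]
    n = 10_000
    print("pi is roughly", 4 * distribute(n, 4) / n)
```

Seeding each replication from its own number is the design choice that matters here: the combined result is the same whether the iterations run on one node or four.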


SAS Technology Behind Grid


Today… Data Integration Scenario

[Diagram components: ETL Studio; SAS MC; Schedule Manager; SAS Servers (Metadata Server, Workspace Server, Connect Client); LSF Job Scheduler; n nodes, each running Base, Connect, …]
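In the data integration scenario, the LSF Job Scheduler decides which machine runs each job. LSF's real policies (queues, priorities, fairshare, host load thresholds) are far richer than anything shown here; the following is only a toy greedy dispatcher, assuming each host exposes a fixed number of CPU slots and each job's duration is known in advance. The host names and CPU counts are borrowed from the example jobs later in this deck purely for illustration.

```python
import heapq


def dispatch(jobs, hosts):
    """Toy greedy scheduler: send each queued job to the CPU slot that frees up first.

    jobs:  list of (job_name, duration_seconds), in queue order
    hosts: dict of host_name -> number of CPU slots
    Returns (assignments, makespan); assignments holds (job, host, start_time).
    """
    # Min-heap of (time the slot becomes free, host name, slot index).
    slots = [(0.0, host, i) for host, cpus in hosts.items() for i in range(cpus)]
    heapq.heapify(slots)
    assignments = []
    for job, duration in jobs:
        free_at, host, slot = heapq.heappop(slots)
        assignments.append((job, host, free_at))
        heapq.heappush(slots, (free_at + duration, host, slot))
    makespan = max(free_at for free_at, _, _ in slots)
    return assignments, makespan


if __name__ == "__main__":
    # Illustrative only: names and CPU counts taken from the example grid jobs.
    hosts = {"L8364": 1, "Demo0505": 2, "Demo0507": 2}
    jobs = [(f"job{i}", 4.0) for i in range(6)]
    assignments, makespan = dispatch(jobs, hosts)
    for job, host, start in assignments:
        print(f"{job} -> {host} at t={start}")
    print("makespan:", makespan)  # 8.0
```

Unlike a fixed job-to-server assignment, a dispatcher like this picks up new capacity automatically: adding a host to `hosts` is all it takes for queued jobs to start using it.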


SAS Technology Behind Grid


2005… Improving our Capabilities

[Diagram components: SAS Server with Connect Client and LSF; n nodes, each running Base, Connect, … and LSF]


SAS Grid

2005…

[Diagram components: ETL Studio; SAS MC; Schedule Manager; Grid Manager (new); SAS Servers (Metadata Server, Workspace Server, Connect Client); LSF Job Scheduler; Enterprise Miner; n nodes, each running Base, Connect, … and LSF]


SAS 9 Packaging…


Head Start


SAS/Connect is already included in ETL Server and EETL Server


Any solution that includes ETL Server



Supported Platforms…


Good News


Any platform that supports Base
and Connect


Heterogeneous architecture



Architecture Guidelines


There are guidelines to keep in mind when
architecting SAS Grid environments:


Permanent data


SASWORK


Data accessibility: where the data resides, and how each of the
machines on the grid is attached to it (NFS, SAN), greatly
affects performance.



For help architecting SAS Grids, please call your SAS
Account Representative.



Example Grid Job 1

ETL Studio

SAS Server L8364: Workspace Server (Base, Connect); 1 CPU (1.6 GHz; 2 GB RAM)

Demo0505: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Demo0507: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Tables: Customer, Orders_grid, Order_item_grid


Example Grid Job 2

ETL Studio

SAS Server L8364: Workspace Server (Base, Connect); 1 CPU (1.6 GHz; 2 GB RAM)

Demo0505: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Demo0507: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Tables: Orders_grid, Order_item_grid, Customer

Other diagram labels: LXYZ, SASWORK


An Example: The Scenario…


Single-platform job: Local_Complicated


Run locally on my laptop in sequential order


Source data


3 local SAS tables:


Customer: 16 MB; 89,954 rows; 12 columns


Orders_grid: 214 MB; 5,710,014 rows; 8 columns


Order_item_grid: 315 MB; 4,487,718 rows; 7 columns


Target


1 local SAS table with 15 columns




Local_Complicated Job

ETL Studio

SAS Server L8364: Workspace Server (Data Quality, Base); 1 CPU (1.6 GHz; 2 GB RAM)

Tables: Order_item_grid, Orders_grid, Customer

Elapsed wall-clock time: 4 minutes



Leveraging the Grid: The Scenario…


Enable the job to run on a SAS Grid: Remote_Complicated


Grid strategies:


Independent parallelism (independent data and processes)


Pipeline parallelism


Source data:


2 remote SAS tables:


Orders_grid: 214 MB; 5,710,014 rows; 8 columns


Order_item_grid: 315 MB; 4,487,718 rows; 7 columns


1 local SAS table:


Customer: 16 MB; 89,954 rows; 12 columns


Target


1 local SAS table with 15 columns
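Pipeline parallelism, one of the grid strategies for this job, overlaps the stages of a job: while one stage is still reading rows, the next is already transforming the rows it has received. A minimal Python sketch, assuming a hypothetical three-stage extract/transform/load flow over `(order_id, qty, price)` rows, with threads standing in for the machines that would run each stage on a grid and bounded queues standing in for the pipes between them:

```python
import queue
import threading

SENTINEL = object()  # marks the end of the row stream


def extract(rows, out_q):
    """Stage 1: read source rows and pass them downstream one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transform(in_q, out_q):
    """Stage 2: compute a line total as soon as each row arrives."""
    while (row := in_q.get()) is not SENTINEL:
        order_id, qty, price = row
        out_q.put((order_id, qty * price))
    out_q.put(SENTINEL)


def load(in_q, target):
    """Stage 3: append finished rows to the target table (a list here)."""
    while (row := in_q.get()) is not SENTINEL:
        target.append(row)


def run_pipeline(rows):
    # Small bounded queues act as the "pipes" between concurrently running stages.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    target = []
    stages = [
        threading.Thread(target=extract, args=(rows, q1)),
        threading.Thread(target=transform, args=(q1, q2)),
        threading.Thread(target=load, args=(q2, target)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return target


if __name__ == "__main__":
    print(run_pipeline([(1, 2, 5.0), (2, 1, 3.5), (3, 4, 1.25)]))
    # [(1, 10.0), (2, 3.5), (3, 5.0)]
```

The small `maxsize` keeps a stage from racing ahead: extract blocks once two rows are waiting, so rows flow through all three stages together instead of piling up in memory.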




Remote_Complicated Job

ETL Studio

SAS Server L7875: Workspace Server (Base, Connect); 1 CPU (1.6 GHz; 1 GB RAM)

Demo0505: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Demo0507: Base, Connect, Data Quality; 2 CPUs (3.06 GHz; 4 GB RAM)

Tables: Customer, Orders_grid, Order_item_grid

Elapsed wall-clock time: 30 seconds


Roughly 88% less elapsed time than the 4-minute local run (about 8x faster)!
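The improvement can be checked from the two wall-clock times reported for this scenario: 4 minutes (240 s) for Local_Complicated and 30 s for Remote_Complicated. The short helper below makes the arithmetic explicit: it works out to an 87.5% reduction in elapsed time, or an 8x speedup. The two runs used different machines and data placement (L8364 with all tables local versus L7875 with two remote tables), so the figure describes these runs rather than a general grid speedup.

```python
def elapsed_improvement(before_seconds, after_seconds):
    """Return (percent reduction in elapsed time, speedup factor)."""
    reduction_pct = (before_seconds - after_seconds) / before_seconds * 100
    speedup = before_seconds / after_seconds
    return reduction_pct, speedup


if __name__ == "__main__":
    # Local_Complicated: 4 minutes; Remote_Complicated: 30 seconds.
    print(elapsed_improvement(4 * 60, 30))  # (87.5, 8.0)
```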


Performance Issues


The competition’s answer to performance issues:


Buy a bigger server (e.g., move from a 32-way to a 64-way machine)


Increase the number of RDBMS instances (e.g., Oracle)


More $$$$


SAS’s answer:


Grid computing leverages underutilized and untapped heterogeneous
computing resources to drastically reduce processing times


Grid computing allows organizations to further leverage
their current IT investment by harnessing the collective
processing power of existing computers


Save $$$$




How Is It Set Up? The SAS Technology Behind the Scenario…


Components and Considerations:


Base, SAS/Connect


ETL Studio


Metadata Server


Data Quality


GRID ETL JOB


GRID STATS


Connect Servers and Spawners


Connect Servers and Spawners


Libraries


Logins


Closing Thoughts…


Your mileage may vary


Next step in evolving the SAS 9 platform


Enterprise credibility


Competition:


Buy more servers and license more DBMS instances


“These 50 jobs will use this server, these 30 jobs will run on this server…”


Manageability


BI:


Stored processes


Enterprise Miner and LSF integration


ITMS:


ITRM will have a generic collector to collect LSF performance data



Collateral…


White Papers


SUGI 29: http://support.sas.com/rnd/scalability/papers/sugi29_grid.pdf


Connect syntax: http://support.sas.com/rnd/scalability/papers/mpconnect0401.pdf


%DISTRIBUTE: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf


Web Site


http://support.sas.com/rnd/scalability/grid/index.html


Customer Reference Stories


http://support.sas.com/rnd/scalability/grid/gridcust.html




Questions?