NIST BIG DATA WG

Reference Architecture Subgroup

Intermediate Report

Co-chairs:

Orit Levin (Microsoft)

James Ketner (AT&T)

Don Krapohl (Augmented Intelligence)


July 24th, 2013

Reference Architecture Objectives


Addresses a broad range of stakeholders (e.g., data owners, industries, academia, policy makers)


Wide scope:


Encompasses the whole data life cycle within the ecosystem


Can be applied to different use cases (including various verticals)


Represents different system architectures (e.g., an enterprise data warehouse, distributed cloud-based system using multiple service providers)


Focus


Potentially with an initial focus on Big Data analytics and tools


Assists in identifying security and privacy issues


Agnostic to any specific technologies

7/24/2013

NIST Big Data WG / Ref Arch Sub-group

RA Diagram Independent Submissions


Different styles and perspectives, but easy to map between them


Data centric (Wo Chang)


Data Flow centric (Orit Levin, Bob Marcus)


Technology Layers / Stack diagram (Gary Mazzaferro)


The vocabulary used in these submissions and on the mailing list has been compiled and submitted as M-0057


Abstract Reference Architecture

by Wo Chang / NIST


Independent RA Proposals: Big Data

Sources, Usage, Transformation, and Infrastructure


Data Flow Diagram by Bob Marcus

Technology Stack / Layers Diagram by G. Mazzaferro

Data Flow Ecosystem Diagram by Orit Levin

Data Sources and Usage


Data Flow Diagram by Bob Marcus

Technology Stack / Layers Diagram by G. Mazzaferro

Data Flow Ecosystem Diagram by Orit Levin

Infrastructure: Storage, Security, and Management


Data Flow Diagram by Bob Marcus

Technology Stack / Layers Diagram by G. Mazzaferro

Data Flow Ecosystem Diagram by Orit Levin

Data Transformation: Processing, Analytics, and Visualization


Data Flow Diagram by Bob Marcus

Technology Stack / Layers Diagram by G. Mazzaferro

Data Flow Ecosystem Diagram by Orit Levin

Draft Agreement / Rough Consensus


Transformation includes:

Processing functions

Analytic functions

Visualization functions

Data Infrastructure includes:

Data stores

In-memory DBs

Analytic DBs
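
The draft agreement above groups functions under two blocks. As a minimal sketch of how that grouping could be recorded for later taxonomy work, the snippet below stores each block with the items it includes. The `Component` class, the constant names, and `find_component` are hypothetical and chosen for illustration only; the item names are the ones listed on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """One block of the reference architecture and the items it includes."""
    name: str
    includes: List[str] = field(default_factory=list)


# Item names come from the draft agreement; the Python names are assumptions.
TRANSFORMATION = Component(
    "Transformation",
    ["Processing functions", "Analytic functions", "Visualization functions"],
)
DATA_INFRASTRUCTURE = Component(
    "Data Infrastructure",
    ["Data stores", "In-memory DBs", "Analytic DBs"],
)


def find_component(item: str, components: List[Component]) -> Optional[str]:
    """Return the name of the block that includes `item`, or None if no block does."""
    for component in components:
        if item in component.includes:
            return component.name
    return None


if __name__ == "__main__":
    blocks = [TRANSFORMATION, DATA_INFRASTRUCTURE]
    print(find_component("In-memory DBs", blocks))
```

A lookup like this makes it easy to check that every term in a vocabulary list maps to exactly one block.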




Diagram labels: Sources; Transformation; Usage; Data Infrastructure; Security; Management; Cloud Computing; Network

Next Steps and Action Items (AIs)


Deliverable I: Write the White Paper draft showing one or more approaches (e.g., the Data Flow and Stack approaches) using the same or similar terminology


AI: Chairs will start the draft of the document incorporating the submissions to the Ref Arch subgroup


AI: Close cooperation between the “Ref Arch” and “Def&Tax” sub-groups. Input: M-0057. Output: a taxonomy for the RA diagrams with definitions for major entities/blocks.


Deliverable II: A draft of a single RA requires more discussion and inputs based on the work of all sub-groups


AI: Chairs will start the draft of the document incorporating the findings of the Ref Arch subgroup


AI: Review the latest contributions to the Ref Arch and incorporate their findings (see email from Yuri Demchenko / University of Amsterdam)


AI: Close cooperation with the “Use Cases” and “Security” sub-groups to identify the areas of focus for “zooming” into their architecture




Backup Slides


Submitted RAs


Data Centric by Wo Chang / NIST


Data Flow Diagram by Bob Marcus


Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Aggregation; Data Objects; Data Sources; Data Usage; Government (incl. health & financial institutions); Industries / Businesses; Network Operators / Telecom; Academia; Data Mining; Matching; Collection; Data Transformation; Data Infrastructure; Storage & Retrieval; Management; Security; Conditioning; Anonymized; Pseudo-anonymized; PII; VOLUME; VARIETY; VELOCITY

Data Flow Ecosystem Diagram
by Orit Levin
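
The diagram labels distinguish PII, pseudo-anonymized, and anonymized data objects, alongside a Conditioning step. The slide does not say how conditioning is done, so the following is only a rough sketch of one common approach: replacing direct identifiers with a keyed hash (pseudonymization) or dropping them. The field names, `PII_FIELDS`, and both function names are assumptions made for this example.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as directly identifying (PII).
PII_FIELDS = {"name", "email"}


def pseudonymize(record: dict, key: bytes) -> dict:
    """Replace PII values with keyed hashes so records stay linkable but unreadable."""
    out = {}
    for field_name, value in record.items():
        if field_name in PII_FIELDS:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field_name] = digest.hexdigest()
        else:
            out[field_name] = value
    return out


def drop_identifiers(record: dict) -> dict:
    """Remove PII fields entirely.

    Note: removing direct identifiers alone does not guarantee anonymity,
    because the remaining fields may still identify a person in combination.
    """
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


if __name__ == "__main__":
    row = {"name": "Alice", "email": "alice@example.com", "region": "EU"}
    print(pseudonymize(row, b"demo-key"))
    print(drop_identifiers(row))
```

The same key always yields the same pseudonym, which is what lets a matching service join records without seeing the original values.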


Technology Layers / Stack diagram by Gary Mazzaferro

Microsoft


Mapping to Technologies and Use Cases

Prepared by the authors of the original RAs

An Example of Cloud Computing Usage in Big Data Ecosystem

Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Aggregation; Data Objects; Data Sources; Data Usage; Government (incl. health & financial institutions); Industries / Businesses; Network Operators / Telecom; Academia; Data Mining; Matching; Collection; Data Transformation; Data Infrastructure; VOLUME; VARIETY; VELOCITY; Data Warehouse; Cloud Provider / Service Layer; SaaS; PaaS; IaaS


Diagram labels: Online Data Aggregator; Offline Data Aggregator; Data Subject / Person; Online Sources; Offline Sources; Public Records (commons, government, etc.); Internal Records; Other devices (Smart Grid, surveillance, scientific, etc.); End User devices incl. OS (mobile phones, etc.); Applications (search, publishers, etc.); Appl. with customers (communications, social network, etc.); Web Browsers; Match/Bridge Service; Networks; Government, health, financial institutions, academia; Industries / Businesses; Network Operators; Collection; Data Mining; Person Attribution; Data Management Platforms (DMPs); UI: Do Not Track (DNT); HTTP: DNT; Analytic Cookie; DMP Cookie; Match Cookie; DPI; Match Container Tag or Pixel request; DMP Container Tag or Pixel request; Users; Advertising Industry Ecosystem: SSP, DSP, AdNet, AdX, Agency, Publisher, Advertiser; Control; Aggregated; De-identified; PII; 1st Party; 2nd Party; 3rd Party; Contextual Data Collection; Behavioral Data Creation; Big Data Transfer; Individual Data Transfer

Use Case: Advertising
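
The advertising diagram lists "HTTP: DNT" among the controls available to the data subject. As an illustration of what a collecting service could do with that signal, the sketch below reads the `DNT` request header, where `1` expresses a preference not to be tracked and `0` expresses consent. The function names and the cookie policy are hypothetical; the slide does not prescribe any particular handling.

```python
from typing import Mapping, Optional


def dnt_preference(headers: Mapping[str, str]) -> Optional[bool]:
    """Return True for "DNT: 1", False for "DNT: 0", None if absent or unrecognized."""
    for name, value in headers.items():
        if name.lower() == "dnt":  # HTTP header names are case-insensitive
            value = value.strip()
            if value == "1":
                return True
            if value == "0":
                return False
            return None
    return None


def may_set_tracking_cookie(headers: Mapping[str, str]) -> bool:
    """Hypothetical policy: skip tracking cookies when the user sent DNT: 1."""
    return dnt_preference(headers) is not True


if __name__ == "__main__":
    print(may_set_tracking_cookie({"DNT": "1"}))
    print(may_set_tracking_cookie({"User-Agent": "example"}))
```

Returning `None` for a missing header keeps "no preference expressed" distinct from an explicit opt-in, so a stricter policy can treat the two cases differently.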


Diagram labels: Individual Data Transfer; Big Data Transfer; Selected Data Storage and Retrieval; Big Data Storage and Retrieval; Data Sources: Online Transaction Processing (OLTP) Systems, Archives, Files, MS Office Documents, Manual; Data Transformation: Extraction, Transformation, and Loading (ETL), Data Mining / Knowledge Discovery in Databases (KDD); Data Infrastructure: Staging Area, Operational Data Store, Central Data Warehouse, Management, Security; Department Data Mart; Regional Data Mart; Subject Data Mart; Application Data Mart; Functional Data Mart; Data Usage: Online Analytical Processing (OLAP), Managed Report Environment (MRE); Data Objects

Use Case: Enterprise Data Warehouse
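
The enterprise data warehouse diagram names a Staging Area, an ETL step, a Central Data Warehouse, and several data marts. The following is a minimal in-memory sketch of that flow, under the assumption that rows are simple dictionaries with `region` and `amount` fields. All function names, field names, and sample values are invented for illustration and do not come from the slide.

```python
from collections import defaultdict


def extract(sources: dict) -> list:
    """Copy raw rows from every source into a staging area, tagging their origin."""
    staging = []
    for source_name, rows in sources.items():
        for row in rows:
            staging.append({**row, "source": source_name})
    return staging


def transform(staging: list) -> list:
    """Normalize region names and amounts; skip rows that have no amount."""
    cleaned = []
    for row in staging:
        if row.get("amount") is None:
            continue
        cleaned.append({
            "region": str(row["region"]).strip().upper(),
            "amount": float(row["amount"]),
            "source": row["source"],
        })
    return cleaned


def load(warehouse: list, rows: list) -> None:
    """Append cleaned rows to the central warehouse (here just a list)."""
    warehouse.extend(rows)


def regional_data_mart(warehouse: list) -> dict:
    """Derive a per-region total, standing in for a regional data mart."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    sources = {
        "oltp": [{"region": "east ", "amount": "10.5"}, {"region": "west", "amount": None}],
        "files": [{"region": "East", "amount": 2}],
    }
    warehouse = []
    load(warehouse, transform(extract(sources)))
    print(regional_data_mart(warehouse))
```

Keeping the data mart as a derived view over the warehouse mirrors the diagram's separation between central storage and usage-specific subsets.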
