Hadoop In Windows Azure - Windows Azure User Group NL

spongereason · Internet and Web Development

Nov 12, 2013




[Diagram: MapReduce example, total sales amount per zip code]

Input: blocks of the Sales file in HDFS, stored on DataNode1, DataNode2 and DataNode3. Each record is (custId, zipCode, amount).

Block: (4, 54235, $75)  (7, 10025, $60)  (2, 53705, $30)  (1, 02115, $15)
Block: (5, 53705, $65)  (0, 54235, $22)  (5, 53705, $15)  (6, 44313, $10)
Block: (3, 10025, $95)  (8, 44313, $55)  (9, 02115, $15)  (6, 44313, $25)

Map tasks: each Mapper reads a block, groups its records by zipCode, and writes one output bucket per reduce task.

Shuffle: each reduce task collects its bucket from every Mapper.

Reduce tasks: each Reducer sorts its input by zipCode and computes SUM(amount) per zipCode.

Output:
10025  $155
44313  $90
53705  $110
54235  $97
02115  $30
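The sales-per-zip-code job in the diagram can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, sort and sum steps, not Hadoop code: the function names, the assignment of records to blocks, and the modulo partitioner are assumptions made for the example, and a real job may route zip codes to reducers differently.

```python
# (custId, zipCode, amount) records, split into three blocks as in the diagram.
# Which block lives on which DataNode is not modeled here.
BLOCKS = [
    [(4, "54235", 75), (7, "10025", 60), (2, "53705", 30), (1, "02115", 15)],
    [(5, "53705", 65), (0, "54235", 22), (5, "53705", 15), (6, "44313", 10)],
    [(3, "10025", 95), (8, "44313", 55), (9, "02115", 15), (6, "44313", 25)],
]

NUM_REDUCERS = 3


def map_task(block):
    """Map step: emit (zipCode, amount) for each record; custId is not needed."""
    for _cust_id, zip_code, amount in block:
        yield zip_code, amount


def partition(zip_code, num_reducers=NUM_REDUCERS):
    """Hypothetical partitioner: choose the reduce task for a zip code."""
    return int(zip_code) % num_reducers


def shuffle(blocks):
    """Shuffle step: route every mapper output pair to its reducer's bucket."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for block in blocks:
        for zip_code, amount in map_task(block):
            buckets[partition(zip_code)].append((zip_code, amount))
    return buckets


def reduce_task(bucket):
    """Reduce step: sort the bucket by zip code, then sum amounts per zip code."""
    totals = {}
    for zip_code, amount in sorted(bucket):
        totals[zip_code] = totals.get(zip_code, 0) + amount
    return totals


def run_job(blocks):
    """Run all reduce tasks and merge their (disjoint) outputs."""
    result = {}
    for bucket in shuffle(blocks):
        result.update(reduce_task(bucket))
    return result


if __name__ == "__main__":
    for zip_code, total in sorted(run_job(BLOCKS).items()):
        print(f"{zip_code}  ${total}")
```

Because every record for a given zip code lands in the same bucket, each reducer can compute its sums without seeing the other reducers' data, which is the point of the shuffle step.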


[Diagram: Hadoop cluster with one Name Node and four Data Nodes, connected to Azure Blob Storage and S3]

On Premise Enterprise Content: Transactional DBs, On Prem logs, Internal sensors

Cloud Enterprise Content: Generated in Azure

3rd Party Content: Azure Datamarket, Generated/stored elsewhere

Public content: Delivered online

Azure Blob Storage

SQL Azure

Application end point

What does Hadoop in the Cloud mean?

Where is HDFS?

Where is my data stored?

Azure Blob Storage vs. HDFS