Big Data Infrastructure for Scientific Computing

Dec 1, 2013

Mathijs Kattenberg
mathijs.kattenberg@surfsara.nl

Big Data Landscape

Large Hadron Collider:
- Uses: Grid
- Volume: ~15 PB per year (~4 PB @ SURFsara)
- Type of data: structured
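To put the LHC volume in perspective, the short sketch below converts ~15 PB per year into an average sustained data rate. It assumes decimal units (1 PB = 10^15 bytes) and a 365-day year, neither of which the slides specify, and the function name is purely illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def sustained_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s needed to move the given yearly volume."""
    bytes_per_year = petabytes_per_year * 1e15  # assumption: 1 PB = 10**15 bytes
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


lhc_rate = sustained_rate_mb_per_s(15)
print(f"~{lhc_rate:.0f} MB/s, or ~{lhc_rate * 8 / 1000:.1f} Gbit/s")
```

Under these assumptions the yearly volume corresponds to an average of roughly 476 MB/s, around the clock. Real traffic is bursty, so peak rates would be higher.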

Next Generation Sequencing (GoNL):
- Uses: Grid, Cloud, Cluster
- Volume: ~100 GB to 300 TB
- Type of data: various formats and noise

Information retrieval and NLP:
- Uses: Hadoop, Cloud
- Volume: ~70 TB
- Type of data: Text, unstructured

http://bit.ly/173ddfz

Effectiveness of Data

Where having and exploiting data leads to insights:
- Brainscanr
- Healthmap


(Open) Data Sources

Lots of open data:
- Open data Nederland
- CitySDK
- Community of Amsterdam
- Rijkswaterstaat
- Twitter
- Facebook
- Google

Different formats:
- Excel files
- JSON
- Web services

Different quality:
- Noise
- Missing values
- Availability
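The format and quality issues above can be illustrated with a small sketch: parsing JSON records and setting aside missing or noisy values before analysis. The field names (`station`, `water_level_cm`), the sample records, and the plausibility range are invented for illustration and do not come from any of the listed sources.

```python
import json

# Hypothetical sample: one clean record, one missing value,
# one non-numeric value, and one implausible outlier.
RAW = """[
  {"station": "A", "water_level_cm": 112},
  {"station": "B", "water_level_cm": null},
  {"station": "C", "water_level_cm": "n/a"},
  {"station": "D", "water_level_cm": 99999}
]"""


def clean(records, low=-500, high=1000):
    """Split records into those with a plausible numeric value and the rest."""
    kept, dropped = [], []
    for rec in records:
        value = rec.get("water_level_cm")
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and low <= value <= high:
            kept.append(rec)
        else:
            dropped.append(rec)  # missing, non-numeric, or out of range
    return kept, dropped


kept, dropped = clean(json.loads(RAW))
print(f"kept {len(kept)} records, dropped {len(dropped)}")
```

Keeping the dropped records, rather than discarding them silently, makes it possible to report how much of a source was unusable.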

Computing Big Data

Capacity:
- CPU cores
- Hard drive space
- Network bandwidth

Solutions:
- Scale up: get faster tools
- Scale out: work with more tools

Complexity:
- Data:
  - Noise, missing data
  - Formats
  - Access
- Distributed computing:
  - Failures
  - Parallel programming

Solutions:
- Data: deal with it
- Distributed computing:
  - Super/Cluster computer
  - Grid
  - Hadoop
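The scale-out idea behind Hadoop can be sketched with the MapReduce pattern: independent map tasks each process a separate chunk of data, and a reduce step merges their partial results. The sketch below runs in a single Python process on invented input, so it only illustrates the programming model, not Hadoop's API or its handling of failures.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk: str) -> Counter:
    """Map step: count words in one chunk, independently of all others."""
    return Counter(chunk.lower().split())


def merge(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial counts into one."""
    return left + right


# Invented input; on a cluster each chunk would live on a different node.
chunks = ["big data big compute", "data on the grid", "big grid"]

partials = [map_chunk(c) for c in chunks]  # each call could run in parallel
totals = reduce(merge, partials, Counter())
print(totals.most_common(3))
```

Because `map_chunk` touches only its own chunk, the list comprehension could be replaced by a process pool or by tasks on separate machines without changing `merge`.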

What SURFsara Offers

SURFsara provides:
1. Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
2. Support: development, parallelization, consultancy
3. R&D: piloting new technologies
4. Hosting datasets for common use

www.surfsara.nl

Mathijs Kattenberg

mathijs.kattenberg@surfsara.nl

www.sendsteps.com

Prepare to react; keep your phone ready!

TXT
1. Text to +316 4250 0030
2. Type Session <space> WS4 <space> your answer

Internet
1. Go to sendc.com
2. Log in with Session
3. Type WS4 <space> your answer

Posting messages is anonymous
No additional charge per message

What kind of technologies would you consider using in order to deal with technical Big Data challenges?
