Big Data at Ancestry.com

OVERVIEW




In the last 10 years, big data has become mainstream, but what does that mean to a family historian? Come learn what big data is, what Ancestry’s large data sets are, and about three specific initiatives involving big data: DNA, data mining, and machine learning.



The goal is to bring the discussion down to earth for non-technical people interested in how Ancestry leverages these innovative technologies in our service.

WHAT IS BIG DATA?




• Big data is a collection of data sets so large and complex that it becomes difficult to process using commonly used software tools.
  o Aggregation vs. personalization
  o Technology changes that are making big data collection and analysis mainstream
  o The 3 Vs: velocity, variety, and volume

• Ancestry has big data
  o What are the large data sets at Ancestry?

• Three specific big data initiatives at Ancestry
  o DNA data and results processing
  o Data mining and personalization
  o Machine learning

DNA RESULTS PROCESSING


What happens after you “spit in a tube” and send in your DNA? Analyzing DNA is a big data challenge.

• A look at DNA data
• A continually growing DNA pool
• Big data technologies in use for DNA Ethnicity and Matching

DATA MINING AND PERSONALIZATION


Data mining is the process that attempts to discover patterns in large data sets using computer science and statistics.

• The data being mined
• Stitching large unstructured data sets together
• Analyzing the data to create a personal family history experience

MACHINE LEARNING


Machine learning is about the construction and study of systems that can learn from data.

• Content Delivery Pipeline
• Record Linking







Presented by Bill Yetman

March 21, 2013


CONCLUSION: IMPLICATIONS FOR THE FAMILY HISTORIAN




• Where is Ancestry going with big data?
  o Investing in the hardware, technology, and people to improve the service

• How you can help improve our big data analytics
  o Why am I being asked for feedback?

• Questions? Suggestions?