PostgreSQL Genomic Databases - Scientific Application Case Example

offbeatlossΔιαχείριση Δεδομένων

22 Νοε 2012 (πριν από 4 χρόνια και 6 μήνες)

309 εμφανίσεις

PostgreSQL
PostgreSQL
Scientific
Scientific
Application
Application
-
-
Case
Case
example
example
PostgreSQL
PostgreSQL
Genomic
Genomic
Databases
Databases
S
S
é
é
bastien Cl
bastien Cl
é
é
ment
ment
sclement@cfrl.forestry.ca
sclement@cfrl.forestry.ca
Natural
Natural
Resources
Resources
Canada
Canada
Presented
Presented
at
at
the
the
PostgreSQL
PostgreSQL
Conference
Conference
2009 in
2009 in
Japan
Japan
November
November
20th
20th
Foreword
Foreword
«
«
What
What
is
is
this
this
guy
guy
doing
doing
here
here
?
?
»
»
«
«
Can
Can
PostgreSQL
PostgreSQL
handle
handle
scientific
scientific
databases
databases
?
?
»
»
What
What
is
is
genomics
genomics
and
and
why
why
bother
bother
?
?
Genomics
:
«
The
study
of the
entire
genome
(all
genes
) of a
species
»
Genome
size
Number
of
genes

Health
and
disease

Heredity

etc.

Genetic
improvement
3 000 000 000
3 000 000 000
~
~
23 000
23 000
390 000 000
390 000 000
~53 000
~53 000
DATA
Why
Why
study
study
TREE
TREE
genomics
genomics
?
?
CGACGTTA
A
TGCCACTC
CGACGTTAATGCCACTC
G
Cellulose
Cellulose
gene
gene
Normal
Variant
DATA
DB
DB
Why
Why
is
is
a
a
genomic
genomic
DB essential?
DB essential?
A single
A single
gene
gene




how about
how about
thousands
thousands
of
of
genes
genes




for
for
thousands
thousands
of
of
species
species
?
?
Name
Sequence
Size
Functions
Cell
wall
metabolism
Cell
structrure
Catalysis

Variations
Species
Interaction
with
other
genes
Similarity
with
other
species
more

(
phew
!)
Chromosome pos.
Public
Public
genomics
genomics
DBs
DBs
Genbank
Genbank
UniProt
UniProt
TAIR
TAIR
http://www.arabidopsis.org/
http://www.ebi.ac.uk/uniprot/
http://www.ncbi.nlm.nih.gov/Genbank/
Our
Our
PostgreSQL
PostgreSQL
Databases
Databases
TreeSNPs
TreeSNPs
Genes
and variations
PhenoTree
PhenoTree
Observable
attributes
(
physical
,
morphological
)

Ruby on Rails interface

Multi
-
language
support

38 tables

~
450 K records

Mostly
manual
entry

PhpPgAdmin
interface

21 tables

~
4.1 M records
TreeSNPs
TreeSNPs
overview
overview
General
General
views
views
TreeSNPs
TreeSNPs
overview
overview
(
(
cont
cont


d
d
)
)
Lab
Lab
plate
plate
Plate
Plate
view
view
Results
Results
TreeSNPs
TreeSNPs
overview
overview
(
(
cont
cont


d
d
)
)
Example
Example
of
of
calculations
calculations
(
(
views
views
):
):
TreeSNPs
TreeSNPs
overview
overview
(
(
cont
cont


d
d
)
)
TreeSNPs
TreeSNPs
download
download
page &
page &
demo
demo
version
version
http://treesnps
-
pub.arborea.ulaval.ca:3000/download
A
A
paper
paper
to
to
appear
appear
soon
soon
in
in
Adopted
Adopted
by U. of Alberta
by U. of Alberta


s (Canada)
s (Canada)
Laboratory
Laboratory
on
on
Moutain
Moutain
Pine
Pine
Beetle
Beetle
PhenoTree
PhenoTree
overview
overview
>
>
1000
1000
trees
trees
>
>
60 K
60 K
records
records
~
~
4 M
4 M
records
records
1
2
3
4
Dimensions &
Dimensions &
morphology
morphology
Wood
Wood
analysis
analysis
Other
data:

Geographical
locations

Tree
pedigree
What
What
data
data
is
is
stored
stored
?
?
PhenoTree
PhenoTree
overview
overview
(
(
cont
cont


d
d
)
)
Wood
Wood
analysis
analysis
properties
properties
table
table
read
every
25
µ
m
Pith
Pith
Bark
Bark
Radius
Radius
942
trees
~
2100
reads
/
tree
1.98 M
1.98 M
reads
!
etc.
etc.
Wood
Wood
density
density
Fibre dimensions
Fibre dimensions
Cells
Cells
count
count
Example
Example
of
of
calculations
calculations
1 (SQL
1 (SQL
views
views
):
):
PhenoTree
PhenoTree
overview
overview
(
(
cont
cont


d
d
)
)
Pith
Pith
Bark
Bark
Radius
Radius

Ring
width
(mm)
Ring area
(mm
2
)
Σ
x
Wood
density
(kg/m
³
)
Fibre
width
(
µ
m)
Cell
counts
(/mm
²
)

4.35 4.53 3.70
122.3 253.4 302.8
Growth
Growth
ring
ring
averages
averages
1

744 664 611
1 2 3

22.95 23.85 23.89
1369 1446 1699
942
trees
~16
rings/
tree
15 K
15 K
records
SELECT
tableau_croise.arbre
,
tableau_croise.height_1986, tableau_croise.height_1992,
tableau_croise.height_1997, tableau_croise.height_2004,
tableau_croise.height_2005
FROM
crosstab
('select
tree_name
AS
nom_ligne
,
year
AS
categorie,height
AS valeur
from
trunk_measures
ORDER BY 1,2'::
text
, 'SELECT DISTINCT
year
FROM
trunk_measures
ORDER BY 1'::
text
)
tableau_croise
(arbre
text
, height_1986 double
precision
, height_1992 double
precision
, height_1997 double
precision
, height_2004
double
precision
, height_2005 double
precision
);
PhenoTree
PhenoTree
overview
overview
(
(
cont
cont


d
d
)
)
Example
Example
of
of
calculations
calculations
2 (SQL
2 (SQL
views
views
):
):
crosstab
crosstab
function
function
crosstab
crosstab
Logical
Logical
,
,
but not
but not
very
very
useful
useful




this
this
is
is
it
it
what
what
end
end
-
-
users
users
want
want
Systems and user base
Systems and user base

Formerly
Access
projects
(2006
-
7)

Migrated
to
PostgreSQL
8.3
under
Fedora
(2007
-
8)

Migrated
back to Windows (2009)

Around
20
scientific
users
(
Universities
,
Federal
Government
)
Production
Production
Server
Server
Gov
.
Canada
network
University
network
Mirror
Mirror
server
server
Local
users
VPN
Local
users
PostgreSQL
PostgreSQL
and Windows
and Windows


can
can
it
it
really
really
work
work
?
?
Task
Task
automation
automation
with
with
DOS
DOS


Limited
Limited
functionnality
functionnality
Solution ?
Solution ?
*
Thanks
: Greg Smith (http://wiki.postgresql.org/wiki/Automated_Backup_o
n_Windows)
Windows
Windows
Task
Task
Manager
Manager
Cygwin
Cygwin
Unix/
Unix/
bash
bash
scripts
scripts
PostgreSQL
PostgreSQL
Script
Script
examples
examples
:
:


Start Rails server (DOS)
Start Rails server (DOS)


Backups (DOS)*
Backups (DOS)*


Backup files
Backup files
cleaner
cleaner
(
(
bash
bash
)
)


VPN connexion to production server (DOS)
VPN connexion to production server (DOS)


Mirror
Mirror
synchronizing
synchronizing
(
(
bash,DOS
bash,DOS
)
)


Database
Database
version
version
comparison
comparison
(
(
bash
bash
)
)


Users
Users
&
&
privileges
privileges
report (
report (
bash
bash
)
)
Developing
Developing
databases
databases
for the
for the
scientific
scientific
community
community
Suggestions:
Suggestions:


Have a user
Have a user
-
-
based
based
approach
approach


1. Know/
1. Know/
answer
answer
the
the
user
user


s
s
needs
needs


2.
2.
Limit
Limit
technical
technical
jargon
jargon


3.
3.
Think
Think


usability
usability


Aknowledgements
Aknowledgements
Jean Beaulieu
Jean Beaulieu


Lab
Lab
director
director
Jo
Jo
ë
ë
l Fillon
l Fillon


Ruby on Rails interface designer
Ruby on Rails interface designer
Jean
Jean
-
-
Philippe Dionne
Philippe Dionne


Rails
Rails
secure
secure
access
access
programming
programming
Jean Bousquet
Jean Bousquet


Collaborator
Collaborator
All end
All end
users
users
,
,
particularly
particularly
:
:
Sylvie Blais, St
Sylvie Blais, St
é
é
phanie
phanie
Beauseigle
Beauseigle
, Marie Deslauriers,
, Marie Deslauriers,
Pier
Pier
-
-
Luc Poulin, Patrick Lenz
Luc Poulin, Patrick Lenz
People
People
Organizations
Organizations
Arborea
Arborea
Forest
Forest
Genomics
Genomics
(
(
http://www.arborea.ulaval.ca/
http://www.arborea.ulaval.ca/
)
)
Canadian Forest Service, Natural
Canadian Forest Service, Natural
Resources
Resources
Canada
Canada
Genome
Genome
Qu
Qu
é
é
bec
bec
Genome
Genome
Canada
Canada
Done
?