
Where Innovation Is Tradition




Meta-analysis of genetic associations using knowledge representation

J. Enrique Herrera-Galeano

Jeff Solka

Colloquium

Bioinformatics and Computational Biology

Systems Biology

George Mason University

September 24th, 2013


Outline

1. Background
2. The problem & hypothesis
3. Motivating examples
4. Results
5. OGA application

Genetics

Hippocrates (460-370 BC), Celsus (25 BC-50 AD), and Galen (130-201 AD): description of the human body

Mendel distinguished between the internal state (genotype) and the external state (phenotype)

Mendelian inheritance, biochemical pathway defects, metabolic disorders

Phenylketonuria, described by Ivar Asbjørn Følling in 1934, is a good example of a disorder caused by a single mutation (autosomal recessive)














<do> add gene </do>


<do> add SNP </do>



This set off the search for "the gene" for everything

Genetic epidemiology

Segregation analysis = analysis of pedigrees

1980s: PCR, short tandem repeats (STRs)

Highly polymorphic and neutral to selection

Whole Genome Mapping (WGM) or linkage analysis

1990s: linkage of breast cancer to chromosome 17q (D17S588 and D17S250)



BRCA1 and BRCA2


Chromosome 17q

Not as simple

Janine Altmüller in 2001 best summarized these observations by stating: “Positional cloning based on whole-genome screens in complex human disease has proved more difficult than originally had been envisioned…” (Altmüller, 2001)

Candidate Gene Approach

1990s: due to the limited success of WGM, take all the genes associated with the phenotype by different methods, find polymorphisms, and genotype them.

2000s: human genome sequencing -> SNPs

Illumina Golden Gate array: thousands of SNPs

Hundreds of genes: SNP selection problem (NP-complete)


Candidate Gene Approach

Herrera-Galeano, 2008

Metropolis Monte Carlo Markov chain

Min(σ(distance) *

The probability of a SNP being real:

p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V,

where

L = Illumina score
H = heterozygosity (from dbSNP)
S = success rate (from dbSNP)
M = 1 if present as tag SNP in the HapMap, or zero if not
V = the number of validation sources/10
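The weighted score above can be written out as a short function. This is only a sketch: the weights are copied from the slide, but the function name, the argument names, and the assumption that L, H and S are already scaled to the range 0 to 1 are illustrative choices, not part of the original method. Note that the weights as given sum to 0.9, so the largest attainable score under these assumptions is 0.9.

```python
def snp_score(illumina_score, heterozygosity, success_rate,
              is_hapmap_tag, validation_sources):
    """Weighted score p = 0.3L + 0.2H + 0.2S + 0.1M + 0.1V from the slide.

    Assumption (not stated on the slide): illumina_score, heterozygosity
    and success_rate are already expressed on a 0..1 scale.
    """
    L = illumina_score
    H = heterozygosity
    S = success_rate
    M = 1.0 if is_hapmap_tag else 0.0      # tag SNP in HapMap: 1, otherwise 0
    V = validation_sources / 10.0          # number of validation sources / 10
    return 0.3 * L + 0.2 * H + 0.2 * S + 0.1 * M + 0.1 * V


if __name__ == "__main__":
    # Hypothetical SNP: values chosen only to exercise the formula.
    print(snp_score(0.8, 0.4, 0.95, True, 3))
```

Ranking candidate SNPs by such a score is one simple way to feed a selection step; how the score enters the Metropolis search is not recoverable from the slide.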

Candidate Gene Approach

Example: PEAR1 (Herrera-Galeano, ATVB 2008)



Complex Human Disease

Neurological abnormalities: schizophrenia, depression

High blood pressure

LDL cholesterol

Height

Weight

BMI

Vp = Vg + Ve

Vp = Phenotypic variance
Vg = Genetic variance
Ve = Environmental variance

Heritability in the broad sense:

H = Vg/Vp (Falconer, 1993)
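As a worked illustration of the variance decomposition, the following sketch computes H from Vg and Ve. The function name and the numbers in the example are made up for illustration; only the relations Vp = Vg + Ve and H = Vg/Vp come from the slide.

```python
def broad_sense_heritability(vg, ve):
    """Return H = Vg / Vp, with Vp = Vg + Ve."""
    if vg < 0 or ve < 0:
        raise ValueError("variances must be non-negative")
    vp = vg + ve
    if vp == 0:
        raise ValueError("phenotypic variance is zero")
    return vg / vp


if __name__ == "__main__":
    # Illustrative numbers only: genetic variance 6, environmental variance 4.
    print(broad_sense_heritability(6.0, 4.0))
```

With these illustrative values, 60% of the phenotypic variance would be attributed to genetic variance.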


Complex Human Disease

Genome-wide association studies (GWAS)

High-density arrays now allow millions of SNPs to be genotyped, leaving the SNP selection problem behind.

Missing heritability

GWAS

Solutions to the missing heritability problem:

Epigenomics, other omics

Epistatic effects:

1. Map/reduce for cloud brute force (Wang, 2011)
2. Random handfuls (Province, 2008)
3. Machine learning (Lin, 2012)
4. Information theory (Lee, 2012)
The problem/hypothesis

All of these focus on the search space of the genotypes; the relationships among phenotypes are currently unutilized.

Are closely related phenotypes associated with the same genes?

What methodology can be utilized to answer such a question?

GWAS General Well Being

A QTL is clearly related to mental disorders; what if a related SNP was associated with a related phenotype?

GWAS General Well Being Example

RsNumber    Pvalue    Position   ObsHET  MAF    HWpval  Genes      Fxn_Class
rs11588923  0.04847   147983660  0.066   0.034  1       LOC729130  intron
rs1046332   2.00E-07  148084132  0.038   0.019  1       NA         NA
rs15931     5.90E-10  148122974  0.032   0.016  1       HIST2H2BE  mrna-utr
rs1451641   2.30E-10  148132504  0.031   0.016  1       NA         NA
rs1349532   2.30E-10  148137627  0.031   0.016  1       BOLA1      locus-region
rs12078573  0.00402   148170233  0.092   0.052  0.1476  MTMR11     intron
rs10494363  5.60E-11  148176119  0.03    0.015  1       NA         NA
rs16841623  0.04478   148204570  0.116   0.059  0.3557  OTUD7B     intron
rs16841697  0.04478   148205144  0.116   0.059  0.3557  OTUD7B     intron
rs16832993  0.03906   148234790  0.116   0.059  0.3557  OTUD7B     intron

Ontologies and Genetic association

Requirements:

Phenotype ontology: Human Phenotype Ontology (HPO), Robinson (2010)

Database of genetic associations: NCBI Genetic Association Database (GAD)

Ontologies and Genetic association

Columbia Medical Entity Dictionary (MED): a semantic network from ICD-10, SNOMED, UMLS

Is-a relationship


Human Phenotype Ontology

Linking HPO with GAD

How to match the ontology concepts with the genetic association database entries?

Overlapping matching sets (diagram): the sets of concepts that match "Coronary", "Artery", and "Disease" overlap with the concepts that match "Coronary Artery Disease".

Linking HPO with GAD

Pattern matching: find string s in text T

Finite-state automaton (grep)

BLAST

Suffix tree/array
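To make the suffix array option concrete, here is a minimal sketch of finding every occurrence of a string s in a text T. It uses a naive construction (sorting all suffixes) that is fine for small inputs but not how a production index would be built; the function names and the sample text are illustrative and are not taken from OGA.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffixes; adequate for a small demo only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return sorted start positions of pattern in text via binary search."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    # Illustrative text: two phenotype-like strings separated by a newline.
    T = "coronary artery disease\nartery\n"
    sa = build_suffix_array(T)
    print(find_all(T, sa, "artery"))
```

Once the array is built, each lookup costs a logarithmic number of string comparisons, which is why a precomputed index pays off when many concept names are searched against the same text.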


Linking HPO with GAD

Suffix array:

One common word: percentage of assignment (41.1% vs. 27.5%), error rate 30%, one sample of n=1,000

Complete string matching: percentage of assignment 19%, error rate ~2% on 5 samples of n=1,000
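The difference between the two matching criteria can be illustrated with a small sketch. The exact rules used in OGA are not spelled out on the slide, so this assumes that "one common word" means sharing at least one word with a concept name, and that "complete string matching" means the whole concept name appears as a run of consecutive words in the database entry. The concept names and the entry below are made-up examples.

```python
import re


def tokens(s):
    """Lower-case alphanumeric words of s."""
    return re.findall(r"[a-z0-9]+", s.lower())


def one_common_word(entry, concepts):
    """Concepts sharing at least one word with the entry (loose criterion)."""
    entry_words = set(tokens(entry))
    return [c for c in concepts if entry_words & set(tokens(c))]


def complete_match(entry, concepts):
    """Concepts whose full name occurs as consecutive words in the entry."""
    padded_entry = " " + " ".join(tokens(entry)) + " "
    return [c for c in concepts
            if tokens(c) and " " + " ".join(tokens(c)) + " " in padded_entry]


if __name__ == "__main__":
    # Hypothetical concept names and database entry, for illustration only.
    concepts = ["Coronary artery disease",
                "Abnormality of the coronary sinus",
                "Hypertension"]
    entry = "coronary artery disease, premature"
    print(one_common_word(entry, concepts))
    print(complete_match(entry, concepts))
```

The loose criterion assigns more entries but also pulls in concepts that merely share a word, which matches the trade-off reported above between percentage of assignment and error rate.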





OGA Entity Relationship Diagram

SQLite DBs

OGA Simplified UML Diagram

Mockup OGA

OGA Implementation


OGA Package contents

File               Description
oga.jar            The Java jar file that contains all the classes necessary to run the application
merge.db           The SQLite database that implements the database design (see methods)
Concepts.data      The names of the HPO concepts
Concepts.data.bis  The index to support the suffix array based pattern matching
Libraries          sqlite-jdbc-3.7.2.jar, a dependency to connect to the SQLite database

Genetic Associations on the Phenotype Ontology

Why these nine genes?

Gene Symbol  Gene Name
BDNF         brain-derived neurotrophic factor
CLOCK        circadian locomoter output cycles
CNR1         cannabinoid receptor 1
GHRL         ghrelin/obestatin prepropeptide
HTR1B        5-hydroxytryptamine (serotonin)
HTR2A        5-hydroxytryptamine (serotonin)
HTR2C        5-hydroxytryptamine (serotonin)
SLC6A4       neurotransmitter transporter
TPH1         tryptophan hydroxylase 1

OGA: Ontology of Genetic Associations

Allows for answering questions such as:

What genes are associated with Mental Disorder?

What is the intersection of genes between two or more phenotypes of interest?
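The second question maps naturally onto a SQL query. The sketch below is hedged throughout: the table name `association` and its columns `gene` and `phenotype` are a hypothetical schema chosen for illustration, not the actual layout of merge.db, and the handful of rows are toy data loosely based on associations named elsewhere in these slides. It uses Python's standard sqlite3 module against an in-memory database.

```python
import sqlite3


def genes_shared(conn, phenotypes):
    """Genes associated with every phenotype in the given list.

    Assumes a hypothetical table association(gene, phenotype).
    """
    wanted = sorted(set(phenotypes))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = (
        "SELECT gene FROM association "
        f"WHERE phenotype IN ({placeholders}) "
        "GROUP BY gene "
        "HAVING COUNT(DISTINCT phenotype) = ? "
        "ORDER BY gene"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


def demo_connection():
    """In-memory database with a few illustrative rows (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE association (gene TEXT, phenotype TEXT)")
    conn.executemany(
        "INSERT INTO association VALUES (?, ?)",
        [("HTR2A", "Schizophrenia"), ("HTR2A", "Depression"),
         ("MTHFR", "Schizophrenia"), ("MTHFR", "Vascular disease"),
         ("AGT", "Hypertension")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(genes_shared(conn, ["Schizophrenia", "Depression"]))
```

Grouping by gene and requiring the distinct phenotype count to equal the number of requested phenotypes generalizes the intersection to any number of phenotypes without chaining INTERSECT clauses.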

Observed count of phenotypes per gene: Mental Disorder Concept

Gene      Gene Name                                              Phenotype Count
SLC6A4    neurotransmitter transporter                           20
NOS1      nitric oxide synthase 1                                16
HLA-A     major histocompatibility complex, class I, A           13
APOE      apolipoprotein E                                       11
HLA-DRB1  major histocompatibility complex, class II, DR beta 1  10
NOS2A     nitric oxide synthase 2, inducible                     10
TOR1A     torsin family 1, member A                              10
TOR1B     torsin family 1, member B                              10
BCHE      butyrylcholinesterase                                  9
CCL2      chemokine (C-C motif) ligand 2                         9
SERPINI1  serpin peptidase inhibitor, clade I                    9
VLDLR     very low density lipoprotein receptor                  9
MAOA      monoamine oxidase A                                    8

Phenotype counts found by chance?

Empirical p-value:

$p_{\mathrm{empirical}} = 1 / \sum_{i=1}^{n} C'_i$

OGA preliminary stats

GAD has 84,558 entries

23,303 unique matches (27.5%)

SLC6A4 -> 20 phenotypes, 178 iterations, p-value = 0.0056

NOS1 -> 16 phenotypes, 41 iterations, p-value = 0.02

All others > 0.05

SLC6A4, MAOA, NOS1, NOS2A and NOS3
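The two reported p-values are numerically consistent with one divided by the number of iterations (1/178 ≈ 0.0056 and 1/41 ≈ 0.024). The sketch below checks that arithmetic and shows one way such an iteration count could arise. It is an assumption, not the documented OGA procedure, that iterations are random draws repeated until a random phenotype count reaches the observed one; the function names and the `draw_count` callback are hypothetical.

```python
import random


def p_from_iterations(iterations):
    """Empirical p-value taken as 1 / number of iterations (assumed reading)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return 1.0 / iterations


def iterations_until_reached(observed, draw_count, max_iterations=100_000):
    """Count random draws until one is >= observed; None if never reached.

    draw_count is a caller-supplied function returning the phenotype count
    of one random draw (hypothetical; stands in for resampling the database).
    """
    for i in range(1, max_iterations + 1):
        if draw_count() >= observed:
            return i
    return None


if __name__ == "__main__":
    print(round(p_from_iterations(178), 4))   # SLC6A4: 178 iterations
    print(round(p_from_iterations(41), 2))    # NOS1: 41 iterations
    rng = random.Random(0)                    # toy null distribution only
    print(iterations_until_reached(20, lambda: rng.randint(0, 20)))
```

Under this reading, a gene whose observed count is matched quickly by random draws gets a large p-value, while one that takes many draws to match gets a small one.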

Information network

SLC6A4 -> regulates -> serotonin

MAOA -> degrades -> serotonin

Oxidase

NOS1, NOS2A

Antioxidants and depression?


Neurocarta

OGA vs Neurocarta

                      OGA         Neurocarta
Number of links       98,698      30,000
Number of concepts    2,708       2,000
Number of genes       4,666       7,000
Backbone              HPO         HPO, DO, MPO
Curated               No          Yes
Statistical analysis  Yes         No
Interface             Standalone  Website

Top 10 genes by phenotype count

Gene      Phenotypes in OGA
ACE       1,923
NOS3      1,659
APOE      1,573
GJB2      1,042
HLA-DRB1  1,008
AGT       971
MTHFR     960
NOS1      866
TNF       770
HLA-DQB1  689

Top 10 phenotypes by gene count

Phenotype              Genes in OGA
Alzheimer’s Disease    2ⰴ,3
Schizophrenia          1,816
Colorectal Cancer      1,581
Hypertension           1,251
Breast Cancer          1,211
Asthma                 911
Osteosclerosis         798
Rheumatoid arthritis   687
Myocardial infarction  643
Obesity                641

Motivating examples

1. Colon cancer and Helicobacter pylori infection susceptibility
2. Lipid metabolism, diabetes, obesity, and hypertension
3. Schizophrenia, bulimia, depression and psychosis
4. Autism and cerebral palsy



Motivating examples

1. Colon cancer and Helicobacter pylori infection susceptibility

Strofilas et al., 2012: colon cancer & H. pylori infection

O'Donoghue, 2011: CYP2C19 and H. pylori

Yamamoto et al., 2013: CYP2C19 and cancer

CYP2C19 is the gene symbol for the cytochrome P450, family 2, subfamily C, polypeptide 19 gene



Motivating examples

2. Lipid metabolism, diabetes, obesity, and hypertension

Gene Symbol  Gene Name                                 Comment (associations according to OMIM)
APOE         Apolipoprotein E                          Alzheimer disease-2, Hyperlipoproteinemia, type III, Myocardial infarction susceptibility
ACE          Angiotensin I-converting enzyme           Myocardial infarction susceptibility, Alzheimer disease, Stroke
CETP         Cholesteryl ester transfer protein        Hyperalphalipoproteinemia
AGT          Angiotensinogen                           Hypertension
IL6          Interleukin 6                             Diabetes
FGB          Fibrinogen B, beta polypeptide            None
PON1         Paraoxonase 1                             Coronary artery disease, Microvascular complications of diabetes
LPL          Lipoprotein lipase                        Combined hyperlipidemia, familial
MTHFR        5,10-Methylenetetrahydrofolate reductase  Vascular disease, Schizophrenia


Motivating examples

2. Lipid metabolism, diabetes, obesity, and hypertension



Cytoscape

Motivating examples

3. Schizophrenia, bulimia, depression and psychosis

Gene Symbol  Gene Name                                                                    Comment
HTR2A        5-hydroxytryptamine (serotonin) receptor 2A, G protein-coupled               A neurotransmitter receptor associated with depression, schizophrenia, anorexia
SLC6A3       Solute carrier family 6 (neurotransmitter transporter, dopamine), member 3   Eating disorders, attention deficit-hyperactivity disorder, Major affective disorder 1
SLC6A4       Solute carrier family 6 (neurotransmitter transporter, serotonin), member 4  Anxiety, Obsessive-compulsive disorder

Empirical p-value < 0.001


Motivating examples

3. Schizophrenia, bulimia, depression and psychosis



Motivating examples

4. Autism and cerebral palsy

Gene Symbol  Gene Name                                                                                      Comment (OMIM)
PTGS2        Prostaglandin-endoperoxide synthase 2 (prostaglandin G/H synthase and cyclooxygenase)          Prostaglandin synthesis
APOE         Apolipoprotein E                                                                               Alzheimer disease-2, Hyperlipoproteinemia, type III, Myocardial infarction susceptibility
SERPINE1     Serpin peptidase inhibitor, clade E (nexin, plasminogen activator inhibitor type 1), member 1  Plasminogen activator inhibitor-1 deficiency 613329 {Transcription of plasminogen activator inhibitor, modulator of}
TFPI         Tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)                 Also known as lipoprotein-associated coagulation inhibitor
ITGB3        Integrin, beta 3 (platelet glycoprotein IIIa, antigen CD61)                                    Glanzmann thrombasthenia, purpura posttransfusion, thrombocytopenia, neonatal alloimmune, susceptibility to Myocardial infarction
TNF          Tumor necrosis factor                                                                          Asthma, cardiovascular disease
MTHFR        5,10-Methylenetetrahydrofolate reductase                                                       Vascular disease, Schizophrenia

Conclusions

1. An indexing algorithm for pattern matching has been successfully implemented to link HPO and GAD with a low error rate (2%)
2. OGA has been implemented and released
3. One motivating example has results deviating from the null
4. Bioinformatics paper is under review


Future directions

1. Evaluation of observations by network analysis or combined effect models
2. Extension of OGA by adding GWASdb
3. Extension of backbone to include DO
4. Automated updates
5. Automated model evaluation from dbGaP


Thank you

Dr. Jeffrey Solka

Dr. Iosif Vaisman

Dr. Patrick M Gillevet

Dr. David Hirschberg

Dr. Vishwesh Mokashi