Nov 20, 2013

Linked2Safety Project (FP7-ICT-2011-7, 5.3)

A NEXT-GENERATION, SECURE LINKED DATA MEDICAL INFORMATION SPACE FOR
SEMANTICALLY-INTERCONNECTING ELECTRONIC HEALTH RECORDS
AND CLINICAL TRIALS SYSTEMS
ADVANCING PATIENTS SAFETY IN CLINICAL RESEARCH


12th International Conference on Bioinformatics and Bioengineering, Larnaka


Athos Antoniades


FP7, ICT-2011 5.3




Why Share Data?


What are the current legal and ethical limitations?


How have scientists shared medical data so far?


Key Problems


Perturbation


Cell Suppression





Why share data:


Replication Testing


Statistical Power


Multiple Testing Problem



Legal and Ethical Issues


Anonymization vs. Pseudonymization


Limitations derived from the consent forms signed by subjects


Other regional, study-specific, or subject-specific issues.









           aa     aA     AA
Case       U00    U01    U02
Control    U10    U11    U12
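A 2 × 3 table like the one above (cases and controls by genotype aa, aA, AA, with cell counts U00 to U12) is the input to a standard main-effect chi-square test. As a minimal sketch in plain Python with no statistics library, Pearson's chi-square statistic for such a table can be computed as follows; the counts are hypothetical, not study data, and the closed-form p-value holds only for 2 degrees of freedom.

```python
import math

def chi_square_statistic(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table.

    Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # an all-zero row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_2df(stat):
    """Chi-square survival function for exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)

# Hypothetical counts: rows = case, control; columns = aa, aA, AA.
genotypes = [[30, 50, 20],
             [20, 50, 30]]
stat, dof = chi_square_statistic(genotypes)
print(stat, dof, p_value_2df(stat))
```

With 2 degrees of freedom the chi-square survival function reduces to exp(-x/2), so no special-function library is needed; tables of other shapes would need a general chi-square distribution.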



A paper that analyzes data from a specific study reports:

              Marital Status
Age        Married   Widowed   Single
0-16             0         1       50
18-24           10         5       50
25-34           40         7       40
35+             60        15       20



Paper 1 that analyzes data from a specific study reports:

              Marital Status
Age        Married   Widowed   Single
0-16            NA        NA       50
18-24           10         7       50
25-34           40         7       40
35+             60        15       20

Paper 2 that analyzes data from the same study reports:

              Marital Status
Age        Married   Widowed   Single
0-16            NA        NA       50
18-25           10         8       50
26-35           45         7       40
36+             55        14       20
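Neither paper publishes a count below 5, yet combining them is disclosive. The sketch below, assuming ages are recorded in whole years so that the age bins tile exactly, subtracts overlapping rows of the two tables of Paper 1 and Paper 2 (the dictionary names and the diff helper are illustrative) to isolate single-year strata.

```python
# Columns: Married, Widowed, Single. None marks a suppressed (NA) cell.
paper1 = {
    "0-16": [None, None, 50],
    "18-24": [10, 7, 50],
    "25-34": [40, 7, 40],
    "35+": [60, 15, 20],
}
paper2 = {
    "0-16": [None, None, 50],
    "18-25": [10, 8, 50],
    "26-35": [45, 7, 40],
    "36+": [55, 14, 20],
}

def diff(a, b):
    """Cell-wise a - b for two rows of counts."""
    return [x - y for x, y in zip(a, b)]

# Age 25 alone = (18-25 in Paper 2) - (18-24 in Paper 1).
age_25 = diff(paper2["18-25"], paper1["18-24"])
# Ages 26-34 = (25-34 in Paper 1) - (age 25).
age_26_34 = diff(paper1["25-34"], age_25)
# Age 35 alone = (26-35 in Paper 2) - (ages 26-34).
age_35 = diff(paper2["26-35"], age_26_34)
# Consistency check: (35+ in Paper 1) - (age 35) should equal 36+ in Paper 2.
age_36_plus = diff(paper1["35+"], age_35)

print("age 25:", age_25)   # [0, 1, 0] -> one widowed 25-year-old
print("age 35:", age_35)   # [5, 1, 0] -> one widowed 35-year-old
print("36+ matches Paper 2:", age_36_plus == paper2["36+"])
```

Both recovered strata contain exactly one widowed person, the kind of small cell that suppression is meant to hide; the final check confirms the two published tables are consistent with each other.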




Original Data

              Marital Status
Age        Married   Widowed   Single
0-16             0         1       50
18-24           10         7       50
25-34           40         7       40
35+             60        15       20

Perturbation (±1) and Cell Suppression (<5)

              Marital Status
Age        Married   Widowed   Single
0-16            NA        NA       51
18-24            9         8       49
25-34           40         7       41
35+             61        14       21
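The protected release shown above can be sketched as a function that applies both protections to a table of counts. This is a hypothetical illustration, not the project's implementation: it assumes suppression is decided on the true count, the noise is a uniform integer offset in [-noise, noise], and released counts are clipped at 0.

```python
import random

def perturb_and_suppress(table, noise, threshold, seed=None):
    """Release a table of counts with random noise and small-cell suppression.

    Cells whose true count is below `threshold` become None (NA); every other
    cell gets an integer offset drawn uniformly from [-noise, noise] and is
    clipped at 0 (clipping is an assumption of this sketch).
    """
    rng = random.Random(seed)
    released = []
    for row in table:
        out = []
        for count in row:
            if count < threshold:
                out.append(None)
            else:
                out.append(max(0, count + rng.randint(-noise, noise)))
        released.append(out)
    return released

# Original data (rows: 0-16, 18-24, 25-34, 35+; columns: Married, Widowed, Single).
original = [[0, 1, 50],
            [10, 7, 50],
            [40, 7, 40],
            [60, 15, 20]]
print(perturb_and_suppress(original, noise=1, threshold=5, seed=42))
```

For "<=k" style suppression settings, pass threshold=k+1, because this function suppresses counts strictly below the threshold.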




Most common parameters tested:

Perturbation: [0], [-1,1], [-3,3], [-5,5], [-10,10]

Cell Suppression: <0, <=1, <=3, <=5, <=10


Standard main-effect test using Chi-Square


Pearson’s correlation coefficient used to evaluate the deviation of each parameter combination from the original results.


A priori defined threshold for Pearson’s correlation coefficient: <=0.95.
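The evaluation above can be sketched as a sweep over the listed parameter combinations. Several details are assumptions, not taken from the slides: r is computed between original and released chi-square statistics (not p-values), suppressed cells enter the test as 0, each combination uses a single random draw, and a combination is accepted only when r exceeds 0.95. The tables are synthetic and all function names are hypothetical.

```python
import random

def chi_square(table):
    """Pearson's chi-square statistic for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    if n == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def release(table, noise, threshold, rng):
    """Perturb cells by a uniform integer in [-noise, noise] and suppress true
    counts below `threshold`. Suppressed cells become 0 so the test can still
    run (an assumption; the slides do not say how NA cells were handled)."""
    return [[0 if c < threshold else max(0, c + rng.randint(-noise, noise))
             for c in row] for row in table]

def pearson_r(xs, ys):
    """Pearson's correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

NOISE_LEVELS = [0, 1, 3, 5, 10]   # perturbation ranges [-k, k]
THRESHOLDS = [0, 2, 4, 6, 11]     # "<0, <=1, <=3, <=5, <=10" as "count < t"
R_MIN = 0.95                      # accept only if r > 0.95 (assumed reading)

def evaluate(tables, seed=0):
    """Map each (noise, threshold) pair to (r, accepted)."""
    rng = random.Random(seed)
    original = [chi_square(t) for t in tables]
    results = {}
    for k in NOISE_LEVELS:
        for t in THRESHOLDS:
            released = [chi_square(release(tb, k, t, rng)) for tb in tables]
            r = pearson_r(original, released)
            results[(k, t)] = (r, r > R_MIN)   # NaN compares False -> rejected
    return results

# Synthetic 2 x 3 case/control genotype tables (illustrative only).
gen = random.Random(1)
tables = [[[gen.randint(0, 60) for _ in range(3)] for _ in range(2)]
          for _ in range(50)]
for (k, t), (r, ok) in sorted(evaluate(tables).items()):
    print(f"noise +/-{k:<2} suppress <{t:<2} r={r:.3f} {'accept' if ok else 'reject'}")
```

The largest noise and suppression settings that are still accepted correspond to the "maximum noise" a given dataset can tolerate under these assumptions.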



Objectives:



Design and develop the data mining techniques and the scalable
infrastructure for the identification of phenotypic and genetic
associations related to adverse events.



Develop new, and implement existing, state-of-the-art analytical
approaches for genetic data.



Define and implement the knowledge extraction and filtering
mechanisms and the knowledge base.



Integrate the knowledge base into a lightweight decision support
system (adverse event early detection mechanism).




Provides the tools for identifying and removing erroneous data, or data that do not conform to user-defined quality standards.


Tools:



Hardy-Weinberg Equilibrium Test



Allele Frequency Test



Missing Data Test
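The three quality-control tests can be sketched per SNP as below, assuming genotypes coded as 0, 1 or 2 copies of allele A, with None for a missing call. The cut-offs in the hypothetical passes_qc function (5% missingness, minor allele frequency 0.01, HWE p-value 1e-6) are illustrative user-defined defaults, not values from the project.

```python
import math

def genotype_counts(calls):
    """Counts of genotypes with 0, 1 and 2 copies of allele A (None = missing)."""
    counts = [0, 0, 0]
    for g in calls:
        if g is not None:
            counts[g] += 1
    return counts

def missing_rate(calls):
    """Missing Data Test input: fraction of samples without a genotype call."""
    return sum(g is None for g in calls) / len(calls)

def allele_frequency(counts):
    """Allele Frequency Test input: frequency of allele A among called samples."""
    n0, n1, n2 = counts
    return (n1 + 2 * n2) / (2 * (n0 + n1 + n2))

def hwe_pvalue(counts):
    """Hardy-Weinberg Equilibrium Test: chi-square goodness of fit, 1 df."""
    n = sum(counts)
    p = allele_frequency(counts)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))

def passes_qc(calls, max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Keep a SNP only if it passes all three (hypothetical) cut-offs."""
    if missing_rate(calls) > max_missing:
        return False
    counts = genotype_counts(calls)
    if sum(counts) == 0:
        return False
    p = allele_frequency(counts)
    if min(p, 1 - p) < min_maf:   # minor allele frequency
        return False
    return hwe_pvalue(counts) >= min_hwe_p

in_hwe = [0] * 36 + [1] * 48 + [2] * 16   # 36:48:16 = q^2 : 2pq : p^2 for p(A) = 0.4
out_of_hwe = [0] * 50 + [2] * 50          # no heterozygotes at all
print(passes_qc(in_hwe), passes_qc(out_of_hwe))
```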



Provides the tools for removing redundant or irrelevant features from a dataset.


Tools:



Rough Set Feature Selection



Information Gain Feature Selection



Chi Squared Feature Selection
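Of the three selectors, information gain is the simplest to show: it scores a discrete feature X by how much it reduces the entropy of the outcome Y, IG(Y; X) = H(Y) - H(Y | X). A minimal sketch, with hypothetical feature names and outcome labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over x of P(X = x) * H(Y | X = x)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_features(features, labels, top_k):
    """Return the top_k feature names ranked by information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical data: outcome = adverse event (AE) or not.
labels = ["AE", "AE", "none", "none"]
features = {
    "snp_1": [2, 2, 0, 0],                   # separates the classes perfectly
    "smoking": ["yes", "no", "yes", "no"],   # carries no information here
}
print(select_features(features, labels, top_k=1))
```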






Provides the tools for performing single-hypothesis tests on a dataset to test for associations.


Tools:


Pearson’s Chi-Square Test


Fisher’s Exact Test


Odds Ratio


Binomial Logistic Regression


Linkage Disequilibrium


Genetic Region-Based Association Testing
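Two of the listed tests can be sketched for a 2 × 2 table in which a and b are cases and controls carrying a risk factor, and c and d are those without it (a labeling assumed here). The odds ratio uses Woolf's log-based 95% confidence interval and assumes no zero cells; the Fisher's exact p-value is two-sided, summing every table no more probable than the observed one.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with Woolf's 95% CI for the 2 x 2 table [[a, b], [c, d]].

    a, b: cases / controls with the risk factor; c, d: without it.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: odds ratio or its log-scale CI is undefined")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table at most as probable as the observed one; the small
    # tolerance guards against floating-point ties.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))

print(odds_ratio(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```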



Provides the tools for performing data mining analyses on a dataset and extracting association rules.


Tools:


Association Rules (Apriori)


Decision Trees with Percentage Split (C4.5)


Decision Trees with Cross Validation (C4.5)


Random Forest with Percentage Split


Random Forest with Cross Validation
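Apriori finds itemsets whose support reaches a threshold by growing them one item at a time and pruning any candidate that has an infrequent subset; rules are then read off the frequent itemsets by confidence. A compact sketch with hypothetical patient attributes as items (not the implementation used in the project):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while level:
        current = {s: support(s) for s in level}
        current = {s: v for s, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that contain k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) for confident rules."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                conf = supp / frequent[lhs]   # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Hypothetical patient records, each a set of attributes / outcomes.
patients = [
    {"smoker", "hypertension", "AE"},
    {"smoker", "hypertension", "AE"},
    {"smoker", "AE"},
    {"hypertension"},
    {"smoker", "hypertension"},
]
for lhs, rhs, supp, conf in association_rules(apriori(patients, 0.4), 0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f} confidence={conf:.2f}")
```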




Knowledge Extraction Mechanism



This mechanism is responsible for storing statistically
significant associations and important association rules in the
Linked2Safety knowledge database.



It has two steps:



Logging system



Storing important knowledge




Filtering mechanism



This mechanism allows users to insert or delete associations
and association rules.
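A minimal sketch of the two extraction steps plus the filtering mechanism, assuming a plain significance cut-off (alpha = 0.05) decides what counts as "important"; the Association and KnowledgeBase names, their fields, and the SNP identifier in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Association:
    """One analysis result, e.g. a variable associated with an adverse event."""
    variable: str
    outcome: str
    test: str
    p_value: float

@dataclass
class KnowledgeBase:
    alpha: float = 0.05                        # hypothetical significance cut-off
    log: list = field(default_factory=list)    # step 1: every result is logged
    stored: set = field(default_factory=set)   # step 2: only important ones kept

    def record(self, result: Association) -> bool:
        """Log a result and store it if significant; return True if stored."""
        self.log.append(result)
        if result.p_value < self.alpha:
            self.stored.add(result)
            return True
        return False

    # Filtering mechanism: manual curation by users.
    def insert(self, result: Association) -> None:
        self.stored.add(result)

    def delete(self, result: Association) -> None:
        self.stored.discard(result)

kb = KnowledgeBase()
kb.record(Association("rs123", "weight gain", "chi-square", 0.001))   # stored
kb.record(Association("smoking", "headaches", "fisher", 0.3))         # logged only
print(len(kb.log), len(kb.stored))
```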





Uses the knowledge in the L2S knowledge base


Runs in the background to identify new associations and
association rules


Reruns analyses when updated datasets are available


Creates alerts for patient profiles associated with adverse
events
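The early detection behaviour described above can be sketched as matching a patient profile against stored rules, plus a background job that re-runs the analyses only when a dataset version changes. Everything here (the Rule type, the 0.8 confidence cut-off, version strings, the stand-in analysis) is a hypothetical illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """Hypothetical stored rule: patients with all `antecedent` items
    showed `event` with the given confidence."""
    antecedent: frozenset
    event: str
    confidence: float

def alerts_for(profile, rules, min_confidence=0.8):
    """Alerts for every sufficiently confident rule the profile satisfies."""
    hits = [(r.event, r.confidence, sorted(r.antecedent))
            for r in rules
            if r.confidence >= min_confidence and r.antecedent <= profile]
    return sorted(hits, key=lambda h: h[1], reverse=True)

class Monitor:
    """Background job: re-run the analyses only when the data version changes."""

    def __init__(self, analyse):
        self.analyse = analyse        # callable: dataset -> list of Rule
        self.version = None
        self.rules = []

    def poll(self, version, dataset):
        """Return True if the analyses were re-run for a new dataset version."""
        if version == self.version:
            return False
        self.rules = self.analyse(dataset)
        self.version = version
        return True

stored = [
    Rule(frozenset({"smoker", "hypertension"}), "myocardial infarction", 0.9),
    Rule(frozenset({"diabetes type II"}), "weight gain", 0.6),
]
monitor = Monitor(lambda dataset: stored)   # stand-in for the real analyses
monitor.poll("v1", dataset=None)
print(alerts_for({"smoker", "hypertension", "age 35+"}, monitor.rules))
```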



Overlapping non-genetic variables shared by at least 2 data providers:

Variables:
Age
Weight gain
Gender
Headaches
BMI
Gastrointestinal symptoms
Smoking ever
Ophthalmological problems
Dyslipidemia
Type of ophthalmological condition
Diabetes
High blood pressure
Diabetes type I
Heart conditions exist
Diabetes type II
Type of heart condition
Anemia
Hypertension
Depressive personality disorder
Myocardial infarction
Major depressive disorder
Stroke
Schizotypal personality disorder
Coronary heart disease



For a given dataset, we were able to identify the maximum
noise that can be added to the data without significantly
affecting the outcomes.


The results presented apply only to MASTOS; every other
dataset must repeat the analytical approach described here
to determine the maximum noise that can be added to its
results.


Further investigation is necessary to identify the minimum
parameter settings that satisfy legal and ethical requirements.









Athos Antoniades

University of Cyprus

email: athos@cs.ucy.ac.cy