Searching and Exploring Biomedical Data


15 Nov 2013



Vagelis Hristidis

School of Computing and Information Sciences

Florida International University

Roadmap


Why is it challenging to search EMRs?


XOntoRank: Leveraging Ontologies to improve sensitivity in EMR search

ObjectRank: Use authority flow to rank EMR entities

BioNav: Using MeSH to explore the results of PubMed queries



Vagelis Hristidis, Searching and Exploring Biomedical Data


ELECTRONIC MEDICAL RECORDS (EMRs)

- Adoption of EMRs is hard due to political reasons:
  - No unique patient ID
  - Confidentiality
  - HIPAA (Health Insurance Portability and Accountability Act)
- Move towards XML-based formats.
  - One of the most promising: Health Level 7’s Clinical Document Architecture (CDA).
- EMRs pose new challenges for computer scientists:
  - Confidentiality, authentication, secure exchange
  - Storage, scalability
  - Dictionaries, term disambiguation
  - Search for interesting patterns (data mining)
  - Data integration, schema mapping
  - Searching and exploring


SAMPLE CDA FRAGMENT


CDA Document: Tree View


LIMITATIONS OF TRADITIONAL IR AND GENERAL XML SEARCH

Traditional IR:
- Text-based search engines do not exploit the XML tags or the hierarchical structure of XML.
- The whole XML document is treated as a single unit, which is unacceptable given the possibly large size of XML documents.
- Proximity in XML can also be measured in terms of containment edges, not only word distance.

General XML search:
- EMRs have known but complex semantics.
- EMRs include free text, numeric data, time sequences, and negative statements.
- EMRs routinely reference external information sources such as dictionaries and ontologies.
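The containment-edge notion of proximity can be made concrete: two XML nodes are as close as the number of parent-child edges on the tree path between them, which passes through their lowest common ancestor. The following is a minimal sketch using Python's standard-library ElementTree; the toy fragment and its element names are hypothetical and much simpler than real CDA.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """ElementTree elements have no parent pointers, so record them once."""
    return {child: parent for parent in root.iter() for child in parent}


def ancestors(node, parents):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


def containment_distance(a, b, parents):
    """Number of containment (parent-child) edges on the tree path from a to b."""
    depth_from_a = {id(n): d for d, n in enumerate(ancestors(a, parents))}
    for d_b, n in enumerate(ancestors(b, parents)):
        if id(n) in depth_from_a:          # first shared ancestor = lowest common ancestor
            return depth_from_a[id(n)] + d_b
    raise ValueError("a and b are not in the same tree")


def find_by_text(root, text):
    """First element whose stripped text equals `text`."""
    return next(e for e in root.iter() if (e.text or "").strip() == text)


# Toy fragment with hypothetical, simplified element names (not the real CDA schema).
doc = ET.fromstring(
    "<section>"
    "<observation><code>Bronchitis</code><value>Albuterol</value></observation>"
    "<observation><code>Asthma</code></observation>"
    "</section>"
)
parents = build_parent_map(doc)
```

Under this measure "Bronchitis" and "Albuterol", which sit in the same observation, are 2 edges apart, while "Albuterol" and "Asthma" are 4 apart, even though a plain text index sees the last two as adjacent words.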



Syntax vs. Semantics in Schema

Example: query “Asthma Theophylline”

More details in [Hristidis et al., NSF Symposium on Next Generation of Data Mining ’07]


XOntoRank: Leverage Ontological Knowledge

Algorithm to enhance keyword search using ontological knowledge (e.g., SNOMED) [ICDE’08 poster, ICDE’09 full paper]

[Figure: Medical Dictionary (SNOMED fragment). Concepts: 50043002 Disorder of Respiratory system; 79688008 Respiratory Obstruction; 118946009 Disorder of Thorax; 41427001 Disorder of Bronchus; 195967001 Asthma; 301229001 Bronchial Finding; 405944004 Asthmatic Bronchitis; 266364000 Asthma attack; 955009 Bronchial Structure; 82094008 Lower respiratory tract structure. Edges are labeled “Is a”, “May be” and “Finding site of”.]

Example 1

q = {“bronchitis”, “albuterol”}

result = [Figure: an Observation subtree with code and value nodes containing “Bronchitis” and “Albuterol”.]

Example 2

q = {“asthma”, “albuterol”}

result = ???

XOntoRank

A CDA node may be associated with a query keyword w through the ontology.

XOntoRank first assigns scores to ontological concepts:
- OntoScore OS(): the semantic relevance of a concept c in the ontology to a query keyword w.

Then, given these scores, it assigns Node Scores NS() to document nodes.

Other aggregation functions are possible.
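The slide's own formula for NS() did not survive extraction, so what follows is only a sketch of the idea under stated assumptions: a node that contains w literally gets score 1, and otherwise its score is the maximum OS() over the ontology concepts it references. The names `node_score` and `onto_score`, the 1.0 for literal matches, and the max aggregation are all assumptions, not XOntoRank's published definition.

```python
def node_score(node_text, node_concepts, keyword, onto_score):
    """Hypothetical NS(v, w) for one keyword.

    node_text: text of the document node v
    node_concepts: ontology concept codes that v references (e.g. a CDA code attribute)
    onto_score: dict mapping (concept, keyword) to OS(concept, keyword)
    """
    if keyword.lower() in node_text.lower():
        return 1.0  # literal match: maximal score (assumption)
    # Otherwise aggregate the OntoScores of referenced concepts; max is one choice among several.
    return max((onto_score.get((c, keyword), 0.0) for c in node_concepts), default=0.0)


# Illustrative score only, not a value computed by XOntoRank.
os_table = {("405944004", "asthma"): 0.5}
```

Swapping `max` for a sum or an average is the kind of alternative aggregation the slide alludes to.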



Computing OntoScore of a Concept Given a Query Keyword

Three ways to view the ontology graph:
- As an unlabeled, undirected graph.
- As a taxonomy.
- As a complete set of relationships.
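For the first view, one simple way to turn graph proximity into a score (an assumption, not necessarily XOntoRank's exact definition) is to start from the concepts whose names contain w and let the score decay geometrically with hop distance, ignoring edge labels and directions. The decay factor 0.5 and the toy edges below are made up for illustration; they are not the SNOMED relationships from the figure. The taxonomy and full-relationship views would instead treat edge labels and directions differently.

```python
from collections import deque


def onto_scores(edges, names, keyword, decay=0.5):
    """OS(c, w) = decay ** hops(c, nearest concept whose name contains w)."""
    adj = {}
    for a, b in edges:  # edge labels and directions are ignored
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seeds = [c for c, name in names.items() if keyword.lower() in name.lower()]
    hops = {c: 0 for c in seeds}
    queue = deque(seeds)
    while queue:  # multi-source breadth-first search
        c = queue.popleft()
        for n in adj.get(c, ()):
            if n not in hops:
                hops[n] = hops[c] + 1
                queue.append(n)
    # Concepts unreachable from any seed are omitted (score 0).
    return {c: decay ** h for c, h in hops.items()}


names = {"t1": "Asthma", "t2": "Asthmatic bronchitis",
         "t3": "Disorder of bronchus", "t4": "Bronchial structure"}
edges = [("t2", "t1"), ("t1", "t3"), ("t3", "t4")]  # made-up edges for illustration
asthma_os = onto_scores(edges, names, "asthma")
bronchitis_os = onto_scores(edges, names, "bronchitis")
```

With these toy edges, "asthma" scores 1.0 on t1 and t2, 0.5 on t3 and 0.25 on t4, so a node coded as asthmatic bronchitis becomes reachable from the keyword "asthma".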


Authority Flow Ranking in EMRs

A subset of the electronic health record dataset.

Work under submission.


[Figure: a data graph of EMR entities for the query “pericardial effusion”. Nodes v1 to v7 are connected by prescribed_by, associated_with, prescribed_to and recorded_by edges. Node contents, as far as they survive:
- EventsPlan: TimeStampCreated = "2004-11-03 11:57:00.0", Events = "… small residual pericardial effusion …"
- Hospitalization: TimeStampCreated = "2004-10-27 22:00:00.0", History = "18 year old boy with an aggressive form of chest lymphoma…", Allergies = "NKDA" …
- Cardiac: PatientID = "1438", Complication = "apical impulse … Echo - large increasing pericardial effusion …"
- Employee: TimeStampCreated = "2004-12-23 14:03:00.0", Title = "Pediatric Cardiologist" …
- EventsPlan: Events = "4 month old baby … pericardial effusion …"
- Medication: TimeStampCreated = "2003-02-13 21:57:00.0" …
- Hospitalization: History = "48 year old …"]

Authority Flow Ranking

Schema of the EMR dataset

[Figure: schema graph over the entity types Hospitalization (H), Employee (E), Associated_Events (A), Patient (P) and Medication (M), with edges A-E, P-M, H-M, M-E, A-H, H-E and P-E. Edge labels include created_by, recorded_by, prescribed_by, of, prescribed_to and for.]
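Authority-flow ranking in the style of ObjectRank can be sketched as a power iteration: a random surfer restarts at a uniformly chosen node of the base set (the nodes containing the query keywords) with probability 1 - d, and otherwise follows an edge, where each edge type e carries a transfer rate split evenly over the node's outgoing edges of that type. The edge type `recorded_by` is taken from the figure, but the reverse type `records`, the rates, and the toy graph are hypothetical, not the settings used in the study.

```python
def object_rank(nodes, edges, rates, base_set, d=0.85, iters=100, tol=1e-10):
    """Authority-flow ranking in the style of ObjectRank.

    edges: (source, target, edge_type) triples; add both directions explicitly
    rates: edge_type -> authority transfer rate in [0, 1]
    base_set: nodes that contain the query keywords
    """
    out_count = {}  # edges of each type leaving each node
    for s, _, e in edges:
        out_count[(s, e)] = out_count.get((s, e), 0) + 1
    # Restart mass (1 - d) is spread uniformly over the base set.
    jump = {v: (1 - d) / len(base_set) if v in base_set else 0.0 for v in nodes}
    r = dict(jump)
    for _ in range(iters):
        new = dict(jump)
        for s, t, e in edges:  # split rate(e) evenly over s's edges of type e
            new[t] += d * rates[e] * r[s] / out_count[(s, e)]
        converged = sum(abs(new[v] - r[v]) for v in nodes) < tol
        r = new
        if converged:
            break
    return r


# Toy graph: two notes that mention the keyword, both recorded by the same employee.
nodes = ["note1", "note2", "emp"]
edges = [("note1", "emp", "recorded_by"), ("note2", "emp", "recorded_by"),
         ("emp", "note1", "records"), ("emp", "note2", "records")]
rates = {"recorded_by": 0.7, "records": 0.3}  # hypothetical transfer rates
ranks = object_rank(nodes, edges, rates, base_set={"note1", "note2"})
```

With these made-up rates the employee node, which never contains the keywords, ends up ranked above both notes, because authority from both matching notes flows into it.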

User Study


Explaining Subgraph


User Study Results

[Figure: two bar charts, Mean Sensitivity and Mean Specificity (y-axis 0 to 1), each comparing CO085BM25, BM25, CO085 and CO030.]
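Sensitivity and specificity are the standard confusion-matrix ratios: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The slides do not state how the study chose the retrieved set (for example, the top-k results of each ranking) or the relevance judgments, so the sets below are toy values.

```python
def sensitivity_specificity(retrieved, relevant, collection):
    """Return (TP / (TP + FN), TN / (TN + FP)) for one query."""
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    tp = len(retrieved & relevant)
    fp = len(retrieved - relevant)
    fn = len(relevant - retrieved)
    tn = len(collection - retrieved - relevant)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


# Toy judgment: 10 records, 4 relevant, a ranking returns 3 of them as its answer set.
sens, spec = sensitivity_specificity({1, 2, 5}, {1, 2, 3, 4}, range(1, 11))
```

Averaging these per-query values over all queries gives means of the kind plotted in the charts.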

BM25: Traditional Information Retrieval Ranking Function

CO: Clinical ObjectRank (Authority Flow)
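For reference, here is a textbook Okapi BM25 scorer. The slides do not give the tokenization or parameters used in the study; the common defaults k1 = 1.2 and b = 0.75 and the "1 +" IDF variant (the one Lucene uses) are assumptions, and the toy documents are phrases borrowed from the figure.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)  # length normalization
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["small", "residual", "pericardial", "effusion"],
    ["large", "increasing", "pericardial", "effusion"],
    ["chest", "lymphoma"],
]
scores = bm25_scores(docs, ["pericardial", "effusion"])
```

BM25 looks only at each document's own text, so a related record that does not contain the query terms, like the third one, scores 0 no matter how it is linked to the matching records.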


Biological Databases (cont’d)


- Results Navigation [ICDE09, TKDE 2010]
- With SUNY Buffalo.
- Demo at http://db.cse.buffalo.edu/bionav/
- Most publications in PubMed are annotated with Medical Subject Headings (MeSH) terms.
- Present results in the MeSH tree.
- Propose a navigation model and smart expansion techniques that may skip tree levels.


BioNav: Exploring PubMed Results

Static Navigation Tree for query “prothymosin”

[Figure: static navigation tree rooted at MESH (313). Visible nodes, with result counts: Amino Acids, Peptides, and Proteins (310); Proteins (307); Nucleoproteins (40); Histones (15); Biological Phenomena, … (217); Cell Physiology (161); Cell Growth Processes (99); Genetic Processes (193); Gene Expression (92); Transcription, Genetic (25). Collapsed siblings are summarized as groups of 95, 2, 45, 4, 3, 15, 10 and 1 more nodes.]

- Query keyword: prothymosin
- Number of results: 313
- Navigation tree stats:
  - # of nodes: 3941
  - depth: 10
  - total citations: 30897

Big tree with many duplicates!
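The duplication comes from multi-label annotation: each PubMed citation carries several MeSH terms, so the same citation is counted under several branches at once, and the number of (citation, concept) pairs can far exceed the number of distinct results. The sketch below assumes each annotation is given as a root-to-concept path of concept names; the citation IDs and paths are toy values rather than real MeSH tree numbers, and `attachments` is only one possible reading of the slide's "total citations".

```python
def navigation_tree_counts(annotations):
    """annotations: citation id -> list of concept paths (root-to-concept tuples).

    Returns (counts, attachments):
      counts: node path -> number of distinct citations in its subtree
      attachments: total number of (citation, concept) pairs in the tree
    """
    subtree = {}
    attachments = 0
    for cid, paths in annotations.items():
        for path in paths:
            attachments += 1
            for depth in range(1, len(path) + 1):  # the citation counts for every ancestor
                subtree.setdefault(path[:depth], set()).add(cid)
    counts = {node: len(cids) for node, cids in subtree.items()}
    return counts, attachments


AAPP = "Amino Acids, Peptides, and Proteins"
ann = {  # toy annotations, not real PubMed data
    "c1": [(AAPP, "Proteins", "Nucleoproteins"), ("Genetic Processes", "Gene Expression")],
    "c2": [(AAPP, "Proteins"), ("Genetic Processes", "Gene Expression", "Transcription, Genetic")],
}
counts, attachments = navigation_tree_counts(ann)
```

Two distinct citations already produce six tree nodes and four attachments; at the scale of the query above this is how 313 results turn into a 3941-node tree.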


BioNav: Exploring PubMed Results

Reveal to the user a selected set of descendant concepts that:
(a) collectively contain all results, and
(b) minimize the expected user navigation cost.

Unlike static navigation, which reveals all children of the expanded node, BioNav does not necessarily reveal every child of the root.
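BioNav's actual expansion algorithm uses a probabilistic model of the expected navigation cost. The sketch below swaps in plain greedy set cover instead: it enforces requirement (a) exactly and uses the number of revealed concepts only as a crude proxy for (b). The candidate concepts and result IDs are hypothetical.

```python
def greedy_reveal(candidates, results):
    """candidates: descendant concept -> set of result ids under it.

    Greedily reveal concepts until every result is under some revealed concept.
    """
    uncovered = set(results)
    revealed = []
    while uncovered:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("some results are not under any candidate concept")
        revealed.append(best)
        uncovered -= gain
    return revealed


candidates = {  # hypothetical descendants of the expanded node
    "Proteins": {1, 2, 3, 4},
    "Histones": {1, 2},
    "Gene Expression": {4, 5},
    "Transcription, Genetic": {5},
}
revealed = greedy_reveal(candidates, {1, 2, 3, 4, 5})
```

Here two concepts suffice to contain all five results, whereas static navigation would reveal every child of the expanded node.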



BioNav Evaluation

[Figure: overall navigation cost (# of concepts revealed + # of EXPAND actions), y-axis 0 to 20, comparing Static and BioNav navigation.]
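Under the cost metric in the chart, a static-navigation session can be costed as follows, assuming the user already knows which root-to-target path to follow: every node on the path except the target is EXPANDed once, and each EXPAND reveals all of that node's children. The tree below is hypothetical.

```python
def static_navigation_cost(children, path):
    """# of EXPAND actions + # of concepts revealed along a root-to-target path."""
    cost = 0
    for node in path[:-1]:  # the target itself is not expanded
        cost += 1 + len(children.get(node, []))
    return cost


children = {  # hypothetical concept names
    "MESH": ["A", "B", "C"],
    "A": ["A1", "A2"],
}
cost = static_navigation_cost(children, ["MESH", "A", "A1"])  # (1 + 3) + (1 + 2)
```

BioNav lowers this number by revealing only a selected subset of descendants at each EXPAND, possibly skipping tree levels.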

References


- Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. Effective Navigation of Query Results Based on Concept Hierarchies. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2010.
- Fernando Farfán, Vagelis Hristidis, Anand Ranganathan, and Michael Weiner. XOntoRank: Ontology-Aware Search of Electronic Medical Records. IEEE International Conference on Data Engineering (ICDE), 2009.
- Abhijith Kashyap, Vagelis Hristidis, Michalis Petropoulos, and Sotiria Tavoulari. BioNav: Effective Navigation on Query Results of Biomedical Databases. IEEE International Conference on Data Engineering (ICDE), 2009.
- Vagelis Hristidis, Fernando Farfán, Redmond P. Burke, Anthony F. Rossi, and Jeffrey A. White. Information Discovery on Electronic Medical Records. National Science Foundation Symposium on Next Generation of Data Mining and Cyber-Enabled Discovery for Innovation (NGDM), 2007.

Supported by


- NSF IIS-0811922: Information Discovery on Domain Data Graphs, 2008-2011
- NSF CAREER IIS-0952347, 2010-2015
