

Summary

This master thesis is part of the PASTAs project. The main goal of PASTAs is to determine what happens
to chronically ill patients as they are moved between their hospital, their primary doctor and other
services offered by the local authorities.

This thesis covers some of PASTAs' sub-goals:

A1. Develop an automated tool. This tool should extract the data related to each patient from Norwegian Patient Register (NPR) data and transfer the NPR data to the Event Stream (ES) format.

A2. Identify clusters of actual trajectories. This clustering will be done using the ICD-10 ontology, which will help classify diagnosis-related events into chronic or non-chronic diseases.

A3. Develop a visualization of the trajectories of chronically ill patients. For the visualization, a tool (Patient Explorer) from a previous master thesis project will be used.

The main focus will be on goals A1 and A2. Goal A3 will be accomplished only if there is extra time.
















Contents

1. Introduction
    1.1 Motivation
    1.2 Our approach to the problem
    1.3 Case to be investigated
    1.4 Available Data
    1.5 Methodology
    1.6 Outline
2. Background
    2.1 Event Stream
    2.2 Temporal Event
    2.3 Related Works
        2.3.1 Computer-based Patient Record (CPR)
        2.3.2 Knave-II
        2.3.3 Patient Explorer
    2.4 Visualization
3. Proposed Design
    3.1 Domain
    3.2 Files
        3.2.1 Medical codes
    3.3 Recognizing chronic diseases
        3.3.1 Manually
        3.3.2 Automatically
    3.4 Ontology
        3.4.1 Jena
    3.5 Database
    3.6 Create episodes







1. Introduction

This master thesis is part of the PAtientS TrAjectorieS (PAsTAs) project. In this project we are going to develop a methodology to present data in the Electronic Medical Record (EMR) format. This presentation should show patients' trajectories through the Norwegian health care system, which includes hospitals, primary care and local organizations that offer health services.


1.1 Motivation

Up to now, health care organizations such as hospitals, primary care and local care organizations have kept patients' records separately. Therefore, it is hard for researchers to get an overview of a patient's trajectory.

A patient trajectory is a curve that presents the process and current status of a patient during encounters with the health care system. The trajectory has two axes: the date is on the horizontal axis and the event is on the vertical axis. A date can be a single day, such as a visit to the primary care doctor, or a period of time, such as receiving a service or being hospitalized. An event can be a diagnosis or a service the patient receives on a specific date.

By visualizing the patient trajectory, researchers can observe how the treatment plan has worked for the patient and whether there is any progress in the treatment [6].

1.2 Our approach to the problem

In this thesis we focus on events that are related to chronic disease. A chronic disease is a disease of long duration that progresses very slowly. Diseases that are considered chronic include heart disease, cancer, HIV and diabetes [4].

According to Strauss and Corbin [8], a patient trajectory in a normal situation ends when the patient leaves the hospital, but for chronic diseases it can continue with trajectory work in clinics, repeated visits to the hospital and physicians' offices.

Therefore it is important to recognize which patients have chronic diseases in order to follow their trajectories correctly without missing any related data.

In order to have a complete patient trajectory, we also need to analyze the events that are not related to chronic diseases, and to classify them properly.

1.3 Case to be investigated

To recognize chronic diseases, we need to use the International Classification of Diseases (ICD-10). A filter can be built on these international classification codes so that the system can distinguish chronic diseases from non-chronic ones.

The issue here is that ICD-10 has no classification of chronic diseases, so we need to find a way to recognize chronic diseases automatically in the ICD-10 ontology.

The next step is to classify events. We call each group in this classification an episode. We want to separate the events that are related to each other. To do this classification we need to analyze the data and find all possible episodes. For instance, a patient might have received a service because of a diagnosis made in the hospital.



1.4 Available Data

Which organization has provided the data?

1.5 Methodology

Case study?

1.6 Outline

2. Background

In this chapter, different methodologies and tools are discussed. We also evaluate them against our existing dataset to see whether they can be used in our tool.

2.1 Event Stream

Each dataset file can be considered a data table. A data table contains rows (records) and columns (fields). In Electronic Medical Record (EMR) systems, a data table keeps information related to one specific aspect of the patient's care. These data tables are linked together by an identifier column, so it is possible to query across them [5].

The information required for analysis can be extracted from the EMR by querying the data tables. This extracted information is called an Event Stream (ES).

The simplest version of an ES contains three columns. The first column is an identifier, which in this case is the patient's identification number. The second column is the event, which tells what kind of event is on the row. The last column is the time.

In an event stream, time is a single date. It can be expressed in milliseconds or in days. For instance, patient 1 was hospitalized on date 0 and tested on date 23.
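The three-column layout can be sketched in Python as follows. This is only an illustration: the `Event` type, its field names and the `events_for` helper are assumptions, and the two rows are the examples from the text with time counted in days.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One row of a minimal event stream: who, what, when."""
    patient_id: int  # identifier column
    event: str       # kind of event on this row
    time: int        # a single date, here counted in days


# The two example rows from the text, deliberately stored out of order.
stream = [
    Event(patient_id=1, event="tested", time=23),
    Event(patient_id=1, event="hospitalized", time=0),
]


def events_for(stream, patient_id):
    """Return one patient's events in chronological order."""
    rows = [e for e in stream if e.patient_id == patient_id]
    return sorted(rows, key=lambda e: e.time)


history = events_for(stream, 1)
print([(e.event, e.time) for e in history])
```

Because every row carries the patient identifier, the rows of many patients can live in one table and still be separated again with a simple filter.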


More columns can be added to an ES if required.

In our case each event covers a period of time, which means it has a start date and an end date. Therefore, we need to find another event format.
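One possible shape for such period events is sketched below in Python. It is an assumption for illustration, not the format chosen in this thesis: the `IntervalEvent` type, the inclusive day counting and the `:start` / `:end` suffixes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalEvent:
    """An event that covers a period of time (hypothetical format)."""
    patient_id: int
    event: str
    start: int  # first day of the event
    end: int    # last day of the event; equal to start for a one-day event

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end date must not precede start date")

    def duration_days(self):
        """Length of the period, counting both the first and the last day."""
        return self.end - self.start + 1


def to_point_events(ev):
    """Flatten an interval into (patient_id, event, time) rows of a plain ES."""
    if ev.start == ev.end:
        return [(ev.patient_id, ev.event, ev.start)]
    return [
        (ev.patient_id, ev.event + ":start", ev.start),
        (ev.patient_id, ev.event + ":end", ev.end),
    ]


stay = IntervalEvent(patient_id=1, event="hospitalized", start=0, end=4)
visit = IntervalEvent(patient_id=1, event="tested", start=23, end=23)
print(stay.duration_days(), to_point_events(stay), to_point_events(visit))
```

Splitting a period into a start row and an end row keeps the data compatible with the three-column ES, at the cost of having to pair the two rows again during analysis.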

2.2 Temporal Event

In this method, a patient's history is classified into a sequence of episodes:

$e_1, e_2, \ldots, e_n$

Each episode contains different events [7]:

$e_i = \langle PID_{i1}, PID_{i2}, \ldots, PID_{ik} \rangle$


2.3 Related Works

Write about previous works and what are their differences

2.3.1 Computer-based Patient Record (CPR)

2.3.2 Knave-II

2.3.3 Patient Explorer

2.4 Visualization

Explain how we are going to use Patient Explorer
3. Proposed Design

3.1 Domain

Here I will explain the domain in general: what the kommune (municipality) and hospital files are.

3.2 Files

Explain in detail the different columns in the kommune and hospital files.

3.2.1 Medical codes

ICD-10: ICD is the abbreviation of the International Classification of Diseases. ICD-10 classifies medical diagnoses using alphanumeric codes.

The codes are divided into different categories [1] [2]:

1) Physical diseases, syndromes, disorders, problems and diagnoses that are known by physicians, such as eye disease, metabolic disorder, pregnancy problems and disorders involving the immune mechanism (A00-E90, G00-Q99).

2) Mental and behavioral disorders (F00-F99)

3) External causes of injuries or death
   - Injury, poisoning and certain other consequences of external causes (S00-T98), such as effects of a foreign body (a metal shard, an eyelash or in general any external object) entering through a natural orifice.
   - External causes of morbidity and mortality (V01-Y98), such as accidents, intentional self-harm and so on.

4) Symptoms with less certainty or unknown
   - Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified
      - Symptoms and signs (R00-R69)
      - Abnormal findings on examination (R70-R99)
   - Codes for special purposes; they are subcategories of unknown diseases and their causes.
      - Provisional assignment of new diseases of uncertain etiology (U00-U49)
      - Bacterial agents resistant to antibiotics (U80-U89)
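The grouping above can be turned into a small lookup, sketched here in Python. The function name `category_of` and the short category labels are assumptions for illustration. Chapter V is taken as F00-F99, as in the WHO edition, and codes outside the listed ranges (for example Z codes) return `None`.

```python
import re

# (first category, last category, label) following the grouping in the text.
RANGES = [
    ("A00", "E90", "physical disease or disorder"),
    ("F00", "F99", "mental and behavioral disorder"),
    ("G00", "Q99", "physical disease or disorder"),
    ("R00", "R69", "symptoms and signs"),
    ("R70", "R99", "abnormal findings on examination"),
    ("S00", "T98", "injury, poisoning, consequences of external causes"),
    ("U00", "U49", "provisional assignment of new diseases"),
    ("U80", "U89", "bacterial agents resistant to antibiotics"),
    ("V01", "Y98", "external causes of morbidity and mortality"),
]


def category_of(code):
    """Map an ICD-10 code such as 'E11.9' to its group, or None if unlisted."""
    match = re.match(r"^([A-Z])(\d{2})", code.strip().upper())
    if not match:
        return None
    key = match.group(1) + match.group(2)  # three-character category, e.g. 'E11'
    for first, last, label in RANGES:
        # Plain string comparison is enough: one letter followed by two digits.
        if first <= key <= last:
            return label
    return None


for code in ["E11.9", "F32", "S05.0", "Z00"]:
    print(code, "->", category_of(code))
```

Only the first three characters of a code decide its group, so subcategories such as S05.0 fall into the same group as S05.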

ICPC2



NCMP

NCSP

3.3 Recognizing chronic diseases

One of the goals of this project is to recognize chronic diseases. To do this automatically, I used International Classification of Diseases (ICD) codes. ICD is used to classify diseases and other health problems [3], but this classification does not group chronic diseases.

3.3.1 Manually

To recognize chronic diseases I first tried to use only the Norwegian ICD-10 ontology. In this ontology the ICD-10 codes are classified by an attribute called Umls-SemanticType, whose values are disease or syndrome, finding, health care activity, congenital abnormality, mental or behavioral dysfunction, injury or poisoning, pathologic function and neoplastic process. None of these values indicates whether a disease is chronic.

The Chronic Condition Indicator (CCI) classifies ICD-9 codes as chronic or non-chronic: in its list, value 1 stands for chronic and 0 for non-chronic [9]. The problem was that there was no CCI for ICD-10 codes. Therefore I decided to convert the ICD-9 codes to ICD-10 codes and then manually add a Chronic attribute to the ICD-10 ontology.

This approach was time consuming, and I could not be one hundred percent sure that the resulting codes were correct.
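The manual approach can be sketched in Python as follows, mainly to show where the uncertainty comes from. All rows are illustrative placeholders, not entries copied from the CCI or from an official ICD-9 to ICD-10 mapping, and the names `cci_flag`, `icd9_to_icd10` and `carry_over` are hypothetical.

```python
# Illustrative placeholder rows only (assumption, not real CCI or mapping data).
cci_flag = {"250.00": 1, "460": 0, "493.90": 1}  # ICD-9 code -> 1 chronic, 0 non-chronic
icd9_to_icd10 = {
    "250.00": ["E11"],
    "460": ["J00"],
    "493.90": ["J45", "J46"],  # one ICD-9 code mapped to several ICD-10 codes
}


def carry_over(cci_flag, icd9_to_icd10):
    """Copy the chronic flag from ICD-9 codes to their ICD-10 counterparts.

    Returns the flags per ICD-10 code and the ICD-9 codes whose mapping is
    missing or ambiguous and therefore needs a manual check.
    """
    chronic = {}
    needs_review = set()
    for icd9, flag in cci_flag.items():
        targets = icd9_to_icd10.get(icd9, [])
        if len(targets) != 1:
            needs_review.add(icd9)
        for icd10 in targets:
            # A code stays chronic once any source code marked it as chronic.
            chronic[icd10] = bool(flag) or chronic.get(icd10, False)
    return chronic, needs_review


chronic, needs_review = carry_over(cci_flag, icd9_to_icd10)
print(chronic, needs_review)
```

Every code that ends up in `needs_review` has to be inspected by hand, which illustrates why this route is slow and hard to verify.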

3.3.2 Automatically

Later I obtained the International Classification of Primary Care, second edition (ICPC2) codes [10]. These codes are divided into acute, chronic and non-disease categories.

In the ICD-10 ontology each code lists its related ICPC2 codes, so I decided to use the ICPC2 codes to recognize chronic diseases in ICD-10.

Some codes are categorized as acute in ICPC2 but recognized as chronic in ICD-10. Looking further into this issue, I found that some acute diseases are considered chronic when they last more than 3 months. To solve this problem, I first query the chronic ICPC2 codes in the ICD-10 ontology and then, to complete the list, query the codes whose name tag in the ICD-10 ontology mentions "chronic".
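The two-step selection described above can be sketched in Python on plain dictionaries. The entries and the set of chronic ICPC2 codes are illustrative placeholders (in the thesis they come from the ICD-10 ontology and the ICPC2 list), and the field names `code`, `name` and `icpc2` are assumptions.

```python
# Illustrative placeholder data (assumption, not extracted from the ontology).
CHRONIC_ICPC2 = {"T90"}  # ICPC2 codes assumed to be categorized as chronic

icd10_entries = [
    {"code": "E11", "name": "Non-insulin-dependent diabetes mellitus", "icpc2": ["T90"]},
    {"code": "J32", "name": "Chronic sinusitis", "icpc2": ["R75"]},
    {"code": "J00", "name": "Acute nasopharyngitis [common cold]", "icpc2": ["R74"]},
]


def chronic_icd10_codes(entries, chronic_icpc2):
    """Return the ICD-10 codes recognized as chronic by the two-step rule."""
    # Step 1: codes whose related ICPC2 code is categorized as chronic.
    by_icpc2 = {e["code"] for e in entries if chronic_icpc2.intersection(e["icpc2"])}
    # Step 2: complete the list with codes whose name mentions "chronic".
    by_name = {e["code"] for e in entries if "chronic" in e["name"].lower()}
    return by_icpc2 | by_name


print(sorted(chronic_icd10_codes(icd10_entries, CHRONIC_ICPC2)))
```

A plain substring match would also accept a name containing "non-chronic", so a real query may need a stricter pattern.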

3.4 Ontology

3.4.1 Jena

In order to extract the information we require about chronic diseases, I use Jena. Apache Jena is a Java framework for Semantic Web applications. It provides tools and Java libraries for developing Semantic Web applications [11].

Jena supports different serialization formats, such as RDF/XML, N3, Turtle and N-Triples. The ICD-10 Web Ontology Language (OWL) file is in RDF/XML format. For querying the ontology I will use the SPARQL Protocol and RDF Query Language (SPARQL), a SQL-like language for querying RDF data [12].

We can query the ontology in two ways: through the SPARQL panel in Protégé, or from a Java application using the Jena library. In this project I have decided to run the queries from a Java application, because this makes it easier for the user to apply changes later on (flexibility).
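To illustrate what such a query selects, the sketch below reads a tiny RDF/XML fragment with Python's standard `xml.etree.ElementTree` module. This is a stand-in for Jena and SPARQL, not the implementation used in this thesis. It only handles this simple serialization shape, whereas an RDF library handles the general case. The `ex` namespace, the `ex:icpc2` property and the sample classes are hypothetical; the property names in the real ICD-10 OWL file may differ.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/icd10#"  # hypothetical namespace for this sketch

SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:ex="http://example.org/icd10#">
  <owl:Class rdf:about="http://example.org/icd10#E11">
    <rdfs:label>Non-insulin-dependent diabetes mellitus</rdfs:label>
    <ex:icpc2>T90</ex:icpc2>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/icd10#J00">
    <rdfs:label>Acute nasopharyngitis</rdfs:label>
    <ex:icpc2>R74</ex:icpc2>
  </owl:Class>
</rdf:RDF>"""


def classes_with_icpc2(rdf_xml, wanted):
    """Return (code, label) for every class linked to one of the wanted ICPC2 codes."""
    root = ET.fromstring(rdf_xml)
    found = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        codes = {el.text for el in cls.findall(f"{{{EX}}}icpc2")}
        if codes & wanted:
            uri = cls.get(f"{{{RDF}}}about")
            label = cls.findtext(f"{{{RDFS}}}label")
            found.append((uri.rsplit("#", 1)[-1], label))
    return found


print(classes_with_icpc2(SAMPLE, {"T90"}))
```

Keeping the selection in application code rather than in an interactive panel means the set of wanted codes can be changed or loaded from a file without touching the query logic.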

OWL has three sublanguages: OWL Lite, OWL DL and OWL Full. The ontology uses OWL Lite. OWL Lite was initially built to support classification hierarchies. It supports cardinality constraints, but it only permits 0 or 1 as cardinality values.




3.5 Database

Explain about the database tables and their relations

3.6 Create episodes

Explain how we queried database and what the episodes are









References

[1] ICD-10-CM Official Guidelines for Coding and Reporting 2012, http://www.cdc.gov/nchs/data/icd10/10cmguidelines2012.pdf

[2] ICD-10 Version 2012, http://apps.who.int/classifications/icd10/browse/2010/en#/S05.0

[3] http://www.who.int/classifications/icd/en/

[4] Chronic disease, WHO, http://www.who.int/topics/chronic_diseases/en/, visited in January 2013

[5] Event-Stream Format, Mikel Aickin, pp. 1-21, 2011

[6] Visualizing Patient Trajectories on Wall-Mounted Boards - Information Security Challenges. By Arild Faxvaag, Lillian Røstad, Inger A. Tøndel, Andreas R. Seim, Pieter J. Toussaint. IOS Press, 2009

[7] "A Workbench for Temporal Event Information Extraction from Patient Records", Svetla Boytcheva and Galia Angelova, pp. 48-58, AIMSA 2012.

[8] "Grounded Theory in Practice". By Anselm Strauss and Juliet M. Corbin, March 11, 1997, pp. 231.

[9] http://www.hcup-us.ahrq.gov/toolssoftware/chronic/chronic.jsp

[10] http://www.who.int/classifications/icd/adaptations/icpc2/en/index.html

[11] http://jena.apache.org/

[12] http://opentox.org/data/documents/development/RDF%20files/JavaOnly/query-reasoning-with-jena-and-sparql