
SPE-153272-PP

Recovering Linkage Between Seismic Images and Velocity Models

Jing Zhao, Charalampos Chelmis, Vikram Sorathia, Viktor Prasanna, Abhay Goel, University of Southern California

Copyright 2012, Society of Petroleum Engineers


This paper was prepared for presentation at the SPE Western North American Regional Meeting held in Bakersfield, California, USA, 19-23 March 2012.


This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.


Abstract

Seismic processing and interpretation is a resource-intensive activity in the petroleum exploration domain. By employing various types of models, seismic interpretations are often derived in an iterative refinement process, which may result in multiple versions of seismic images. Keeping track of the derivation history (a.k.a. provenance) of such images thus becomes an important data management issue. Specifically, the information about which velocity model was used to generate a seismic image is useful evidence for measuring the quality of the image. The information can also be used for audit trails and image reproduction. However, in practice, existing seismic processing and interpretation systems do not always automatically capture and maintain this type of provenance information.


In this paper, by employing state-of-the-art techniques in text analytics, semantic processing, and machine learning, we propose an approach that recovers the linkage between seismic images and their ancestral velocity models when no provenance information is recorded. Our approach first retrieves information from the file/directory names of the images and models, such as project names, processing vendors, and algorithms involved in the seismic processing and interpretation. Along with the creation timestamps, the retrieved information is associated with the corresponding images and models as metadata. The metadata of a seismic image and its ancestral models usually satisfy certain relationships. In our approach, we detect and represent such relationships as rules, and a matching process utilizes the rules and retrieved metadata to find the best-matching images and models.

In practice, image and model file names often do not adhere to naming standards, and the files are stored without following well-established record-keeping practices. Users may also use different terms to express the same information in file/directory names. We employ Semantic Web technologies to address this challenge. We develop domain ontologies with OWL/RDFS, based on which we provide an interactive way for users to semantically annotate terms contained in file/directory names. All metadata used by the image-model matching process is represented as ontology instances. Matching can then be performed using the standard semantic query language. The evaluation results show that our approach achieves satisfactory accuracy.


Introduction

The petroleum exploration and production domain employs various scientific methods that involve complex workflows requiring advanced computational, storage, sensing, networking, and visualization capabilities [17]. Effective data management becomes critical with increased regulatory requirements for reporting and standards compliance [19]. Large data volumes in SCADA systems, data historian systems, hydrocarbon accounting systems, systems of record, and other production and operation systems can be managed with relative ease due to their structured nature and well-documented schemas. However, other specialized systems used for imaging, analysis, optimization, forecasting, and scheduling involve complex scientific workflows handling data models that are recognized by specific vendor products only. Engineers and geoscientists handle various kinds of data by subjecting them to complex domain-specific models and algorithms, resulting in a large number of derived datasets, unstructured or semi-structured, which are stored with little or no metadata describing their derivation history. Once the analysis is complete and the resulting datasets are transferred to a storage repository, retrieval at a later stage becomes difficult, time consuming, and labor intensive.


Seismic imaging is a scientific domain that is being increasingly employed not only in exploration, but also in other stages of the E&P lifecycle [20]. A typical seismic imaging workflow involves various steps, including data collection, data processing, model building, data interpretation, analysis, rendering, and visualization. Seismic image processing and interpretation involves highly interactive and iterative processes that require loading, storing, referencing, and rendering large volumes of datasets [17]. This requires large amounts of computation and storage capability, in addition to domain-specific software products and tools capable of handling, manipulating, and visualizing such datasets. Geoscientists skilled in modeling, characterization, and interpretation employ various techniques and generate large amounts of intermediate datasets during this process.

Across 2D, 2.5D, and 3D surveys, pre-stacking and post-stacking approaches, and various types of migration algorithms, several techniques have been proposed for various types of geological structures [18]. In a typical workflow, a velocity model or earth model is first generated for a specific geological structure, which is then used to interpret the results in seismic volumes. Typically, this workflow is repeated with some variations in interpretation parameters until the best representations are found, thereby resulting in large numbers of volumes for a given velocity model.


Data generated during this process is generally retained with no or incomplete metadata. Over a period of time, data repositories receive contributions from large teams of geoscientists working on multiple projects. In the absence of proper metadata and record-keeping practices, seismic datasets lose the context in which they were created. In order to be useful in decision-making, all derived volumes must retain the link to the original velocity models [17]. Interpreters therefore have to spend considerable effort rediscovering the associated source models. Without a formal metadata record, the file names of models may provide some hints. However, individual interpreters may not have followed consistent file naming standards, and may not have used unique terms to express the same semantic meaning. This significantly increases the time and effort required to find the right velocity model in a repository.


We argue that with careful application of advanced machine learning, Semantic Web, and text analytics techniques, we can address this problem and achieve a significant reduction of the search space. Our approach employs text analytics to extract keywords used by individual interpreters in file names and to identify the variations in expressing the same term. By introducing Semantic Web technologies, we generate an ontology for the file naming convention that contains concepts related to the seismic interpretation process and their possible expressions. Finally, by introducing machine-learning techniques, we implement a matching system that enables linkage discovery among images and models. Recovering linkages in this manner is particularly useful not only for generating metadata, but also for facilitating advanced search capabilities based on various interpretation techniques and parameters. Establishing the derivation history is also useful in determining the quality and characteristics of seismic volumes.

Motivating Scenario. Figure 1 depicts the result of a seismic image interpretation process carried out for the BP 2004 Salt Structure Data [18]. Here, a velocity model is utilized with various interpretation techniques and different parameters. Derived seismic volumes are stored by interpreters on local disks or shared network folders. While storing these derived volumes using the interpretation system, interpreters select file names that capture key processing parameters by which the given volume was derived. In this particular case, the interpreter has generated three volumes using three different interpretation parameters. One-way and two-way migration techniques were performed on part of the dataset to generate interpreted volume files. Based on the outcome, the third interpretation was performed using the two-way migration technique on the full dataset. These variations are well captured in the file names. In addition, the geological structure type, the dataset name, the volume creation time, and the project name were also captured in the derived volume file names.


This example provides a good understanding of the file naming convention followed by the interpreters. Even though proper metadata is not generated for all derived seismic volumes, the selection of keywords in file names provides hints about how a particular volume was derived. Knowing the dataset, project name, and geological structure type, it becomes easier to establish links among volumes and the model that was used to derive them. In Figure 1, all derived volume file names indicate "BP_2004", "projectbp", and "fslt", which can also be found in the model file name, with the exception of "fslt", which is expressed as "fullsalt" instead. Clearly, file names not only include information about key parameters, but interpreters also mostly select the same terms. However, the derived volumes include additional parameters capturing more information about the preprocessing and postprocessing steps, segmentation, and other image loading parameters that were used. In the given example, the terms "full", "part", "oneway", "twoway", and "mig" are very specific to the interpretation process and are therefore found in the volume file names only.


We argue that it is possible to recover the linkages between volumes and the associated velocity model by harnessing the hints provided by interpreters in the file names. With careful observation of the seismic processing and interpretation workflow and the file naming convention followed in a specific organization, it is possible to establish rules that can help in detecting the linkage between files. For instance, the dataset name, project name, geological structure name, and file creation dates all play key roles in discovering the match. However, pre-processing or post-processing parameters, file loading parameters, etc. can be ignored, as they are specific to volume names only. Designing a system for linkage discovery based on this matching approach introduces various challenges when implemented for a large number of users. As indicated in the example, different users use different keywords (e.g., "fslt" and "fullsalt") for the same term ("Full Salt"). The proposed system must therefore effectively handle such variations, and must be able to address semantic and syntactic heterogeneity issues in order to accurately establish lost linkages between velocity models and their derived seismic image volumes.



Figure 1. Example of Seismic Interpretation Process Indicating Generation of Multiple Volumes Using a Velocity Model

System Overview

Figure 2 illustrates the overview of our approach. In general, our approach consists of three steps:

1) Metadata extraction. Given a set of seismic images and velocity models, our approach first identifies and extracts the information that can be used for recovering the linkage between images and models. We employ a data loading process to retrieve the file names of images and models and their creation times. The name of a seismic image file or a velocity model file usually consists of multiple terms separated by "_", where each term captures information about the project name, processing vendors, and algorithms involved in the seismic processing and interpretation. We split the file names of images and models into individual terms, and clean the terms by utilizing text analysis techniques.


2) Semantic annotation. The information encoded in the file names is the main hint for identifying the linkage between images and models. However, as discussed above, users may use different terms to express the same information, making it difficult to directly match seismic images and velocity models based on their names alone. To attack this challenge, we design an ontology as a global vocabulary to represent the information that may be encoded in file names. A user-interactive semantic annotation process, which is the second step of our approach, utilizes the ontology to annotate the terms extracted from file names. Each term is represented as an ontology instance that is stored in an ontology repository, and a file name can then be represented by a group of ontology instances. The group of ontology instances, along with the creation time, is associated with the corresponding seismic image or velocity model as its attributes.


3) Matching. In the last step, for each seismic image, we identify the velocity models that are likely to have been used for its creation. We use a set of rules to express the relationships that the attributes of an image and its ancestral model may have. For example, the creation time of the ancestral velocity model of a given image should be within a certain time window. According to the rules, we then execute semantic queries and rules on image and model attributes to identify the best-matching images and models.


We describe each step in detail in the following sections.





Figure 2. Overview of the Semantic-Based Matching Approach

Metadata Extraction

In our use case, all seismic image files and velocity model files are stored on servers running the Linux operating system. Thus, we simply use the "ls -l" command to get the standard Linux output containing two fields, i.e., file names and creation times, separated by spaces. We use the creation times and file names to extract the terms used in the following steps of our approach.
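
To make the data loading step concrete, the following is a minimal Python sketch (not the authors' implementation) of how file names and timestamps could be collected; the directory path in the usage comment is hypothetical.

    import os
    from datetime import datetime

    def collect_file_metadata(directory):
        """Gather (file name, modification time) pairs, i.e., the two fields
        the data loading process reads from `ls -l` output."""
        records = []
        for entry in os.scandir(directory):
            if entry.is_file():
                ts = datetime.fromtimestamp(entry.stat().st_mtime)
                records.append((entry.name, ts))
        return records

    # e.g., collect_file_metadata("/data/seismic/volumes")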


The name of a typical seismic image file or velocity model file usually consists of multiple terms linked with underscores "_". For example, one of the image files is named "fslt_psdm_bp_projectbp_agc_il.bri". In the file name, each term has its own semantic meaning, which represents metadata of the corresponding seismic image. In this example, "fslt" means the "full-salt" model, and "psdm" refers to the "pre-stack depth migration" imaging algorithm. The image file name contains these two terms to indicate that the seismic processing system that generated the image employed the corresponding model and algorithm. Table 1 lists the meaning of all terms contained in the example file name.


Term        Semantic Meaning
----        ----------------
fslt        The full-salt model
psdm        The pre-stack depth migration imaging algorithm
bp          The processing vendor name
projectbp   The project name
agc         The image pre-processing step
il          The inline sort method

Table 1. Terms extracted from the example image file name, and the semantic meaning of the terms

For each file name returned by the "ls" command, we split the file name into individual terms. Each image/model file is then associated with a group of terms. In general, a group of terms usually contains the following information:

1) Project names, for example, "projectbp" in the above example.

2) Processing vendors; a file name may contain terms like "BP", "Chevron", and "WesternGeco".

3) Imaging algorithms, such as the pre-stack depth migration algorithm in the example.

4) Involved models, such as "sediment flood", "water flood", and "full-salt".

5) Version information; for example, "v1" means the file is the first version of the seismic image. Other terms about the version information may include "v2", "new", and "update1".

6) Post-processing steps, such as "agc".

7) Sort order, such as "inline" and "crossline".

8) Image loading parameters, such as "subvolume", "partial", and "full".


However, not all of the user-supplied terms are useful for linkage recovery. Before proceeding further with the extracted terms, we need to remove redundant and useless terms. For example, if the term "agc" cannot be used as a hint for predicting the ancestral velocity models of the seismic image, we do not need to take it into account in the following steps. For the example file name "fslt_psdm_bp_projectbp_agc_il.bri", the metadata extraction step will generate the term group {"fslt", "psdm", "bp", "projectbp"}.
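
As an illustration, a minimal Python sketch of this splitting-and-cleaning step might look as follows; the stop list of non-discriminative terms is an assumption for illustration.

    def extract_terms(filename, stop_terms=frozenset({"agc", "il", "xl"})):
        """Split a file name on "_" and drop terms that carry no hint about
        the ancestral velocity model (the stop list is illustrative)."""
        stem = filename.rsplit(".", 1)[0]   # strip the extension, e.g. ".bri"
        return [t for t in stem.split("_") if t not in stop_terms]

    print(extract_terms("fslt_psdm_bp_projectbp_agc_il.bri"))
    # ['fslt', 'psdm', 'bp', 'projectbp']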

Matching based on Terms. We can match images to models directly, based on the terms extracted from file names. Basically, a velocity model may have been used to create a seismic image when we detect the following relationships between them:

1) The gap between their creation times is within a certain period of time (e.g., around a month).

2) They share the same or similar model/algorithm name, project name, and/or processing vendors.

Based on this detection, we can compare the creation times and the extracted file name terms to identify the best-matching seismic images and velocity models. For example, we can match the seismic image "sedflood_psdm_bp_v5.il.bri", which was created on 03/21/2011, to the velocity model "sedflood_psdm_bp.bln", which was generated on 02/23/2011, since the seismic image file was created nearly one month after the velocity model, and they share the same imaging algorithm, model, and processing vendor.
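
A relationship like 1) can be sketched as a simple predicate; the exact window length is a tunable assumption.

    from datetime import datetime, timedelta

    def within_creation_window(image_time, model_time, max_days=35):
        """Relationship 1): the model must predate the image by at most ~a month."""
        gap = image_time - model_time
        return timedelta(0) <= gap <= timedelta(days=max_days)

    print(within_creation_window(datetime(2011, 3, 21), datetime(2011, 2, 23)))
    # True (a 26-day gap)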


However, in practice, users do not always strictly follow the naming standards when naming images and models. We use the Levenshtein distance [14] to calculate the lexical similarity between terms, so as to identify terms that represent the same semantic meaning. For example, the Levenshtein distance between "full-salt" and "fullsalt" is 1, which is small enough for us to infer that the two terms may refer to the same model. However, in some cases terms expressing the same meaning may have a relatively large distance. For example, the distance between "fs" and "fullsalt" is 6. Thus lexical similarity alone cannot accurately capture "semantic distance".
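
For reference, a standard dynamic-programming implementation of the Levenshtein distance [14] reproduces the distances quoted above.

    def levenshtein(a, b):
        """Classic dynamic-programming edit distance (insert/delete/substitute)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
            prev = curr
        return prev[-1]

    print(levenshtein("full-salt", "fullsalt"))  # 1
    print(levenshtein("fs", "fullsalt"))         # 6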


Semantic Annotation

Annotation is the attachment of names, attributes, comments, descriptions, etc. to a document or to a selected part of text. It provides additional information (metadata) about an existing piece of data. We use an ontology to annotate the terms extracted from file/directory names, so as to solve the problem of heterogeneous file naming conventions and information representation. The ontology acts as a global vocabulary that represents the semantic meaning of terms, assisting in term disambiguation and also helping in associating domain concepts with local facts.



Figure 3. Snapshot of the Ontology Used for Annotation



Domain Ontology. Figure 3 illustrates a snapshot of the ontology that we use for annotation. For illustration purposes, this figure has been compressed to show only part of the classes and instances contained in the ontology. We use ontology classes such as "ProjectName", "VendorName", "Version", "SeismicModelingAlgorithmName", etc. to describe the possible information contained in file names. As indicated by its name, each class captures a particular concept in file/directory names. Subclasses linked to each class define groups under particular concepts. For example, "Sub_Salt", "Base_Salt", "Full_Salt", "Multi_Level_Salt", and "Top_Salt" are all subclasses under the "Salt" class, which is in turn a subclass of "Geobody_Structure". We can also find concepts like "fullsalt", "fslt", "flslt", "fsalt", and "flst" that are linked to the "Full_Salt" class. In our ontology, the "Unknown" class captures terms that are unknown or do not fit in any of the above classes. Later on, with the help of domain experts, such words can either form new classes or become new instances of existing classes.
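
A fragment of this class hierarchy can be sketched in Python with rdflib; the namespace URI is a placeholder, not the project's actual ontology IRI.

    from rdflib import Graph, Namespace, RDF, RDFS, OWL

    EX = Namespace("http://example.org/seismic#")   # placeholder namespace
    g = Graph()
    g.bind("ex", EX)

    # Class hierarchy: Geobody_Structure > Salt > Full_Salt, ...
    g.add((EX.Geobody_Structure, RDF.type, OWL.Class))
    g.add((EX.Salt, RDFS.subClassOf, EX.Geobody_Structure))
    for salt_type in ("Sub_Salt", "Base_Salt", "Full_Salt",
                      "Multi_Level_Salt", "Top_Salt"):
        g.add((EX[salt_type], RDFS.subClassOf, EX.Salt))

    # Observed spellings become instances of Full_Salt
    for term in ("fullsalt", "fslt", "flslt", "fsalt", "flst"):
        g.add((EX[term], RDF.type, EX.Full_Salt))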

Annotation. During annotation [16], we annotate terms based on the domain ontology, and represent the terms as ontology instances belonging to the corresponding ontology classes. For example, since the term "fslt" is used to represent the model name "Full_Salt", the ontology class "Full_Salt" should be used for its annotation. As shown in Figure 3, we create an instance "fslt" that belongs to the class "Full_Salt". We define the instance as an Attribute of the file name. The whole file name can then be represented as a Semantic Entity containing a set of such attributes. For example, the file name "fslt_psdm_bp_projectbp_agc_il.bri" can be represented by the group of attributes {fslt, psdm, bp, projectbp} that belong to the ontology classes Full_Salt, PreStackDepthMigration, BP, and ProjectName, respectively.

As shown in Figure 4, generated instances are stored in the ontology repository, which communicates with our annotation application through a SPARQL [15] endpoint. When annotating a group of terms extracted from an image/model file name, the automated annotation process first probes the ontology to identify whether any existing ontology instances match the terms to be annotated. If such ontology instances can be found, the ontology instances, as well as the corresponding ontology classes, are included as the attributes of the image/model.
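
The probing step amounts to a small SPARQL lookup against the repository. A sketch with rdflib follows, using an in-memory graph in place of the actual endpoint; the namespace is again a placeholder.

    from rdflib import Graph, Namespace, RDF

    EX = Namespace("http://example.org/seismic#")   # placeholder namespace
    g = Graph()
    g.add((EX.fslt, RDF.type, EX.Full_Salt))        # a previously annotated term

    def probe(term):
        """Return the ontology classes of an existing instance matching `term`,
        or mark the term as Unknown."""
        rows = g.query("SELECT ?cls WHERE { ?t a ?cls }",
                       initBindings={"t": EX[term]})
        classes = [str(r.cls) for r in rows]
        return classes or ["Unknown"]

    print(probe("fslt"))            # ['http://example.org/seismic#Full_Salt']
    print(probe("gulfofmaxico"))    # ['Unknown']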




Figure 4. Annotation Approach


Figure 5. Screenshot of the User-Assisted Annotation Tool [16]

If terms have not been annotated before, our annotation system marks them as "Unknown". The user-assisted annotation tool then annotates unknown terms as ontology instances. A domain expert who does not have prior knowledge of Semantic Web technology can easily update the main ontology using the interface provided by the tool. By utilizing this tool, a domain expert can either define new ontology classes for the unknown terms or associate them with previously defined classes.

Figure 5 shows a screenshot of the annotation tool. As shown in the figure, "gulfofmaxico", which is a term extracted from the file name "Fl_prstk_krchf_vol1_saltbody_insert_gulfofmaxico_vol1_2008.bri", cannot initially be matched to any instance in the ontology because of a typo: "maxico". Thus the annotation tool first annotates the term as "Unknown". To define this "unknown" term, the annotation tool allows the user to navigate all related ontology classes and select the appropriate classes for annotation. In this example, the user selects the ontology class "Gulf_of_Mexico", which is a subclass of "PlaceName". The term "gulfofmaxico" is then described as an attribute of the file name, and a new instance "gulfofmaxico" belonging to the class "Gulf_of_Mexico" is added to the ontology with user corroboration.

Semantic-Based Matching

Every image/model file name can now be represented as a semantic entity containing a set of semantic attributes, each expressing the semantic meaning of a term contained in the file name. Recall that we also capture the creation time of the file in the metadata extraction step. Time information is also represented as a semantic attribute.


In the matching step, we speculate that a velocity model was used to create a seismic image if we find that their annotation semantic entities match each other according to certain rules. In particular, we have developed two types of semantic-based matching approaches. Both approaches utilize semantic technologies such as SPARQL [15] to express the matching rules and apply the rules to semantic entities. We introduce the two approaches in the following sections.

Approach 1: Exact Pattern Matching. In our first approach, we first allow users to define a set of rules, each of which specifies a condition that the file name of the matched velocity models must satisfy. We then compose a SPARQL query based on the matching rules, and execute the query to search among all semantic entities of velocity models. The results of the query indicate potential velocity models that might have been used to generate the given seismic image.


Specifically, a matching rule can be used to specify:

1) What terms MUST be included in the file name of the model;

2) What terms MAY be included in the file name of the model;

3) What terms MUST NOT be included in the file name of the model;

4) The range of creation times.

Since all file names have been represented as semantic entities with semantic attributes, when defining a matching rule, users do not need to consider all possible term variations. Instead, they can directly use our domain ontology to form their matching rules.


Based on the annotation semantic entity of a given seismic image, a user can directly specify which ontology classes should be included in the semantic entity of the velocity model. For example, when the user finds that the semantic entity of the image contains an attribute "cvx", and she thinks that the name of the matching velocity model should have the same processing vendor name, she can directly define a rule restricting the results to velocity models containing attributes belonging to the ontology class "Chevron".


Users can further define a set of rules with "if-else" structure in advance. "If-else" rules express the exclusion and/or inclusion of terms in image and model file names. For example, a user can define a rule of the form "if the semantic entity of the image contains an attribute belonging to ProcessingVendor P, the semantic entity of the model should also contain an attribute belonging to P", where P can be seen as a parameter of the rule. Later, when the system performs matching for an image file whose semantic entity has an attribute "cvx", our matching system captures "Chevron" as the argument for the "if-else" rule, and automatically generates a matching rule stating that the semantic entity of the velocity model must contain an attribute belonging to the same ontology class "Chevron".
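
A parameterized "if-else" rule of this kind can be sketched as a predicate over the class sets of the two semantic entities; the list of vendor classes is an assumption for illustration.

    VENDOR_CLASSES = {"Chevron", "BP", "WesternGeco"}   # illustrative ProcessingVendor subclasses

    def vendor_if_else_rule(image_classes, model_classes):
        """If the image carries an attribute of some ProcessingVendor class P,
        the model must carry an attribute of the same class P."""
        return (set(image_classes) & VENDOR_CLASSES) <= set(model_classes)

    # An image annotated with "cvx" -> class "Chevron":
    print(vendor_if_else_rule({"Chevron", "Full_Salt"}, {"Chevron", "ProjectName"}))  # True
    print(vendor_if_else_rule({"Chevron", "Full_Salt"}, {"BP"}))                      # False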


Based on the creation time of the seismic image, users can define rules to specify the creation time range for the velocity model, e.g., 3~5 weeks before the creation time of the image. In our system, we provide a simple GUI for users to define different types of matching rules, where users can select the corresponding ontology classes that are involved in the matching rules based on our domain ontology, and also specify operators such as "=", ">", "<", and "≥".



Figure 6. SPARQL Query Example

All matching rules are then integrated together to compose a SPARQL query. Figure 6 illustrates a SPARQL query example that contains three matching rules:

1) The model name should have an attribute that is an instance of the ontology class "Chevron".

2) The model name should have an attribute that is an instance of the class "Full_Salt".

3) The creation time of the model should be after 00:00:00 on 01/01/2010.
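
The original figure is not reproduced here; as a hedged reconstruction, a query combining these three rules might look like the following (property names such as ex:hasAttribute and ex:creationTime, and the namespace, are assumptions, not the paper's actual schema).

    # Reconstruction of a Figure-6-style query; schema names are assumptions.
    QUERY = """
    PREFIX ex:  <http://example.org/seismic#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?model WHERE {
        ?model ex:hasAttribute ?a1 . ?a1 a ex:Chevron .     # Rule 1
        ?model ex:hasAttribute ?a2 . ?a2 a ex:Full_Salt .   # Rule 2
        ?model ex:creationTime ?t .                         # Rule 3
        FILTER (?t >= "2010-01-01T00:00:00"^^xsd:dateTime)
    }
    """
    # The query would be submitted to the ontology repository's SPARQL endpoint.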

Approach 2: Matching Score. The above approach searches for velocity models whose file names satisfy all user-specified matching rules. However, in practice, images and models are often not named in strict accordance with the naming standards, so the completeness of the information cannot be guaranteed. Some important information contained in the matching rules is often missing from model names. For example, a model name may omit the term indicating its processing vendor. As a result, the correct matching model(s) cannot be identified by the SPARQL query.


We developed a new approach to overcome this shortcoming of the exact pattern matching approach. As before, users define a set of matching rules. Instead of composing a SPARQL query containing all matching rules, we assign a score to each matching rule. Each velocity model is associated with a matching score, which is initialized to 0. We go through all rules and add the corresponding score to the model's matching score if its file name satisfies a matching rule. We sort the velocity models based on their matching scores and provide the top n models as our matching result, where n is a number specified by the user.


For example, suppose users define the three rules contained in the SPARQL query in Figure 6, and the scores for Rules 1), 2), and 3) are 0.3, 0.5, and 0.2, respectively. Then if a velocity model, denoted m1, satisfies Rules 1) and 2), m1 gets a matching score of 0.3 + 0.5 = 0.8. Similarly, a velocity model m2 satisfying only Rule 3) gets a matching score of 0.2, and a velocity model m3 satisfying Rules 2) and 3) gets a matching score of 0.7. Thus m1 and m3 will be returned as matching results if a user wants the top 2 matching models for further selection (i.e., n = 2).
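
The scoring procedure reduces to a weighted sum per model followed by a sort; a minimal sketch reproducing the worked example above:

    def top_n_models(rule_hits, weights, n):
        """rule_hits: model -> set of satisfied rule ids; weights: rule id -> score."""
        scores = {m: sum(weights[r] for r in hits) for m, hits in rule_hits.items()}
        return sorted(scores, key=scores.get, reverse=True)[:n]

    weights = {1: 0.3, 2: 0.5, 3: 0.2}
    hits = {"m1": {1, 2}, "m2": {3}, "m3": {2, 3}}
    print(top_n_models(hits, weights, 2))   # ['m1', 'm3']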



The score of each matching rule can be considered a "weight". Intuitively, a more important matching rule should have a higher score. Matching scores can be assigned by users according to domain knowledge, or can be learned using machine learning algorithms.

Evaluation

We evaluate our approach in this section. In our experiment, we first collect a set of seismic images and their corresponding velocity models, and record the correct matching between them as the "ground truth". We then use our approach to find the matching model for each seismic image among the set of velocity models, and compare our results with the "ground truth" to measure the precision. We utilize the matching score approach in the matching step.

Experiment Setup and Dataset Generation.

One challenge in our experiment is that although we can collect a relatively large set of images and models, it is still difficult to identify the "ground truth", since most of the correct linkages between images and models have not been recorded. To overcome this challenge, we designed an algorithm to generate a synthetic "ground truth" based on the set of velocity models we collected. Starting from around 500 velocity models collected from real applications, our algorithm mimics the naming procedure followed by domain users who utilize velocity models to generate seismic images. Our matching approach is then applied to find a matching model for each generated image. The links between velocity models and seismic images that are generated by our algorithm are used as the synthetic "ground truth".


As shown in Algorithm 1, for each seismic image, we randomly generate its creation time. Based on our observations of real data, the creation time of an image should be within 3~5 weeks after the model's creation. But we also allow exceptions: the creation time of an image may fall outside this time range with a small probability P_T. After generating the creation time of the image, we represent the file name of the velocity model as a semantic entity. Our algorithm acts as a domain engineer, and determines what information should be included in the image file name based on a set of rules. For each piece of information, we randomly choose one of its possible representation terms according to our ontology. We compose the image file name by connecting all such terms (a term may be missing with a probability P_M). We also generate some redundant information in the file name with probability P_R.


Input: 1) the file name of a velocity model; 2) a set of image generation rules; 3) probabilities P_T, P_M, and P_R
Output: the synthetic file name of a seismic image generated by the input velocity model

1. Identify the creation time T_V of the velocity model.
2. Randomly generate a datetime T_I within the range [T_V + 21 days, T_V + 35 days], where T_I falls outside this range with probability P_T. Use T_I as the creation time of the seismic image.
3. Extract useful terms from the velocity model file name.
4. Annotate the extracted terms using our domain ontology, and generate the annotation semantic entity.
5. Based on the image generation rules, identify what information should be included in the seismic image file name. Use a set of ontology classes (denoted {C}) to represent the necessary information.
6. For each ontology class in {C}, randomly pick one of its instances as its term representation in the image file name.
7. Compose the file name for the seismic image by connecting all the generated terms with "_". Each term has a probability P_M of being missing, and a probability P_R of having a redundant copy in the file name (possibly with a different instance belonging to the same ontology class).

Algorithm 1. Algorithm for Generating the Synthetic Image File Names
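
A compact Python sketch of the generation steps of Algorithm 1 follows; the default probability values and the fallback time range for window violations are assumptions, since the paper does not specify them.

    import random
    from datetime import timedelta

    def synth_image_name(classes_C, term_choices, model_ctime,
                         p_t=0.1, p_m=0.1, p_r=0.1):
        """Steps 2 and 5-7 of Algorithm 1. classes_C: required ontology classes;
        term_choices: class -> list of instance terms from the ontology."""
        # Step 2: image created 21-35 days after the model, except with prob. p_t
        days = random.uniform(21, 35)
        if random.random() < p_t:
            days = random.uniform(0, 120)   # may fall outside the window (range assumed)
        image_ctime = model_ctime + timedelta(days=days)
        # Steps 6-7: one instance term per class, with dropout and duplication
        terms = []
        for cls in classes_C:
            if random.random() < p_m:
                continue                    # term missing with probability p_m
            terms.append(random.choice(term_choices[cls]))
            if random.random() < p_r:       # redundant copy with probability p_r
                terms.append(random.choice(term_choices[cls]))
        return "_".join(terms) + ".bri", image_ctime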

We ran our experiments on a desktop with a 3.06 GHz Intel Core i3 CPU and 4 GB memory.

Evaluation Results.

We measure the precision of our approach when it generates only one matching model for each image file. This precision, denoted P_1, measures the probability that our best matching result is the correct matching model:

P_1 = \frac{|\{\text{images whose best matching result is the correct model}\}|}{|\{\text{all seismic images}\}|}

We also measure the probability that the correct matching model is covered when our approach generates multiple matching results. In general, if we use n to denote the number of matching results returned for each seismic image, we have:

P_n = \frac{|\{\text{images whose top-}n\text{ matching results contain the correct model}\}|}{|\{\text{all seismic images}\}|}
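
Given the synthetic ground truth, P_n reduces to a top-n hit rate; a minimal sketch:

    def precision_at_n(ranked_results, ground_truth, n):
        """Fraction of images whose correct model appears among the top-n results.
        ranked_results: image -> list of models ordered by matching score."""
        hits = sum(1 for img, models in ranked_results.items()
                   if ground_truth[img] in models[:n])
        return hits / len(ranked_results)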


Figure 7. Matching precision as a function of the number of returned matching results

Figure 7 shows the precision P_n for n = 1, 2, ..., 8. We see that we can only achieve a precision of less than 60% when n = 1. However, the precision improves greatly when multiple matching results are provided. When n = 8, we achieve 100% precision. This means that in order to recover lost links between images and models based solely on their file names, about 8 candidate models suffice for the correct matching model to be covered (i.e., for the correct matching model to be in the retrieved results). Hence, we conjecture that our approach can effectively identify a small set of candidate matching models, thus drastically reducing the search space of candidate models that users must manually examine in order to select the right one.

Related Work

In this work, we use Semantic Web technologies to attack the challenge caused by the heterogeneous expression of information contained in image/model file names. Our matching scheme is based on the semantic representations of file names.

Semantic Web technologies have been used in the oil and gas industry to address problems such as information integration and knowledge management [1][2][3][4]. For example, in [1], the POSC Caesar Association proposed the Oil and Gas Ontology (OGO), an industry-wide ontology, in order to provide standard means for data integration within and across business domains. In [3], Norheim and Fjellheim introduced the AKSIO system, a knowledge management system for the petroleum industry. In addition, different use cases have been discussed as applications of Semantic Web technologies in the oil and gas industry in [2].

In our earlier work, we used Semantic Web technologies to address problems in the reservoir management and real-time oil field operations setting [4]. Issues concerning how to develop ontologies, how to build the knowledge database, and examples of applications for information integration and knowledge management are discussed in [4].


The linkage between seismic images and the velocity models that were used for their creation can be seen as a type of provenance information. Provenance information explains the derivation history of a data object; it can thus be used for audit trails and is critical for data quality control. Applications and technologies for provenance collection, management, and access have been widely discussed in domains such as e-Science and health care [5][6][7][8][9][10][11][12][13].

Conclusion

The linkage between seismic images and the velocity models that were used for their creation is an important type of provenance information required for further analysis and data quality control. In this paper, we proposed an approach to recover the missing linkage between seismic images and velocity models. Our approach extracts information contained in image/model file names, and utilizes Semantic Web technologies to annotate and represent the extracted information. Based on user-specified rules, we designed algorithms to identify the matching between images and models. For future work, we will explore other possible types of information that may be utilized to improve our prediction precision. We will also address the scalability of our approach so that we can effectively handle large datasets.

Acknowledgement

This work is supported by Chevron Corp. under the joint project Center for Interactive Smart Oilfield Technologies (CiSoft) at the University of Southern California.

References

[1] POSC Caesar Association, http://www.posccaesar.com/
[2] F. Chum, "Use Case: Ontology-Driven Information Integration and Delivery - A Survey of Semantic Web Technology in the Oil and Gas Industry," April 2007. http://www.w3.org/2001/sw/sweo/public/UseCases/Chevron/
[3] D. Norheim and R. Fjellheim, "AKSIO - Active Knowledge Management in the Petroleum Industry," 3rd European Semantic Web Conference (Industry Forum), June 2006.
[4] R. Soma, A. Bakshi, V. Prasanna, W. Da Sie, and B. Bourgeois, "Semantic-Web Technologies for Oil-field Management," SPE Intelligent Energy Conference and Exhibition, April 2008.
[5] C. Pancerella et al., "Metadata in the Collaboratory for Multi-Scale Chemical Science," Dublin Core Conference, 2003.
[6] I. Foster, J. S. Vockler, M. Wilde, and Y. Zhao, "Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation," Scientific and Statistical Database Management Conference (SSDBM), 2002.
[7] J. Frew and R. Bose, "Earth System Science Workbench: A Data Management Infrastructure for Earth Science Products," Scientific and Statistical Database Management Conference (SSDBM), 2001.
[8] J. Zhao, C. Goble, M. Greenwood, C. Wroe, and R. Stevens, "Annotating, Linking and Browsing Provenance Logs for e-Science," International Semantic Web Conference (ISWC) Workshop on Retrieval of Scientific Data, 2003.
[9] J. Zhao, C. Wroe, C. Goble, R. Stevens, S. Bechhofer, D. Quan, and M. Greenwood, "Using Semantic Web Technologies for Representing eScience Provenance," International Semantic Web Conference (ISWC), 2004.
[10] S. Sahoo, A. Sheth, and C. Henson, "Semantic Provenance for e-Science: Managing the Deluge of Scientific Data," IEEE Internet Computing, vol. 12, 2008.
[11] Y. L. Simmhan, B. Plale, and D. Gannon, "A Survey of Data Provenance in e-Science," SIGMOD Record, vol. 34, no. 3, pp. 31-36, September 2005.
[12] S. Alvarez, J. Vazquez-Salceda, T. Kifor, L. Z. Varga, and S. Willmott, "Applying Provenance in Distributed Organ Transplant Management," International Provenance and Annotation Workshop, Chicago, USA, May 2006.
[13] J. Zhao, Y. Simmhan, K. Gomadam, and V. Prasanna, "Querying Provenance Information in Distributed Environments," International Journal of Computers and Their Applications, vol. 18, no. 3, September 2011.
[14] V. I. Levenshtein, "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals," Soviet Physics Doklady, vol. 10, no. 8, pp. 707-710, 1966.
[15] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/
[16] C. Chelmis, J. Zhao, V. Sorathia, V. Prasanna, and S. Agarwal, "Semiautomatic, Semantic Assistance to Manual Curation of Data in Smart Oil Fields," SPE Western North American Regional Meeting, 2012.
[17] P. Neri, "Data Management Challenges in the Pre-Stack Era," First Break, vol. 29, pp. 97-100, 2011.
[18] J. B. Bednar, "Modeling, Migration and Velocity Analysis in Simple and Complex Structure," Panorama Technologies, Inc., 2009.
[19] SURA, "Coastal Ocean Observing and Prediction (SCOOP) - Filename Conventions," Southeastern Universities Research Association, 2006.
[20] T. Alsos, A. Eide, D. Astratti, S. Pickering, M. Benabentos, N. Dutta, S. Mallick, G. Schultz, L. den Boer, M. Livingstone, M. Nickel, L. Sonneland, J. Schlaf, P. Schoepfer, M. Sigismondi, J. C. Soldo, and L. K. Stronen, "Seismic Applications Throughout the Life of the Reservoir," Oilfield Review, vol. 14, pp. 48-65, 2002.