TR_303.doc - University of West Florida

idiotdisc_6f52f6ab-21ec-4dd1-b5f9-28d4211520a6.doc

Static Support for Understanding SOA Descriptions: Exploring the Requirements


Laura White, Thomas Reichherzer, Norman Wilde, John Coffey
{lwhite|treichherzer|nwilde|jcoffey}@uwf.edu

Douglas Leal, Joshua Dault, Juan Gil Restrepo, David Kaczynski
{ddl12|jbd16|jg38|dak19}@students.uwf.edu


Executive Summary

Service Oriented Architecture (SOA) has emerged as a way of providing flexibility to large scale software systems. However, there may be problems in understanding and maintaining software constructed using this new paradigm. This report summarizes some of the issues that have been discussed and analyzes the requirements for static analysis tools to aid SOA maintainers. It also describes ongoing work on SOAMiner, a text search tool tuned to the analysis of SOA description files such as WSDL's, XSD's, and BPEL's. SOAMiner is currently under development following a spiral model to clarify requirements through repeated evaluations of prototypes.


This report may be cited as S2ERC-TR-303, Security and Software Engineering Research Center (S2ERC), http://www.serc.net, July 1, 2010.


Table of Contents

1 Introduction and Motivation
2 Program Comprehension Tools and SOA
3 Static SOA Program Comprehension in Context
4 The SOAMiner tool
5 Initial Studies with SOAMiner
5.1 Case Study Sources
5.1.1 The Travel Reservations Service
5.1.2 A WSDL from MicroPAVER™
5.1.3 SOA Descriptions Harvested from the Web
5.2 Scalability Study
5.3 Basic Maintenance Scenario Study
5.4 Locating Data Type Usages
6 Conclusions
7 Acknowledgements
8 References
Appendix A - Results of the Basic Maintenance Scenario Study



1 Introduction and Motivation

In recent years many organizations have turned to Service Oriented Architectures (SOA) as a way to structure large software systems. While there are many different views of SOA, most of them describe applications structured as a collection of services, running on different nodes, and loosely coupled by exchange of messages via a layer of SOA infrastructure, sometimes called an Enterprise Service Bus (Figure 1).



Figure 1 - Structure of a Service Oriented Architecture Application

Since the emergence of the SOA architectural style in the early 2000's, some concern has been expressed about how this new generation of computer applications will be maintained. Maintenance has always been the most expensive phase of the software life cycle, primarily due to the need to sustain understanding of complex code as it grows, often losing structure in the process, and is handed off from one group of software engineers to another. Software changes, be they bug fixes or enhancements, become much more risky and time-consuming as knowledge is lost.

There seems to be little reason why these same issues will not emerge over time with SOA. In fact, some have described SOA systems as being in "continuous evolution" (or "permanent beta") as soon as they are deployed [KONT:2008].


We will explore possible requirements for static analysis tools to aid SOA software engineers and specifically will describe ongoing work to create SOAMiner, a software engineer's search tool that users might think of as a Google* for SOA.



An initial motivation for the development of SOAMiner came from working with students on an introductory SOA tutorial distributed as part of the Netbeans development environment. The Travel Reservations Service [KOVAL:2008] is intended to be a simple example of the use of BPEL to orchestrate services, and consists of a BPEL module and three partner services. The partners are simply stubs, designed to simulate reserving airline seats, hotel rooms and rental cars. Yet the whole example once deployed consists of 129 files distributed across 49 directories, not counting files actually deployed to the server (Table 1). While the tutorial went smoothly as a demonstration of Netbeans' BPEL capabilities, as novices we found ourselves completely bewildered by the multiplicity of components it used.

* Google is a trademark of Google, Inc.

Table 1 - Size of the Travel Reservation Service Example

            | Initially | After Deploy Partner Services | After Deploy Composite Application
Directories | 26        | 32                            | 42
Files       | 76        | 105                           | 129


We hypothesized that any future maintainer would encounter equal bewilderment if faced with the need to modify such a system. Obviously some aid to navigation through the mass of material could be useful, and a SOAMiner search engine tuned to SOA requirements seemed to be a relevant and understandable analogy.


Further study of the Travel Reservation Service and other examples indicated that much of the complexity is in the files that serve to tie the application together and to deploy it to an application server. For lack of a better term we call these SOA description files; they include:

- Web Service Definition Language (WSDL) files, which specify the interfaces of web services and the addresses on which they are deployed
- XML Schema Definition (XSD) files, which may define data types used in messages
- Business Process Execution Language (BPEL) files, used to specify the orchestration of services
- A variety of XML files of different types, apparently containing mappings used to provide information to the programming environment or to the application server where the services will be deployed.


These SOA description files provide a complex web of information that may provide essential background for a software maintenance task. For example, a data type may be described in an XSD file, and then referred to in a message description within a WSDL file, which is then mapped to a service operation within that same WSDL, which specifies the URL where the service may be accessed, which is in turn mapped to specific EJB's by an XML deployment descriptor. A software maintainer may need to comprehend this web of relationships to fully understand the consequences of any change to the data type.
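The chain of references described above can be followed mechanically for a small case. The sketch below is only an illustration, not part of SOAMiner: the sample WSDL, its namespaces, the service URL and the function names are all invented for the example, and the final step from URL to EJB deployment descriptor is left out.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"

# Hypothetical, heavily trimmed WSDL used only for illustration.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:tns="urn:example:vehicle"
             xmlns:ota="urn:example:ota"
             targetNamespace="urn:example:vehicle">
  <message name="ReserveVehicleIn">
    <part name="itinerary" type="ota:TravelItinerary"/>
  </message>
  <portType name="VehiclePortType">
    <operation name="reserveVehicle">
      <input message="tns:ReserveVehicleIn"/>
    </operation>
  </portType>
  <binding name="VehicleBinding" type="tns:VehiclePortType"/>
  <service name="VehicleService">
    <port name="VehiclePort" binding="tns:VehicleBinding">
      <soap:address location="http://localhost:8080/vehicle"/>
    </port>
  </service>
</definitions>
"""


def local(qname):
    """Drop the namespace prefix from a qualified name such as 'tns:Foo'."""
    return qname.split(":")[-1]


def trace_type(wsdl_text, type_name):
    """Follow type -> message -> operation -> port type -> binding -> URL."""
    root = ET.fromstring(wsdl_text)
    w = "{%s}" % WSDL_NS
    chains = []
    for msg in root.iter(w + "message"):
        parts = msg.iter(w + "part")
        if not any(local(p.get("type", "")) == type_name for p in parts):
            continue
        for pt in root.iter(w + "portType"):
            for op in pt.iter(w + "operation"):
                # An operation uses the message if any input/output names it.
                if not any(local(c.get("message", "")) == msg.get("name") for c in op):
                    continue
                for b in root.iter(w + "binding"):
                    if local(b.get("type", "")) != pt.get("name"):
                        continue
                    for port in root.iter(w + "port"):
                        if local(port.get("binding", "")) != b.get("name"):
                            continue
                        addr = port.find("{%s}address" % SOAP_NS)
                        url = addr.get("location") if addr is not None else None
                        chains.append((msg.get("name"), op.get("name"),
                                       pt.get("name"), url))
    return chains


for chain in trace_type(SAMPLE_WSDL, "TravelItinerary"):
    print(" -> ".join(str(step) for step in chain))
```

Even in this toy case four separate name lookups are needed to get from the data type to the service address, which suggests why the same task is hard to do by eye across many files.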

2 Program Comprehension Tools and SOA

There is a fairly extensive body of literature on the comprehension of pre-SOA styles of software, including reports of research that has been backed up by experiments or careful case studies. It is clear from this literature that experienced software engineers use a pragmatic, as-needed strategy in studying unfamiliar code. They rarely attempt to understand a large program in its entirety, but rather seek out those parts that are essential for the specific task they have at hand [KOEN:1991]. This finding provides the motivation for the development of tools to help software engineers locate and browse code using different criteria.


The actual mental processes used during comprehension are complex. For example, von Mayrhauser and Vans observed experienced software engineers as they worked and noted that they switch back and forth between different perspectives: a program model (overall control flow of the code), a situation model (functional and data flow abstraction), and a top-down model
(knowledge of the application domain) [VONM:1994]. The conclusions of this line of research emphasize the rapid mental switching involved as engineers recognize "beacons" such as variable names or code patterns, and extract information from multiple sources. This view puts a premium on agile tools that can give answers quickly and play well within the engineer's development environment.


For the specific case of SOA applications, there is little published work on program comprehension that is based on experimental research, but there have been a number of discussions in the literature of the potential problems to be expected. In a panel discussion at the 2004 International Conference on Software Maintenance, several of the panelists focused on organizational and software process changes that may become necessary [KAJK:2004]. Kajko-Mattson and Tepczynski later elaborated further on these suggested organizational changes and on the concept of "Service Centers" to specialize in the maintenance of web services [KAJK:2005]. Gold et al. describe comprehension issues in scenarios in which applications are composed dynamically, possibly differently on every invocation, using broker services that may not disclose their inner workings [GOLD:2004b]. Wilde et al. discuss proposals for understanding specific features in SOA applications, based on their experiences with earlier kinds of distributed software [WILDE:2008].


Gold and Bennett provide some interesting experience based on the development of a prototype health information service [GOLD:2004a]. This system involved integrating information from a wide range of health service providers, and not surprisingly they found that the integration of multiple changing data models and ontologies presents significant challenges. Either interfaces must be tightly coordinated among participating organizations or code must be constructed to cope with minor interface changes. Tracing of execution patterns via "audit services" could help both program comprehension and debugging, especially if services are composed on-the-fly.


More recently, two papers at the 2008 Frontiers of Software Maintenance workshop addressed SOA. Lewis and Smith [LEWIS:2008] discuss some of the issues related to the evolution of SOA systems, notably the problems of dealing with distributed systems with multiple owners and the comprehension difficulties of having expertise to deal with multiple languages and operating environments. Kontogiannis also discussed the multi-language issues and the need for processes to support the continuous incremental evolution characteristic of deployed SOA applications [KONT:2008].

3 Static SOA Program Comprehension in Context

The history of static support for program comprehension shows an interesting evolution from simpler to more complex tools (see Table 2).

Table 2 - The Evolution of Static Program Comprehension

Level | Tool Category                          | Characteristics
1     | cross-referencing, indexing            | single source file, no user interface (hardcopy), byproduct of the compilation process
2     | text search, regular expression        | multiple file search, initially command line interface, later GUIs
3     | graph model of impacts or dependencies | database of whole software system, various query interfaces, tracing of chains of relationships
4     | design recovery                        | specialized tools for recovery of specific abstractions assumed to be useful to maintainers



From the 1960's, compilers have often provided a cross reference listing of source code as a byproduct of the compilation process. This simple list of identifiers giving the line numbers where each was used was a significant aid for software maintainers, especially at a time when the use of global data was much more prevalent than it has now become. For example, the cross reference helped a maintainer to understand data flows since he could see all the places where a particular variable was set and accessed. This kind of tracing became, of course, more and more tedious as programs grew and separate compilation units became common.


In the 1970's programmers gained continuous access to source code through time sharing terminals. Source code was now more often split across an increasing number of files so multi-file search tools were needed. A classic example was the grep tool for regular expression search, which has continued to be available in most Unix environments [OPEN:2004]. Regular expression matching provided more freedom in expressing queries, but at the cost of possible false matches to irrelevant code and comments. Tracing effects through code continued to be difficult, requiring the maintainer to locate each 'hit' in the code, evaluate its relevance, and then possibly generate new queries based on the evaluation. Regular expression search tools are now commonly built into programming environments such as Eclipse and Netbeans and continue to be a maintenance programmer's favorite for many tasks.


A third level of development emerged in the 1980's and 1990's with tools founded on more sophisticated models of the source code. These models were typically based on graphs of some sort, where the nodes represented different entities in the source code (e.g. variables, data types, classes, functions or methods) and the arcs represented different kinds of relationships between them. Tools for relationship extraction [CHEN:1990], program slicing [GALL:1991], dependency analysis [LINO:1994] and impact analysis [QUEI:1994] fall into this category. Many of the tools at this level aimed to reduce the tedium of tracing effects through the code by allowing queries about chains of relationships, for example, "show all parts of the code that affect the value of variable X".
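A query over chains of relationships of this kind amounts to a reachability search on the graph. The following is a minimal sketch under invented assumptions: the entity names and the "is affected by" edges are hypothetical, and none of the cited tools is claimed to work this way internally.

```python
from collections import deque

# Hypothetical "is affected by" edges: each key depends on the listed entities.
AFFECTED_BY = {
    "X": ["y", "compute()"],
    "y": ["readInput()"],
    "compute()": ["y", "SCALE"],
    "z": ["X"],
}


def affects(graph, target):
    """Return every entity that directly or indirectly affects `target`."""
    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


print(sorted(affects(AFFECTED_BY, "X")))
```

The breadth-first traversal replaces the manual cycle of locating a hit, judging it, and issuing a new query that a text search tool would require for the same question.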


The fourth level of sophistication is seen in tools that provide or support design recovery. At this level the toolmaker attempts to abstract up from the code to a higher level representation and provide a concise response to questions maintainers are presumed to ask about the code. There are many such tools in the literature with specialized goals and approaches, e.g. [MURPH:1997], [DILUC:2000]. Gueheneuc, Mens and Wuyts provide a classification scheme [GUEH:2006]. Perhaps because of this specialization, relatively few of these tools have passed into widespread use.


In thinking about tool requirements, we see three interesting progressions as we move up this scale of tools. At the lower levels the tool does much less work and more is left to the software maintainer. That has benefits as well as costs since maintainers generally know quite a lot about the code they are faced with. They are likely to be familiar with the problem domain, with the run-time environments, and possibly with coding practices used in development. Even maintainers lacking these advantages may have access to colleagues who can help them over the rough spots. Thus in formulating queries and viewing results they have a substantial advantage over a software tool, no matter how sophisticated it may be.


A second progression derives from the first. Since at the lower levels the maintainer does most of the work, the requirements for the tool are relatively straightforward. If the maintainer needs to understand a variable it is up to him to seek it in the cross reference or formulate a regular expression query. At this level the tool designer is not directly concerned with the thought processes or the work flow of the maintainer.



However, as the tool tries to do more, it becomes important that the 'more' should actually map to real maintainer tasks and thought processes. The tool designer needs to understand typical maintainer problems and craft a user interface that will present solutions to these problems in an easy-to-use and intuitive way. The user of grep does not need to understand how it processes regular expressions. The user of more sophisticated tools should not need to understand either the structure of the graph it uses or the subtleties of the output it displays.


Which brings us to our third progression: ease of use, or perhaps more important, ease of first use. For over 20 years our program comprehension research group has been working with industrial software engineers in the Security and Software Engineering Research Center, S2ERC (http://www.serc.net) [WILDE:2007]. Our experience is that maintainers of real industrial software are very busy, and have little leisure to explore new tools. Tools that take more than a few hours to install and run, or that require extensive practice to use effectively, are unlikely to become part of their toolkit. The cross referencing and text search tools are favored because they require little or no setup and can be used quickly across a range of maintenance tasks with little training. More sophisticated tools tend to need more time-consuming setup, as well as more expertise to use.

4 The SOAMiner tool

Our current SOAMiner prototype is firmly located at Level 2 of Table 2. We feel that it is too early to develop more elaborate tools since there is still little experience with the practical maintenance of SOA systems. We simply do not know what tasks and questions maintainers will encounter. We still need to explore the diversity of different SOA designs and interact with industry software engineers before we can define the requirements for higher level tools.


So our initial goal is to support maintainers of SOA applications with an easy-to-use text search tool for SOA description files while keeping within the well known mental model of a web search engine. The tool should provide initial benefits quickly; our goal is less than one hour from tool download to first results.


Our initial SOAMiner prototype is based on Apache Solr [APAC:2010]. Solr is a widely-used open source text search system that runs within a servlet container and may be accessed from a web browser. To use Solr, the administrator creates a schema describing the different fields in each document he wishes to make searchable along with indexing and querying options. He then parses the input documents to create the specified fields and posts the result into Solr's index. Users may then make queries using a web browser interface.
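As an illustration of the "parse, then post" step, the sketch below serializes parsed records into the XML `<add>` message that Solr accepts for index updates. The field names (`id`, `file_name`, `tag_name`, `attr_name`) are assumptions made for the example; the report does not list the fields of the actual SOAMiner schema, and the HTTP request that would deliver the message to a Solr server is omitted.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(records):
    """Serialize a list of field dictionaries as a Solr <add> update message."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", {"name": name})
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical record for one XML tag found in a WSDL file.
records = [{
    "id": "1",
    "file_name": "airline.wsdl",
    "tag_name": "operation",
    "attr_name": "airlineReserved",
}]
message = to_solr_add_xml(records)
print(message)
```

Every field named in such a message would have to be declared in the schema first, which is why the schema design fixes the queries that are possible later.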


The design of the Solr schema is important since it determines the kinds of queries that can be made and the results that will be returned [SMIL:2009, chapter 2]. An important decision concerns the granularity of the index. The grep tool searches text that is divided into lines and echoes the lines that match the user's query. Most web search engines index files and return a link to each complete file that matches the query. We noted that most of the SOA description files (WSDL, XSD, etc.) have an XML format and that most of the information is given in attributes within tags (see Figure 2). For such files line endings are arbitrary, and complete files may be quite complicated, so neither granularity is really appropriate. We thus hypothesized that the best unit to index would be the XML tag, and we decided to focus on queries that match or partially match the values of tag attributes.




<portType name="AirlineReservationCallbackPortType">
  <operation name="airlineReserved">
    <input message="tns:AirlineReservedIn"/>
  </operation>
</portType>

Figure 2 - Part of a WSDL File Showing XML Structure
There are three tags, <portType>, <operation> and <input>, with matching closing tags </operation> and </portType>.
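Tag-level indexing can be sketched on the fragment of Figure 2. The code below is an illustrative assumption about how such a parser might look, not the SOAMiner parser itself: the record layout, the `parent_id` field and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The WSDL fragment shown in Figure 2.
FIGURE_2 = """
<portType name="AirlineReservationCallbackPortType">
  <operation name="airlineReserved">
    <input message="tns:AirlineReservedIn"/>
  </operation>
</portType>
"""


def tag_records(xml_text, file_name):
    """Return one dictionary per XML tag, in document order."""
    root = ET.fromstring(xml_text)
    records = []

    def visit(element, parent_id):
        tag_id = len(records)
        records.append({
            "id": tag_id,
            "parent_id": parent_id,   # hypothetical link to the enclosing tag
            "file_name": file_name,
            "tag_name": element.tag,
            "attributes": dict(element.attrib),
        })
        for child in element:
            visit(child, tag_id)

    visit(root, None)
    return records


def search(records, text):
    """Case-insensitive partial match against attribute values only."""
    needle = text.lower()
    return [r for r in records
            if any(needle in v.lower() for v in r["attributes"].values())]


records = tag_records(FIGURE_2, "airline.wsdl")
for hit in search(records, "reserved"):
    print(hit["tag_name"], hit["attributes"])
```

Indexing at this granularity returns the individual <operation> and <input> tags as hits, rather than a whole file or an arbitrary line of text.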


SOASearch, our current user interface used for searching SOAMiner, is a JavaScript application based on AJAX-Solr, an open source library [AJAX:2010]. SOASearch runs in a web browser and provides a window divided into two panes (see Figure 3). Users make queries in the left pane and the usual strategy is to make a general query and then narrow it down as needed. This pane contains "tag clouds" showing the most common file types, file names, XML tag names, etc. in the most recent query response. The user can click on these tags or enter search text to restrict the query.


The right pane displays the current results, paged 10 at a time. At the moment it simply displays the data stored in Solr's index for each matching XML tag.



Figure 3 - SOAMiner Search Interface

5 Initial Studies with SOAMiner

5.1 Case Study Sources

SOAMiner is still in early stages of a spiral development process so the studies done with it so far are not evaluations of a finished tool, but rather intended to provide feedback on our design decisions and to surface unanticipated requirements.



We have used three data sets in these initial evaluations: the Travel Reservation Service, a WSDL from the MicroPAVER civil engineering application, and a collection of SOA description files harvested from the Web.

5.1.1 The Travel Reservations Service

The Travel Reservations Service is the Netbeans example described earlier as a motivation for this project [KOVAL:2008]. Disregarding duplicates and many miscellaneous XML files, we were left with three distinct WSDL's (one each for airline, hotel and rental car reservations), one large XSD with travel industry standard data types, and a BPEL file for the program to orchestrate the services.

5.1.2 A WSDL from MicroPAVER™

MicroPAVER is a large software application widely used by civil engineers in managing the maintenance of pavement installations such as roads and airport runways [APWA:2009]. It is implemented as a large collection of services programmed using Windows Communication Foundation (WCF). Most of these services are tightly coupled and WSDL's are not normally used internally, but they can be generated by the WCF software to allow external access. We used one very large generated WSDL of over 1 MB that may be representative of SOA descriptions generated automatically from large legacy components*.

5.1.3 SOA Descriptions Harvested from the Web

A third data set of WSDL's and XSD's was collected from the Web to provide both test data for SOAMiner and a rough snapshot of the current state of practice in service design. A crawler was written that generated automatic Web queries to select and download WSDL and linked XSD files. The Web queries included various keywords and type specifications to select URLs that match WSDL or XSD files. Keywords were selected from the vocabulary of WSDL and XSD files as well as glossaries from different subjects, to select files from a wide range of applications. Among matching results, up to 100 Web documents were downloaded using the URLs returned by the search engines. The documents were subsequently filtered to match WSDL or XSD files. In addition, WSDL files were analyzed to collect any XSD files describing data structures within the WSDL files. The result was a data set with 1513 WSDL and XSD files.

5.2 Scalability Study

As has been mentioned, we believe that ease of first use is an important design goal for SOAMiner. Since it may be used on large systems with many files we ran an initial scalability study to make sure that the choice of Solr and our design of the Solr schema were not compromising SOAMiner's ability to rapidly index and query data sets of various sizes and complexity.


We made two timed tests, one with the single large WSDL from MicroPAVER to stress memory use in our parser, and a second with the entire 1513 file data set harvested from the Web to stress Solr's index. Both tests were run on a MacBook Pro with an Intel Core 2 Duo 2.8GHz processor, 6MB of L2 cache and 4GB of RAM. We measured the clock time required to parse the input files and the time to post the resulting data into the Solr index. The results are shown in Table 3, along with the data set size, measured as the total number of input XML tags that were indexed.




* We would very much like to thank Dr. Arthur Baskin of Intelligent Information Technologies, a S2ERC affiliate company, for providing us with this data as well as with many insights into the way services are used within MicroPAVER and associated programs.



Table 3 - Times to Parse and Load into SOAMiner

Data Set                       | Total Files | Total Size (tags) | Parse Time (sec) | Post Time (sec)
MicroPaver WSDL                | 1           | 12,818            | 166.63           | 20.13
Web Harvested WSDL's and XSD's | 1513        | 529,127           | 1278.89          | 955.93


We made several test queries against each data set and found that the response time was only a few seconds in all cases.


These results suggest that SOAMiner will scale well with both the size of data sets and the size of individual files. Parse, load and query times are within acceptable limits.
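One way to compare the two rows of Table 3 is to convert the times into throughput. The short calculation below uses only the figures from the table; the function name and the choice of tags per second as the unit are ours.

```python
# Figures copied from Table 3: (tags indexed, parse seconds, post seconds).
TABLE_3 = {
    "MicroPaver WSDL": (12818, 166.63, 20.13),
    "Web Harvested WSDL's and XSD's": (529127, 1278.89, 955.93),
}


def throughput(tags, parse_sec, post_sec):
    """Return (parse rate, post rate, overall rate) in tags per second."""
    return tags / parse_sec, tags / post_sec, tags / (parse_sec + post_sec)


for name, row in TABLE_3.items():
    parse_rate, post_rate, overall = throughput(*row)
    print(f"{name}: parse {parse_rate:.1f}, post {post_rate:.1f}, "
          f"overall {overall:.1f} tags/sec")
```

On these figures the single large WSDL is parsed at a markedly lower per-tag rate than the harvested collection, while the posting rates for the two data sets are of the same order.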

5.3 Basic Maintenance Scenario Study

A usability study was conducted with the initial prototype of SOAMiner to evaluate current capabilities and to identify additional requirements. A think aloud protocol was used with two participants engaging in two predefined software maintenance scenarios using the Travel Reservations Service described earlier. Participants were encouraged to verbalize their thoughts as they performed activities related to software maintenance, while observers recorded times, comments, and participant behavior.


To help isolate usability factors related to text search tools in general as opposed to usability issues with SOAMiner in particular, each participant performed one of the scenarios using grep and the other using SOAMiner. Both participants were students with some reading experience with WSDL's and XSD's, but without practical experience working with such documents. Accordingly the study was preceded by approximately three hours of basic orientation related to XML Schema, WSDLs, BPEL, SOAMiner, and grep.


The first maintenance scenario was fairly simple and involved locating where the URL for a particular service was defined. The second scenario involved a hypothetical bug involving failed cancellations of vehicle reservations; this was more difficult in that it required tracing through several of the SOA description files to understand how message return data was defined.


Both participants, one using grep and one using SOAMiner, were able to get the correct answers for the first scenario. The participant using grep was able to answer the questions within 15 minutes while the participant using SOAMiner took 25 minutes, the difference being entirely attributable to the time required to go through SOAMiner's indexing procedure.


The second scenario was more challenging and neither participant was able to correctly answer all of the questions. The main difficulty seemed to be that both participants only had a novice level of familiarity with BPEL, WSDL, and XSD and neither tool was sufficient to substitute for this lack of background knowledge. One specific problem was that the WSDL contained strings such as "CancelVehicleOut" and so the participants searched on variants of "CancelVehicle". However, in the XSD the data type they were looking for was called "CancellationStatus" and so was not found. A SOA expert would be able to trace the point in the WSDL where the terminology changed but novices could not make the connection.
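The terminology mismatch can be reproduced with a plain substring match. This sketch uses the two names quoted above; the matching function is a stand-in for either tool's partial matching, and the shorter query is shown only for contrast, not as something the participants tried.

```python
# Names quoted in the report, labeled with the kind of file they came from.
INDEXED_NAMES = [
    ("WSDL", "CancelVehicleOut"),
    ("XSD", "CancellationStatus"),
]


def partial_match(query, names):
    """Return the entries whose name contains the query, ignoring case."""
    q = query.lower()
    return [entry for entry in names if q in entry[1].lower()]


print(partial_match("CancelVehicle", INDEXED_NAMES))  # query used in the study
print(partial_match("Cancel", INDEXED_NAMES))         # broader stem, for contrast
```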


The most important result of this study is the list of suggested improvements to SOAMiner as given in Appendix A.


5.4 Locating Data Type Usages

One final study explored a task that we think may be typical for users of SOAMiner: understanding usages of data types. The WSDL 1.1 specification provides wide latitude to service developers as to how data types are declared. There are three possible strategies, which may be combined within any WSDL. The data contained within a particular message may be declared: (1) by reference in an optional <types> section to an external XML Schema document, (2) by using the XML Schema namespace and coding XML Schema-formatted data types in the <types> section of the WSDL itself, or (3) if only the 44 simple types in the XML Schema recommendation are used, by coding them directly in <message> tags. Unfortunately this flexibility means that maintainers may often be faced with WSDL's written in an unfamiliar style.
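A maintainer facing an unfamiliar WSDL could first check which of the three strategies it uses. The sketch below is a rough heuristic written for this illustration only: the sample documents are invented, the detection rules are our own simplification of the three strategies, and the prefix lookup assumes that no namespace prefix is rebound inside the file.

```python
import io
import xml.etree.ElementTree as ET

WSDL = "{http://schemas.xmlsoap.org/wsdl/}"
XSD_URI = "http://www.w3.org/2001/XMLSchema"
XSD = "{%s}" % XSD_URI


def prefix_map(wsdl_text):
    """Collect prefix -> namespace URI declarations (assumes no rebinding)."""
    source = io.BytesIO(wsdl_text.encode("utf-8"))
    events = ET.iterparse(source, events=("start-ns",))
    return {prefix: uri for _, (prefix, uri) in events}


def declaration_styles(wsdl_text):
    """Return the set of strategies (1, 2, 3) that a WSDL appears to use."""
    prefixes = prefix_map(wsdl_text)
    root = ET.fromstring(wsdl_text)
    styles = set()
    for schema in root.iter(XSD + "schema"):
        for child in schema:
            if child.tag in (XSD + "import", XSD + "include"):
                styles.add(1)  # reference to an external schema document
            elif child.tag in (XSD + "element", XSD + "complexType",
                               XSD + "simpleType"):
                styles.add(2)  # types coded inline in the <types> section
    for part in root.iter(WSDL + "part"):
        type_name = part.get("type", "")
        if ":" in type_name and prefixes.get(type_name.split(":")[0]) == XSD_URI:
            styles.add(3)      # XML Schema simple type used directly in a message
    return styles


# Hypothetical WSDL fragments used only to exercise the heuristic.
EXTERNAL_AND_SIMPLE = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema>
      <xsd:import namespace="urn:example:ota"
                  schemaLocation="OTA_TravelItinerary.xsd"/>
    </xsd:schema>
  </types>
  <message name="PingIn">
    <part name="text" type="xsd:string"/>
  </message>
</definitions>
"""

INLINE_ONLY = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <types>
    <xsd:schema>
      <xsd:complexType name="Status"/>
    </xsd:schema>
  </types>
</definitions>
"""

print(declaration_styles(EXTERNAL_AND_SIMPLE))
print(declaration_styles(INLINE_ONLY))
```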


For the Travel Reservation Service we imagined a scenario in which a maintainer needed to understand the data used in the input message to reserve a vehicle. A WSDL expert would know that data types are often declared in <part> tags within a <message> tag. In SOAMiner it was easy to restrict to WSDL files and to <message> tags and then search for "vehicle". This immediately finds the four matching messages (Figure 3). However, it was more difficult to navigate to the <part> tag contained within the ReserveVehicleIn message. The only solution was to search for the "tag child Id", an arbitrary unique string generated during parsing. While that method works, it would be desirable to have a more intuitive way of navigating up and down the hierarchy of tags.


Once the <part> tag for ReserveVehicleIn was found, SOAMiner showed immediately that its input data type is "ota:TravelItinerary". The obvious next step was to do an unrestricted search for "TravelItinerary", but that produced thousands of hits because all tags in the file named OTA_TravelItinerary.XSD match the query! Thus the file containing the type definition was quickly identified, but the current Solr search interface does not provide any easy way to search for that specific string within that file. The Solr schema should probably be adjusted to avoid matches to file names or to give low weight to such matches.
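The effect of matching on file names can be imitated with a few in-memory records. Everything in the sketch is hypothetical (the records, the field names and the `search` function); it simulates the behavior described above in plain Python rather than using Solr.

```python
# Hypothetical index records: one per XML tag, with the file name stored alongside.
RECORDS = [
    {"file_name": "OTA_TravelItinerary.xsd", "tag_name": "element",
     "attr_values": ["ItineraryRef"]},
    {"file_name": "OTA_TravelItinerary.xsd", "tag_name": "complexType",
     "attr_values": ["TravelItinerary"]},
    {"file_name": "VehicleReservation.wsdl", "tag_name": "part",
     "attr_values": ["itinerary", "ota:TravelItinerary"]},
]


def search(records, text, fields=("file_name", "tag_name", "attr_values")):
    """Return records in which any of the chosen fields contains the text."""
    needle = text.lower()
    hits = []
    for record in records:
        for field in fields:
            value = record[field]
            values = value if isinstance(value, list) else [value]
            if any(needle in v.lower() for v in values):
                hits.append(record)
                break
    return hits


everything = search(RECORDS, "TravelItinerary")
attributes_only = search(RECORDS, "TravelItinerary", fields=("attr_values",))
print(len(everything), len(attributes_only))
```

Restricting the match to attribute values drops the tag that matched only because of the name of its file, which is the adjustment the paragraph above proposes for the schema.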

6 Conclusions

This report has discussed some of the problems that software maintainers may face when trying to understand the large SOA applications which are now coming into service. It also described the ongoing development of SOAMiner, a proposed search tool that users might think of as a Google for SOA. Since there is little documented experience in the maintenance of SOA applications, we do not know clearly what maintainers will need, so SOAMiner is being developed following a spiral process with repeated evaluation of prototypes.


The evaluations described in this report will be used to guide the development of the next version of SOAMiner, which we hope may then be ready for trials at S2ERC industrial affiliates. The top priorities identified for the next cycle are:

1) Provide a better user interface and a more agile setup and load procedure for indexing
   SOA description files into SOAMiner.

2) Redesign the panel showing SOASearch output (the right panel in Figure 3) so that it
   conveys more information about the tags that matched the query and the context in which
   those tags exist. One possibility would be to integrate with a text display that would show
   the query results highlighted on the original XML file.

3) Miscellaneous cleanups to our parser and to the Solr schema to avoid searching on file
   names and paths and to provide better information for use in the redesigned output panel.
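
The text display suggested in the second priority could take many forms. As one possibility, the sketch below (an illustration only; the function name, the marker strings, and the sample fragment are assumptions, not part of SOAMiner) shows matched terms highlighted in the original XML text, with line numbers and a configurable amount of surrounding context.

```python
import re


def highlight_matches(xml_text, term, context=1, marker=">>"):
    """Return numbered lines around each match of term, with matches wrapped in [[ ]]."""
    lines = xml_text.splitlines()
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hit_rows = {i for i, line in enumerate(lines) if pattern.search(line)}
    # Rows to display: every hit plus `context` lines on either side of it.
    shown = sorted({
        j
        for i in hit_rows
        for j in range(max(0, i - context), min(len(lines), i + context + 1))
    })
    output = []
    for j in shown:
        prefix = marker if j in hit_rows else "  "
        text = pattern.sub(lambda m: f"[[{m.group(0)}]]", lines[j])
        output.append(f"{prefix} {j + 1:4d}: {text}")
    return output


if __name__ == "__main__":
    # Hypothetical fragment for illustration.
    sample = (
        '<message name="ReserveVehicleIn">\n'
        '  <part name="itinerary" type="ota:TravelItinerary"/>\n'
        "</message>"
    )
    print("\n".join(highlight_matches(sample, "travelitinerary")))
```

Showing the enclosing lines alongside each hit gives the reader the parent tag as context, and the line numbers let the reader find the same place in any text editor.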


7 ACKNOWLEDGEMENTS

Work described in this paper was partially supported by the University of West Florida
Foundation under the Nystul Eminent Scholar Endowment.

We would also like to thank Dr. Arthur Baskin of IIT, an S2ERC affiliate, for his guidance.



APPENDIX A - RESULTS OF THE BASIC MAINTENANCE SCENARIO STUDY

Observations from Scenario 1

Both participants, one using grep and one using SOAMiner, were able to correctly answer
questions related to the maintenance of the hotel reservation system. The participant using grep
answered the questions within 15 minutes; the participant using SOAMiner answered them within
25 minutes. The observer attributed the difference in time to the setup time for SOAMiner. Both
users retraced steps in their attempt to derive the sought-after information.


The participant using SOAMiner remarked that copying all of the SOA description files into a
special directory, clearing the index, and typing the SOAParser command n times (once for each
of the n files) was tedious. The participant initially forgot to load the index but fairly
quickly realized the error when the system did not behave as expected. This participant
performed a partial load of the files needed to answer the first question and then performed a
second load later when additional files were needed for the second question.
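
The repetitive per-file step lends itself to scripting. The sketch below is one possible approach under stated assumptions: it finds the description files under a directory and pairs each with a caller-supplied parser command. The command shown in the usage note is a placeholder, since this report does not document the SOAParser command line.

```python
from pathlib import Path

SOA_SUFFIXES = {".wsdl", ".xsd", ".bpel"}


def find_description_files(root):
    """Return all WSDL, XSD, and BPEL files under root, sorted for a repeatable load order."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in SOA_SUFFIXES
    )


def build_load_commands(root, parser_command):
    """Build one command per description file found under root.

    parser_command is a list of strings supplied by the caller. Any value used
    here is a placeholder; the real parser invocation is an assumption.
    """
    return [list(parser_command) + [str(path)] for path in find_description_files(root)]
```

A caller could then run each command in turn, for example with subprocess.run(command, check=True) inside a loop over build_load_commands("descriptions", ["soaparser-placeholder"]). Because the file search is recursive, the files would not need to be copied into a special directory first, and a partial load becomes a call on a subdirectory.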


At one point the participant using SOAMiner did not realize that the answer was already
displayed in SOAMiner's right panel. The lack of line numbers in the SOAMiner display
necessitated the supplemental use of a text editor to answer some of the maintenance scenario
questions, since SOAMiner does not integrate access to an editor that displays line numbers.


Comments from the participant using SOAMiner were that an easier way to load each file into the
index is needed, that line numbers are needed, and that he liked the left panel browser with cloud
tags.


The participant using SOAMiner got to the appropriate location within the file structure fairly
quickly once the files were loaded. The participant using grep entered many irrelevant queries
before arriving at answers.


Comments from the participant using grep were that grep was fast and provided good support
once he knew the commands. However, he did not like having to use the command line interface
and had difficulty keeping track of his location within the file structure.


Observations from Scenario 2


The second scenario was more challenging for each of the participants than the first scenario.
Neither participant was able to correctly answer all of the questions regarding the maintenance
scenario.


Both participants had only a novice level of familiarity with BPELs, WSDLs, and XSDs, which
made it difficult for them to find answers; the tools did not substitute for lack of background
knowledge.


The participant using grep also made extensive use of the vi text editor. However, he did not
understand the WSDL, BPEL, and XSD relationships well enough to navigate within and between
files to find answers. He eventually decided the answer had to be in the XSD and examined that
file in the vi editor, but name changes within those files stumped him.




The participant using grep entered many mistyped and irrelevant commands, and never used a text
editor.


The SOAMiner participant spent 11 of 36 minutes creating the index. He also performed a partial
load of needed files.


Comments from the participant using SOAMiner were that the interface is easy to use and that
options in the left panel browser (e.g., filter capability) were very useful. However, deriving
the desired meaning from the results was the most difficult part. The tool was easy to use, like
other web searches, but did not provide enough information in the results.


Conclusions - Suggested Improvements:


1) The amount of time that both participants expended setting up and loading files into
   SOAMiner made evident that improving the manner in which users set up and load files into
   SOAMiner is a priority. The new setup and load procedures should accommodate multiple
   load use cases.

2) The SOAMiner user interface is not as intuitive as desired. The capability to undo/redo
   activities would improve support for the natural thought process of users, as demonstrated
   by both participants' attempts to retrace their steps as they progressed through the
   scenarios. The integration of a text editor that can be launched from the SOAMiner
   interface, possibly when clicking on links associated with files, and the display of files
   with line numbers will better support users.

3) Enhancing SOAMiner with the means to represent domain knowledge about WSDL, BPEL,
   XSD and the relationships among these would be beneficial for users, especially for
   the novice user.

4) Incorporating a built-in HELP system for SOAMiner, possibly one where a user can
   hover over tags and get information about the meaning of the tag, would be one way to
   address the participant remarks about deriving meaning from the SOAMiner output.

5) A problem with text-based tools is that different labels may be used for the same concept.
   In this case the WSDL contained strings such as CancelVehicleOut whereas in the XSD the
   term was CancellationStatus.

6) Further testing with users with more familiarity with the XML vocabularies involved is
   needed to evaluate the usability of SOAMiner for more experienced maintainers.


Future Work


Two issues arose from the evaluation of these results that warrant further work. First, we need to
determine the effect of loading the same files into SOAMiner multiple times, and second we need
to look at configuration issues such as case sensitivity, partial string match, and explicit
namespace qualifiers to see how these are best handled in SOA maintenance scenarios.
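
To make the configuration issues concrete, the following sketch shows how each choice changes whether a query matches an identifier. It is an illustration with hypothetical function names, not a description of how SOAMiner or Solr handles these cases. The last function splits identifiers into words, which is one possible way to relate differently labeled terms such as CancelVehicleOut and CancellationStatus.

```python
import re


def local_name(identifier):
    """Drop an explicit namespace prefix: 'ota:TravelItinerary' becomes 'TravelItinerary'."""
    return identifier.rsplit(":", 1)[-1]


def tokens(identifier):
    """Split a camelCase or underscore-separated identifier into lower-case words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", local_name(identifier))
    return [word.lower() for word in words]


def matches(query, identifier, case_sensitive=False, partial=True, use_namespace=False):
    """Compare a query with an identifier under the three configuration choices."""
    left = query if use_namespace else local_name(query)
    right = identifier if use_namespace else local_name(identifier)
    if not case_sensitive:
        left, right = left.lower(), right.lower()
    return left in right if partial else left == right


def shared_word_stems(a, b, min_len=5):
    """Return the words of a that share their first min_len letters with a word of b."""
    return sorted({
        x
        for x in tokens(a)
        for y in tokens(b)
        if len(x) >= min_len and len(y) >= min_len and x[:min_len] == y[:min_len]
    })


if __name__ == "__main__":
    print(matches("travelitinerary", "ota:TravelItinerary"))
    print(matches("CancelVehicleOut", "CancellationStatus"))
    print(shared_word_stems("CancelVehicleOut", "CancellationStatus"))
```

A plain substring comparison does not relate the two cancellation labels, while the word-level comparison finds the common stem. The prefix length of five letters is an arbitrary choice for this sketch and would need tuning against real description files.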