DaleFletter-MastersProject.docx - The California State University



SOFTWARE ARCHITECTURE RECOVERY METHOD (SARM) WITH A CASE STUDY ON
A MEDIUM-SIZED WEB SITE ACCESSIBILITY ASSESSMENT TOOL


Dale Allen Fletter

B.A., Illinois Institute of Technology, 1990


PROJECT


Submitted in partial satisfaction of

the requirements for the degree of



MASTER OF SCIENCE

in


SOFTWARE ENGINEERING

at

CALIFORNIA STATE UNIVERSITY, SACRAMENTO


FALL 2010




























© 2010

Dale Allen Fletter

ALL RIGHTS RESERVED











SOFTWARE ARCHITECTURE RECOVERY METHOD (SARM) WITH A CASE STUDY ON
A MEDIUM-SIZED WEB SITE ACCESSIBILITY ASSESSMENT TOOL


A Project


by


Dale Allen Fletter




Approved by:




__________________________________, Committee Chair
Cui Zhang, Ph.D.

__________________________________, Second Reader
Robert Buckley, M.S.




____________________________

Date














Student: Dale Allen Fletter

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the project.



__________________________, Graduate Coordinator
Nikrouz Faroughi, Ph.D.

________________
Date

Department of Computer Science









Abstract

of

SOFTWARE ARCHITECTURE RECOVERY METHOD (SARM) WITH A CASE STUDY ON
A MEDIUM-SIZED WEB SITE ACCESSIBILITY ASSESSMENT TOOL


by

Dale Allen Fletter


Production software systems often deviate from their intended architectures and sometimes completely lack a comprehensive, well designed and documented architecture. Effective design methods exist to re-engineer an existing system for new or omitted functional and quality requirements using the architecture. But without knowledge of the system's architecture, it is necessary to first perform a software system architecture recovery to understand the as-built architecture before applying these design methods. Most research in this area focuses on the creation of tools for use in semi-automated architecture recovery but is of little use to practitioners. This MS project developed a manual methodology that focuses on the specific activities of architecture recovery, aiming at maximum efficiency. A method, called Software Architecture Recovery Method (SARM), is presented, accompanied by a case-study of that methodology on a medium-sized (nearly 300 kLOC) website accessibility assessment tool that employs a heterogeneous code base (OO Python, OTS components, Java).


__________________________________, Committee Chair
Cui Zhang, Ph.D.


____________________________

Date







PREFACE


I have made this letter rather long only because I have not had time to make it shorter.

Blaise Pascal (1623-1662)


Clarity is our only defense against the embarrassment felt on completion of a large project when it is discovered that the wrong problem has been solved.

C.A.R. Hoare.


This project had its origins in the modest intent to create a database of web accessibility metrics as part of a course on software metrics. It was in the literature research that I discovered the European Internet Accessibility Observatory (EIAO). Since their work was far more extensive than my modest intent, there was no point in creating an inferior competitor. My next thought was to port it to a California Internet Accessibility Observatory (CIAO). While the resulting information may have been interesting, it did not serve my need to complete a project sufficient for my degree. Yet I did not want to abandon a piece of work that I admired and saw as an important contribution to internet accessibility.

My requirement for graduation was to complete a project. The California Code of Regulations: Title 5 Education, Section 40510 defines a project as:

a significant undertaking appropriate to the fine and applied arts or to professional fields. It evidences originality and independent thinking, appropriate form and organization, and rationale. It is described and summarized in a written abstract that includes the project’s significance, objectives, methodology, and a conclusion or recommendation.







On the recommendation of my advisor I reviewed the system for its use in a demonstration of the techniques of Carnegie Mellon’s Software Engineering Institute (SEI). Given the size and complexity of the EIAO system, even a minor enhancement to that system was a significant undertaking and worthy of a Master’s Project.

My original intent was to extend this work, keeping the original purpose of EIAO but applying it specifically to assessing the accessibility of California university web sites. No one is currently maintaining the EIAO implementation and the primary architects and developers are now engaged with eGovMon, creating an opportunity to continue on this tack. But since there were inadequacies in the EIAO system that needed to be resolved before eGovMon could proceed, that team made significant improvements in the EIAO code base before beginning their more ambitious project.

I was advised by those researchers to use the eGovMon code base for my purposes and began implementing a copy of the eGovMon code base in Spring 2010. The system was too complex and insufficiently robust to be easily ported without considerable support from that team. With their help I was able to port it to my own platform but quickly realized that extending it for the purpose of creating CIAO was a larger project than I wanted to tackle (it consists of more than 236,000 lines of code) while also largely lacking in the academic challenge that would justify a Master’s Project. Even to complete the port I found it necessary to reverse engineer significant portions of the downloadable package in order to determine how to diagnose errors that arose. In time, understanding the underlying architecture became the focus of my work and the subject of the case study.

In the literature the phrase “architecture reconstruction” is found more often than “architecture recovery” although in context they mean the same thing. However, for this project the phrase “architecture recovery” has been favored over “architecture reconstruction.” In his book, Software Architecture in Practice [Bass 2003], Bass states “…every computing system with software has a software architecture”. In keeping with this view, that every system has an architecture separate from its representation, I believe the proper term for the activity which creates a representation of this architecture is “recovery” and not “reconstruction” since we are attempting to construct or reconstruct the representation, not the architecture. The architecture will remain the same regardless of the representation we may document. In deference to SEI, I have used “reconstruction” in reference to their work in this project.

Architecture recovery is not just architecture documentation. Documentation is creating the representation of the architecture when its structures are known and understood in relationship to the quality needs placed upon the designers and implementers. Architecture recovery is the rediscovery of those structures, perhaps the original needs of the stakeholders, and the inter-relationships between them. Since the architecture exists independently of its representation and since every system has an architecture, this process of rediscovery is more challenging than choosing the suitable representations. It requires the skills that were needed when the system was designed and built since only someone who could have designed the system is capable of understanding the choices faced by the designer and the tradeoffs that they made. To do this in the absence of a written product beyond the source code is a severe challenge.

This project is primarily an elaboration of the methods of SEI for the reconstruction of the architectural views of a system under study with a case study based on that elaborated methodology. The primary references in this project are two books published under the SEI Series in Software Engineering: Software Architecture in Practice, 2nd ed [Bass 2003] and Documenting Software Architectures: Views and Beyond [Clements 2003]. My intent was to create a case-study of a significant, but manageable system which had sufficient complexity to demonstrate the power of these techniques in a practical setting.









ACKNOWLEDGMENTS


To look at the finished products of these academic projects suggests that they primarily spring forth from the minds of the authors. But to do one shows how interdependent the academic community is and how much of a contribution is made by people who receive minor credit or sometimes no credit at all. In my case any credit for this work must be shared with my advisor, Dr. Cui Zhang, for without her encouragement and enthusiasm for the discipline of software engineering I doubt I would have persevered. Robert Buckley also provided a great deal of inspiration during my program and some perspiration in reviewing this manuscript. I must also provide a sincere thank you to Morton Goodwin Olsen of the eGovMon project who was more than gracious in helping me through the early stages of porting their system to my environment. Without his answers, this project would not have been possible. Of course no one with a spouse can ignore their contributions of tolerance, support and encouragement through the many ups-and-downs of a project like this.









TABLE OF CONTENTS

Preface
Acknowledgments
List of Figures
Chapter
1. INTRODUCTION
2. BACKGROUND
3. LITERATURE REVIEW
4. SOFTWARE ARCHITECTURE RECOVERY METHODOLOGY (SARM)
5. THE EGOVMON CASE STUDY
6. CONCLUSIONS
Appendix A. A Partial Recovery of the eGovMon Software Architecture
Appendix B. Pre-Existing Architecture Documentation for EIAO
References








LIST OF FIGURES

Figure 1 (Fig from O'Brien 2003-b, p2)
Figure 2 SARM Diagram
Figure 3 O'Brien's Element Types and Relations
Figure 4: Scope Diagram
Figure 5 from http://www.egovmon.no/en/research/figure1.html
Figure 6 The EIAO Architecture (from http://www.eiao.net/eiao)
Figure 7 Static relationships for Crawler





Chapter 1

INTRODUCTION


Some large software systems can remain in production for many years, even decades. We know that maintenance and upgrades are a significant component of the total cost of ownership over the product life-cycle. Given the amount of money spent on the re-engineering of these long-lived systems, it is interesting to look at the tools we have to perform these activities effectively and efficiently. We already know about the benefits of using an architecture-centric approach for the initial design and implementation. But we often err in not following that approach, not implementing according to that architecture or allowing the production system to depart from that architecture plan over time. When we start a significant re-engineering effort on a system where the as-built architecture deviates from the as-designed architecture or where there is no comprehensive or well designed architecture at all, we cannot use an architecture-centric design approach for the re-engineering without first reconstructing (recovering) the as-built architecture of that system. This project focuses on the methodology that a practitioner can use to methodically recover that as-built architecture to enable the use of architecture-centric design during the re-engineering of the system.

An architecture-centric design approach can ensure that the development can more predictably meet its non-functional (quality) goals as well as the desired features and functions [Bass 2003].

If a maintenance team has a well documented and accurate architecture for the system to be modified, they can approach the maintenance as an extension of the original development. When that architecture is not available, the team must first recover the architecture of the system as-built.





Recovering the architecture from a production system that may be lacking in current architecture documentation is challenging. Any team that will tackle this challenge must have an approach lest they find themselves spending too much time on low-value activities. A clear plan and a well understood methodology for how the project will be approached can dramatically improve the performance of the team. This project presents such a methodology to structure and coordinate the team's effort for improved efficiency.

The term software architecture is new enough to have no generally agreed upon definition. Before discussing any topic that involves software architecture in a rigorous manner, it is necessary to define the term. While more generally understood, even the term methodology needs to be defined if we are to be clear about what we hope to achieve by creating one for this purpose. These, and other topics, are elaborated in Chapter 2, Background, which should give the reader a sufficient base to understand the remainder of the work in this project.

This project is not the first to look at the activities and methodology of software architecture recovery. Carnegie Mellon’s Software Engineering Institute has published many papers on the subject and included it in some of their most popular books [Bass 2003], [Clements 2003], [Ivers 2004], [Kazman 2003], [Nord 2009], [O’Brien 2002], [O’Brien 2003-a], [O’Brien 2003-b]. The prior research has focused on the academic issues of attempting to automate the chore of extracting information from the source code of the system. Their work is discussed in some depth in Chapter 3, Literature Review.

The intent of this project is to propose a novel methodology for software architecture recovery. This methodology is presented in Chapter 4. How this methodology differs from the prior methodologies is explored. The activities, their relationships and the artifacts produced are described.





To show that this methodology has practical benefit, it is used in a case-study to recover the software architecture of a real system. The system under study is a medium-scale web site accessibility assessment system that was developed in Europe. The first implementation of the system was published as EIAO (European Internet Accessibility Observatory). It was used to gather evidence of website accessibility across the EU and various industry categories [Nietzio 2006], [Ulltveit-Moe 2008]. At the end of that project, the researchers shifted to a focus on measuring other aspects of government websites in addition to accessibility. This new project was called eGovMon (eGovernment Monitoring). The additional attributes include transparency, efficiency and impact. The starting point for their eGovMon project was the end-state of the EIAO project. The specific version that is used for this case study is a version from early in the development of the eGovMon system, after some of the deficiencies of the EIAO system were addressed but before the measures for transparency, efficiency and impact were added. The case study is presented in Chapter 5 and the partial architecture documentation is contained in Appendix A.









Chapter 2

BACKGROUND


While the functionality, size and complexity of software systems continue to grow, so too do the qualities we expect of those systems, qualities such as performance, usability, availability, testability, modifiability and security. Mere functionality is no longer sufficient, if it ever was. Many academics have turned their attention to the challenge of engineering in the qualities that we need. If a piece of embedded software is to become part of a fly-by-wire system in an airplane, it cannot have the level of dependability common in consumer software products but must have a mean-time-to-failure rating that runs to at least many decades before we feel comfortable stepping onto that plane.

Significant work in finding ways to engineer qualities into software systems has been done at Carnegie Mellon's Software Engineering Institute (SEI) among others [Bass 2003], [Clements 2003], [Stoemer 2007]. Their work depends upon the articulation of a system's architecture as an early artifact of the system design process so that there is assurance in the design process that the system will meet some understood level of quality in the final product. This saves the design team from the fruitless effort of trying to tack some quality onto the final design. The articulation of a well designed and well documented software system architecture provides the first tangible artifact of the system that can support reasoning about the design's ability to deliver the intended qualities.

SEI and others have found ways to incorporate the architecture into the overall design process while a system is being built. But not every system is developed with this kind of forethought or with a well documented architecture. It is not so much that the system lacks an architecture (all systems have an architecture, no matter how bad) but that it may never have been clearly documented or followed by the implementers. Many successful systems are running today that may have little or no current documentation of the architectural decisions that were made when the system was first built. Consequently the maintainers of those systems make decisions in a vacuum and often begin to deviate from what the original architect may have intended. There is even a term for this form of entropy: architectural rot.

Now that we are seeing major systems that have lifetimes that are measured in decades, we are faced with the dilemma of how to perform major overhauls of these systems while maintaining the quality in the original product and improving other qualities while adding new features. Many teams faced with this challenge have opted to ignore the current system and develop from scratch. But as the scope and complexity of these systems grow, this becomes progressively more infeasible. Re-engineering, rather than re-building, is becoming more common.

Where a development team has a system which adheres to a well designed and well documented architecture, it is possible to step back into the development process that was used for the original development. But where the architecture has been lost or when architectural rot has caused the as-built system to deviate seriously from the as-designed architecture, the team has no choice but to either approach the project as new development or to find a way to recover the as-built architecture before proceeding. It is this second challenge, the recovery of the as-built architecture, that this project addresses. It looks at the ways that the development team can take the current system and attempt to recover a usable interpretation of that system that can support its re-engineering and avoid the process of developing the system again.





Chapter 3

LITERATURE REVIEW


The most visible voices in the area of software architecture can be found at Carnegie Mellon’s Software Engineering Institute (SEI). A canonical text for software engineering is Software Architecture in Practice [Bass 2003]. In it, the authors give an overview of the topics which are of interest to this project including documenting software quality requirements, the evaluation of software architectures and the reconstruction of software architectures.

Quality Requirements

A significant concern of architects is ensuring that the implementations of their designs exhibit the qualities desired by their clients. The basic thesis of SEI is that qualities, specifically the non-functional qualities, give shape to the architecture. Therefore an important consideration of the design process is the capturing of the client’s quality requirements in a way that supports the architecture design process as well as the subsequent design decisions.


The starting point for how to gather and document the quality requirements for a software product is chapter 4 of the Bass text [Bass 2003], titled Understanding Quality Attributes (co-authored with Felix Bachmann and Mark Klein). In it they provide a way of documenting the quality requirements through specific, concrete Quality Attribute Scenarios. Each scenario is a quality-attribute-specific requirement and consists of these 6 parts:



- source of stimulus (human, computer system, time event or other actuator)
- stimulus (condition that needs to be considered)
- environment (the overall state such as an overload condition, normal operation, shut-down, etc)
- artifact (that part of the system stimulated by the stimulus, usually some component of the overall system)
- response (the activity and perhaps message or action that results from that activity)
- response measure (some appropriate metric for this scenario)
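
As an illustration only, the six parts listed above could be recorded in a small data structure so that incomplete scenarios are easy to spot. This is a minimal sketch and not a notation taken from [Bass 2003]; the class name, the field names, and the crawler scenario itself are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QualityAttributeScenario:
    """One quality-attribute-specific requirement in six-part form."""
    source: str            # who or what generates the stimulus
    stimulus: str          # condition that needs to be considered
    environment: str       # overall system state when the stimulus arrives
    artifact: str          # part of the system that is stimulated
    response: str          # activity that results from the stimulus
    response_measure: str  # metric used to judge the response

    def missing_parts(self):
        """Return the names of any parts that were left blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# Hypothetical availability scenario, invented for illustration.
example = QualityAttributeScenario(
    source="external web server",
    stimulus="stops responding during a page fetch",
    environment="normal operation",
    artifact="crawler",
    response="log the failure and continue with the next URL",
    response_measure="no more than 30 seconds lost per unresponsive server",
)
```

A scenario with a blank part is reported by `missing_parts()`, which gives a reviewer a quick completeness check before the scenario is discussed with stakeholders.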

One advantage of this approach is that it avoids the discussion of quality attribute taxonomies. As long as everyone agrees that the scenario captures the desired quality of the system, it doesn't matter how it is categorized. The authors discuss general scenarios for the six quality attributes in their own taxonomy: availability, modifiability, performance, security, testability, usability.


Architecture Evaluation

A primary motivation for articulating the software architecture as an early step in the design process is to enable reasoning about the qualities that the system implemented from this design will exhibit. This architecture evaluation gives leverage over the later design decisions.

Bass [Bass 2003] presents a 9-step architecture analysis method that he calls the Architecture Tradeoff Analysis Method (ATAM). Those steps are:

Step 1: Present the ATAM
Step 2: Present the Business Drivers
Step 3: Present Architecture
Step 4: Identify Architecture Approaches
Step 5: Generate Quality Attribute Utility Tree
Step 6: Analyze Architectural Approaches
Step 7: Brainstorm and Prioritize Scenarios
Step 8: Analyze Architectural Approaches (again)
Step 9: Present Results

In addition to Bass’ ATAM, a more recent doctoral dissertation by Christoph Hermann Stoermer titled Software Quality Attribute Analysis by Architecture Reconstruction (SQUA³RE) [Stoemer 2007] offers an alternative to the evaluation of a designed architecture by performing quality analysis in parallel with the reconstruction of a system’s architecture.


Documenting Software Architecture

ATAM, or any form of architecture analysis, requires some artifact that can support the analysis. The researchers at SEI are the most important voices on this topic and the book, Documenting Software Architectures by Clements et al [Clements 2003], is one of the dominant texts to address this topic. It lays out the SEI recommendations for the presentation of software architecture.

As Clements argues, while a pictorial representation alone is insufficient to adequately communicate the full meaning of any given architectural view, it is often the primary presentation of that view with text used to elaborate and extend the pictorial representation. The most dominant form of graphic communication for software is the Unified Modeling Language, UML. While SEI does not prescribe the way architecture information should be displayed graphically, it does require a legend when the notation is not a standard such as UML. UML provides all the support needed for all the module views except layers. But for the Component-and-Connector views UML has distinct disadvantages that must be overcome.

One problem in Component-and-Connector views is that the connector itself can have sophisticated behaviors. UML only supports a connector which cannot itself hold the required information. Both Bass and Clements [Bass 2003], [Clements 2003] discuss some alternative ways of using UML 1.0. Additional methods are presented in the paper Documenting Component-and-Connector Views with UML 2.0 [Ivers 2004]. The basic choice is to represent the connector as an association, a class or a stereotyped class in UML 2.0.


Review of Architecture Reconstruction Literature

Reconstructing architecture from an existing system can be a difficult process. The hazards and barriers to be found in the process are named in The Perils of Reconstructing Architectures by Carriere and Kazman [Carriere 1998]. One significant barrier is the potential for the actual architecture to differ from the intended architecture. The qualities intended by the architectural decisions are only achieved if the system is implemented as designed.


“Beyond its importance for measuring architectural conformance, software architecture reconstruction also provides important leverage for the effective reuse of software assets. The ability to identify the architecture of an existing system that has successfully met its quality goals fosters reuse of the architecture in systems with similar goals; hence architectural reuse is the cornerstone practice of product line development.”


They also observe the insufficiency of static information derived from lexical analysis alone to fully understand the architecture:


“Unfortunately, system models extracted using these techniques provide a minimum of information to describe the run-time nature of the system. The primary factor contributing to this deficiency is the widespread use of programming language features, operating system primitives and middleware functionality that allow the specification of many aspects of the system's topology to be deferred until run-time.”

These aspects include “polymorphism and first-class functions (including approximations such as those provided by C and C++); operating system features such as proprietary socket-based communication and message passing; middleware layers such as CORBA.” [Carriere 1998]

They also mention some other common obstacles such as non-compilable code, missing source elements, language dialects, and obscure or non-reproducible hardware platforms. All these factors contribute to the difficulty of a tool-assisted reconstruction. They suggest that ATAM can serve as a structured way to elicit architectural information besides being a way to review and analyze an architecture.

Guo, Atlee and Kazman wrote about a semi-automatic method to extract software architecture from source material [Guo 1999] that used a tool called Dali. Guo says “An architecture recovery method defines a series of steps, and the pre/post conditions for each step, to guide an analyst in systematically applying existing reverse engineering tools to recover a system’s architecture.” [Guo 1999, p7]

They noted that it is difficult to use these methods to recover architectures that are designed and implemented with design patterns. As design patterns are described as well-defined structures with constraint rules, a pattern-oriented architecture recovery method must incorporate the design pattern rules as well as structural information such as the system decomposition hierarchy.


They propose the Architecture Reconstruction Method (ARM), a semi-automatic analysis method for reconstructing architectures based on the recognition of architectural patterns. Their method consists of four major phases:



- Developing a concrete pattern recognition plan
- Extracting a source model
- Detecting and evaluating pattern instances
- Reconstructing and analyzing the architecture


The first phase, developing a concrete pattern recognition plan, consists of defining the patterns that are expected (or hoped for) in the source material in a form that supports pattern recognition. Since these researchers are using Dali, their patterns are specified using Rigi Standard Form (RSF).

The second phase is extracting the source model from the source material. Detecting and evaluating pattern instances is the mining of the database using the pattern templates to detect the source elements and relations that realize the patterns.
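
The pattern detection phase can be pictured as a query over extracted relations. The sketch below is an assumption-laden illustration, not the ARM or Dali implementation: the relation names `inherits` and `uses`, the function name, the sample elements, and the rule itself (a client that uses an abstraction with at least two implementations is flagged as a Strategy-like candidate) are all hypothetical.

```python
def find_strategy_candidates(triples):
    """Flag (client, abstraction, implementations) groups for human review.

    triples is an iterable of (relation, source, target) tuples.
    A candidate is a client that 'uses' an element which at least two
    other elements 'inherit' from.
    """
    triples = list(triples)
    inherits = [(s, t) for r, s, t in triples if r == "inherits"]
    uses = [(s, t) for r, s, t in triples if r == "uses"]

    candidates = []
    for client, abstraction in uses:
        implementations = sorted(s for s, t in inherits if t == abstraction)
        if len(implementations) >= 2:
            candidates.append((client, abstraction, implementations))
    return candidates


# Hypothetical source model, invented for illustration.
SAMPLE_MODEL = [
    ("inherits", "HtmlChecker", "Checker"),
    ("inherits", "CssChecker", "Checker"),
    ("uses", "Crawler", "Checker"),
    ("uses", "Crawler", "Logger"),
]
```

Each candidate is only a structural match; as the phase name says, the instances still have to be evaluated by a person who can judge whether the pattern was actually intended.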





Reconstructing and analyzing the architecture is the process of providing the artifacts which express the design patterns found in the system, usually as a visual presentation using the Rigi tool.

In the paper Architecture Reconstruction Guidelines, Third Edition [Kazman 2003], the authors not only elaborate on the text above but also discuss some of the tools that can, and usually should, be employed when approaching a large-scale reconstruction project. For data extraction these tools include parsers (Understand for C/C++/Java, Imagix, SNiFF+, C++ Information Abstractor aka CIA, rigiparse), abstract syntax tree (AST)-based analyzers (Gen++, Refine), lexical analyzers (Lightweight Source Model Extractor), profilers (gprof) and ad-hoc tools like grep and Perl. The authors observe that these tools are limited to extracting static information about the code. Because of late-binding due to polymorphism, function pointers or runtime parameterization and other mechanisms, the dynamic structure of the system might not be reconstructable without additional information that must come from other tools such as code instrumentation, and such tools may not even be available in some situations such as embedded code.
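
The gap between static and dynamic information can be shown with a small Python sketch. It is illustrative only and does not reproduce any of the tools named above: the two checker classes, the `run` function, and the helper names are hypothetical, and Python's standard `ast` module stands in for a static extractor while a simple method wrapper stands in for code instrumentation.

```python
import ast

# Hypothetical code under study: run() dispatches on whatever it is given.
SOURCE = '''
class HtmlChecker:
    def check(self, page):
        return "html:" + page

class CssChecker:
    def check(self, page):
        return "css:" + page

def run(checker, page):
    return checker.check(page)
'''


def static_calls(source):
    """Return the callee names visible in the source text alone."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                names.append(func.attr)   # method name only, receiver unknown
            elif isinstance(func, ast.Name):
                names.append(func.id)
    return names


def _instrument(cls, log):
    """Wrap cls.check so that every call records its concrete target."""
    original = cls.check

    def wrapper(self, page):
        log.append(f"{cls.__name__}.check")
        return original(self, page)

    cls.check = wrapper


def observed_calls(source, class_names):
    """Execute the code with instrumentation and return the actual targets."""
    namespace = {}
    exec(source, namespace)
    log = []
    for name in class_names:
        _instrument(namespace[name], log)
    for name in class_names:
        namespace["run"](namespace[name](), "page")
    return log
```

The static pass can report only that `run` calls something named `check`; which class receives the call is decided at run time and shows up only in the instrumented log.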

The Kazman paper also discusses the burdens of storing and organizing the large amount of information that can result from the information extraction phase of a large system. As they observe, “…with the large amount of software in most systems, it is nearly impossible to perform all architecture reconstruction activities manually.” [ibid] In particular, this paper talks about a tool developed at SEI called Architecture Reconstruction and Mining or ARMIN which was constructed with the intended purpose of assisting with the architecture reconstruction of large systems.

In chapter 10 of the Bass text [Bass 2003, pp. 231-259] the authors address architecture
reconstruction. In it, they present four activities that are executed iteratively:

• Information extraction
• Database construction
• View fusion
• Reconstruction

In their approach it is assumed that several tools are employed to extract the available
relationships among the components in the source code. In this text they specifically mention the Dali
workbench that was developed at SEI.

Their next step includes the conversion of the information extracted by each of the tools into a
standard form such as Rigi Standard Form (a tuple-based data format in the form of relationship
<entity1> <entity2>) and the storage of this information into a database for later retrieval and
analysis.
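
As a minimal sketch of this step, the following reads tuples written as relationship <entity1>
<entity2> and stores them in a database. The sample tuples, the table layout and the function names
are assumptions made for illustration, and the parser assumes entity names contain no whitespace.
It is not the loader used by Dali.

```python
import sqlite3

# Hypothetical extractor output, one tuple per line: relationship entity1 entity2
RSF_TEXT = """\
calls main parse_args
calls main run_crawler
contains crawler.py run_crawler
"""


def parse_rsf(text):
    """Split each well-formed line into a (relation, entity1, entity2) triple."""
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:          # skip blank or malformed lines
            triples.append(tuple(parts))
    return triples


def load(triples):
    """Store the triples in an in-memory SQLite database for later queries."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE relation (name TEXT, entity1 TEXT, entity2 TEXT)")
    db.executemany("INSERT INTO relation VALUES (?, ?, ?)", triples)
    return db


db = load(parse_rsf(RSF_TEXT))
callees = [row[0] for row in db.execute(
    "SELECT entity2 FROM relation "
    "WHERE name = 'calls' AND entity1 = 'main' ORDER BY entity2")]
print(callees)
```

Once the tuples are in a database, later activities can query them without re-running the extractors.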

Due to the inherent limitations of lexical analysis, the information extracted using different tools
will result in incomplete lists of the elements and relations. Therefore a needed activity is the
manual analysis, verification and reconciliation of these various views of the data. (The view of
the system as given by these tools should not be confused with what these authors define as
architectural views earlier in the text.) The authors call this activity view fusion.
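
The reconciliation itself is manual, but a tool can at least show where the extractors disagree. The
sketch below is an illustration under stated assumptions: the extractor names, the sample tuples and
the fuse function are hypothetical and do not reproduce the procedure in [Bass 2003].

```python
def fuse(views):
    """Merge relation sets from several extractors.

    Returns the triples every extractor reported, and a mapping from each
    remaining triple to the extractors that reported it, for manual review.
    """
    all_triples = set().union(*views.values())
    agreed, disputed = set(), {}
    for triple in all_triples:
        sources = sorted(name for name, found in views.items() if triple in found)
        if len(sources) == len(views):
            agreed.add(triple)
        else:
            disputed[triple] = sources
    return agreed, disputed


# Hypothetical output of two extractors run over the same code base.
views = {
    "parser": {("calls", "main", "run"), ("calls", "run", "fetch")},
    "grep": {("calls", "main", "run"), ("calls", "run", "log")},
}
agreed, disputed = fuse(views)
print(agreed)
print(disputed)
```

The disputed entries are the candidates for the manual analysis and verification described above.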

The final activity is called reconstruction. It takes the improved database from the view fusion
activity and seeks to abstract away the non-architectural information to support the expression of
the true architectural views which will document this architecture. The most important part of this
activity is the pattern matching wherein the people involved in the reconstruction map the
elements and relations of the source material to the known and recognized patterns used to achieve
the desired qualities. Since their approach assumes the use of a workbench, their reconstruction of
elements consists of language statements that define the elements in relation to the database.

For the reconstruction activity, they offer several practical guidelines. Briefly, they include iterating
the reconstructed architecture with the system’s architect, being judicious in what is brought into
the architecture (do not list every source element), using naming conventions and directory
structures to infer intended structure, and involving someone with personal knowledge of the
product. “As reconstruction proceeds, information must be added to re-introduce the architectural
decisions which introduces bias from the reconstructor and thus reinforces the need for a person
knowledgeable in the architecture to be involved.” [ibid]
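
The guideline about directory structures can be sketched as follows. The file names and the lift
function are hypothetical; the sketch assumes that the top-level directory of each file approximates
its intended module, which must be confirmed with someone who knows the system.

```python
from pathlib import PurePosixPath


def lift(depends_on, depth=1):
    """Collapse file-level dependencies to the first `depth` directory levels."""
    def module_of(path):
        parts = PurePosixPath(path).parts[:-1]      # drop the file name
        return "/".join(parts[:depth]) or "."       # "." for files at the root

    lifted = set()
    for src, dst in depends_on:
        a, b = module_of(src), module_of(dst)
        if a != b:                                   # ignore intra-module edges
            lifted.add((a, b))
    return lifted


# Hypothetical file-level depends_on tuples.
file_deps = {
    ("crawler/fetch.py", "common/log.py"),
    ("crawler/queue.py", "crawler/fetch.py"),
    ("report/html.py", "common/log.py"),
}
module_deps = lift(file_deps)
print(sorted(module_deps))
```

Three file-level relations reduce to two module-level relations, which is the kind of abstraction
away from non-architectural detail that the reconstruction activity aims for.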

Kazman, O'Brien and Verhoef published Architecture Reconstruction Guidelines, Third Edition in
Nov 2003 [Kazman 2003], in which they present their guidelines for an architecture
reconstruction project. The authors characterize software architecture reconstruction as “the process
where ‘as-built’ architecture of an implemented system is obtained from an existing legacy system.”

Those guidelines are:



• articulate a goal which explains why the organization is performing the reconstruction
• obtain a high-level architectural view of the system before beginning the detailed
  reconstruction process
• use existing documentation
• involve people familiar with the system
• assign someone to work full-time on the project

“Use the ‘least effort’ extraction. Consider the kind of information that needs to be extracted from
a source corpus and choose the most appropriate tool. Is the information lexical in nature? Does it
require the comprehension of complex syntactic structures? Does it require some semantic
analysis?” [Kazman 2003]

As Kazman observed, “…with the large amount of software in most systems, it is nearly
impossible to perform all architecture reconstruction activities manually.” [Kazman 2003] For even a
moderate system, the amount of data to be extracted and managed is significant.

In Architecture Reconstruction of J2EE Applications: Generating Views from the Module
Viewtype, O’Brien et al. [O’Brien 2003-b] present the process diagrammatically. This can be
seen in Figure 1.

Figure 1 (fig from O'Brien 2003-b, p2)

In Quality-driven software re-engineering, Tahvildari et al. [Tahvildari 2003] talk about a model
analysis phase which focuses on documenting and understanding the architecture and the
functionality of the legacy system. In total, their re-engineering framework (life-cycle) consists of
these phases:



• requirements analysis
• model analysis
• source code analysis
• remediation
• transformation
• evaluation

Jansen also addresses how architecture design decisions can be recovered after the fact
(Jansen, et al., Aug 2007).










Chapter 4

SOFTWARE ARCHITECTURE RECOVERY METHODOLOGY (SARM)


For small systems, there is rarely a need for a distinct methodology to reconstruct the
architecture. In those cases, direct examination of the system artifacts such as source code, operational
instructions and informal system descriptions is sufficient to gain an understanding of the
intended architecture. But for large systems the amount of source code creates a burden on individuals.
For these larger systems it becomes necessary to approach the extraction of information in a
methodical fashion and frequently with the use of some tool support.

By definition a methodology must lay out a series of steps which, while necessary, are not
sufficient for the successful achievement of the task to be performed. The steps which must be
accomplished to recover a software architecture are:



• Planning
• Gather Evidence
• Analyze
• Generate Views
• Present

As shown in Figure 2, these steps are iterated until a satisfactory outcome is achieved. It is
common for these steps to occur in parallel in the early iterations, with attention shifting based on
the availability of resources and schedules, reflecting the investigatory nature of the task.
Usually, the last iteration will perform these steps one last time in the order listed.

Figure 2 SARM Diagram


Planning

As with any project, initial planning is important. Also as with most projects, things will happen that
will require an adjustment to that plan. This methodology does not require many artifacts from
the planning activity but it does require a few. There are two vital documents required for the
initiation of this sub-project: the initiating project’s charter and the architecture drivers in the form
of quality requirements. In addition, there are some basic items that must be in place before the
main work of this effort should be allowed to start. These include:



• identification of the stakeholders of this project
• the scope of study
• the results of investigation into those tools that could be used for the work
• some ground rules for the decision making process that will conclude the project

A key decision point in the project regards the architecture documentation presented to the
stakeholders. Those stakeholders must judge when the representation of the as-built architecture is
sufficiently detailed, accurate and complete to support the needs of the greater project. There
is no oracle possible for this decision; it must be the collective decision of the stakeholders.
Therefore knowing the decision makers and their needs is important if the recovery is to proceed
smoothly from initiation to conclusion.

The initiating documents of the initiating project should provide sufficient information to
determine the proper scope for the recovery project. However the architecture will serve needs for the
stakeholders that go beyond the immediate need to re-engineer the system. The stakeholders may
depend upon the architecture for many other purposes such as long-range enterprise architecture
planning or operational support. For these reasons and more, the scope of the recovery effort
should be considered separately from the initiating project.

As researchers have shown, the use of tools to assist with the static analysis can be helpful
[O’Brien 2002][O’Brien 2003-a][O’Brien 2003-b]. Appendix C shows some of the tools
mentioned in papers that were reviewed for this project. But the choice of tools will depend upon
several factors. Will the tool support the code base? How dependable are the results of the tool
analysis? If the results will require a complete manual analysis to verify the tool results, little may be
gained. What are the costs of the tool in terms of both acquisition and training?



Gather Evidence

While academically less interesting, a key activity in the recovery of an architecture is the
investigative work of finding helpful documents and collecting the memories from people with
knowledge of the system. People successful in this activity must possess both the skills of an
experienced business analyst and knowledge of software architecture principles. The goal is to find
all evidence that will illuminate the design choices that were made in the design and
implementation of the system under study and then document them in a way that makes them available to the
later activities.

This is a discovery activity and, like any discovery, is likely to proceed from the rapid and general
to the specific and labored during the course of the project. Some information, such as the source
code and any end-user documentation, is likely to be easily secured. In some cases these may be
sufficient but often making the tie between the in-place system and the architecture patterns that had
been intended may prove elusive. This section of the methodology provides some practical advice
on avenues of investigation to consider.

In many cases the system in production was installed many years ago and personnel have
changed. Some may have been promoted to other positions in the company and others may have
left the organization. In some cases there are some people in the same roles. In an ideal
engagement the original architect is available to the team. This would be the starting point for this
project. In the planning step the team was introduced to the stakeholders. Identify which stakeholders
were also stakeholders for the original project. Ask each stakeholder if they have a copy of any
presentations that were made which presented the original architecture or can provide an
introduction to whoever made that presentation.

An assumption of this methodology is that architecture recovery is never done in a vacuum. Any
effort to recover a software system architecture is driven by some larger purpose, most likely the
re-engineering of that system for new or altered requirements.

One motivation for the clear articulation of the architecture is to reason about the qualities of the
system implemented according to that architecture. These early design decisions ensure that focus
is placed on the most important qualities that must be achieved in the implementation and that the
proper tradeoffs are made in subsequent design decisions.

Even when little or no explicit documentation of the original quality requirements has been
found, there may be evidence which suggests the most valued quality requirements of the as-built
system. Regardless of whether the quality requirements are recovered from the original
implementation or inferred from historic or contemporary sources, the recovery team should attempt to
capture them in a systematic way to aid in the documentation of the architecture. The justification
for capturing the qualities of the as-built system is to provide as sound a base as possible for the
subsequent analysis of that system.


Quality Requirements Documentation

The methodology used for capturing quality requirements is directly taken from Clements
[Clements 2003]. In his methodology, the stakeholder quality requirements are documented using his
six-part scenario which includes the source, stimulus, artifact, environment, response and response
measure. Using these quality requirements, the architect can argue how the architecture supports
the requirement.
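
One way to capture such scenarios systematically is as a small record with one field per part. This
is a sketch only: the class name, the missing_parts helper and the example values are hypothetical
and do not come from [Clements 2003] or from the case study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QualityScenario:
    """One quality requirement expressed as a scenario."""
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str

    def missing_parts(self):
        """Names of the parts still to be elicited from stakeholders."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical modifiability scenario recovered from an interview.
scenario = QualityScenario(
    source="Site administrator",
    stimulus="Adds a new accessibility test",
    artifact="Assessment module",
    environment="Design time",
    response="Test is added without changes to other modules",
    response_measure="",  # not yet elicited from the stakeholder
)
print(scenario.missing_parts())
```

Recording incomplete scenarios this way makes the gaps explicit, so that later interviews can be
directed at the missing parts.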

The requirements document from the original development for this system may be difficult or
impossible to find. Ideally the architecture will make explicit the relationship between the quality
(non-functional) requirement and the architectural pattern employed. If explicit documentation of
the original project cannot be found, then the interviews with people should include questions
about the requirements of the prior system and current requirements. Each stakeholder can
potentially introduce someone with data previously unseen. While it may be impossible to always
separate what was a requirement at the time of the original implementation from the current needs
which are driving the re-engineering, it should be done as far as possible.

One of the goals of an architecture recovery is to enable reasoning about the qualities of the
system. The point of this reasoning at design time was to determine whether the system built to
the given architecture would achieve the intended qualities. Insofar as it is possible, a complete
architecture recovery will therefore capture the original architectural drivers and recreate the
reasoning of how the (at the time) proposed architecture achieved the qualities required by the
original stakeholders.


Analyze

Almost immediately following the collection of any documentation, the analysis of that
documentation can begin. The overall objective of the analysis activity is to ensure that enough
information has been gathered to support the generation of the views. There will be some overlap
between Analyze and Generate Views since it is the views that enable some level of analysis of the
system.

Analysis must verify that the elements and relationships in the system under study are collected.
The element types and relations between them in O’Brien’s [O’Brien 2002] work are shown in
Figure 3.

Relation Name    Source Element   Target Element    Explanation
defines_fn       Class            Function          A class defines a function
contains         File             Function          A file contains a function
defines          File             Class             A file defines a class
defines_class    Package          Class             A package defines a class
defines_global   File             Global_variable   A file defines a global variable
defines_var      Function         Local_variable    A function defines a local variable
depends_on       File             File              A file depends on another file
has_member       Class            Member_variable   A class has a member variable

Figure 3 O'Brien's Element Types and Relations


This list can be used but at a minimum, the decomposition of the modules, the is-a relationships
of object-oriented systems and the uses relationships must be extracted.
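
The relation list in Figure 3 can serve as a schema against which collected tuples are checked. The
sketch below assumes the tuples are held as (relation, source, target) triples and that each element
has a known type; the sample elements and the violations function are hypothetical.

```python
# Relation name -> (source element type, target element type), from Figure 3.
SCHEMA = {
    "defines_fn": ("Class", "Function"),
    "contains": ("File", "Function"),
    "defines": ("File", "Class"),
    "defines_class": ("Package", "Class"),
    "defines_global": ("File", "Global_variable"),
    "defines_var": ("Function", "Local_variable"),
    "depends_on": ("File", "File"),
    "has_member": ("Class", "Member_variable"),
}


def violations(triples, element_types):
    """List the triples that do not fit the schema, with a reason."""
    problems = []
    for relation, source, target in triples:
        expected = SCHEMA.get(relation)
        if expected is None:
            problems.append((relation, source, target, "unknown relation"))
            continue
        actual = (element_types.get(source), element_types.get(target))
        if actual != expected:
            problems.append((relation, source, target, "type mismatch"))
    return problems


# Hypothetical elements and tuples gathered during analysis.
element_types = {"crawler.py": "File", "Crawler": "Class", "fetch": "Function"}
triples = [
    ("defines", "crawler.py", "Crawler"),
    ("defines_fn", "Crawler", "fetch"),
    ("contains", "Crawler", "fetch"),   # wrong: the source should be a File
    ("inherits", "Crawler", "Base"),    # an is-a relation, not in Figure 3
]
problems = violations(triples, element_types)
print(problems)
```

The rejected inherits tuple shows why the schema would have to be extended with is-a and uses
relations before it could cover the minimum set named above.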

During the course of analysis, it is possible that the team may discover discrepancies between the
as-designed architecture and the as-built architecture. There are many reasons why this might be
so but the reasons for the discrepancies are less important during recovery. The goal of the team
in documenting the architecture is to attempt to document as much as can be known both about
the as-designed and as-built architectures.

The first analysis done is of the module-level relationships. If good architecture documentation is
available on the system, this can proceed in a top-down fashion, investigating the modules that
comprise a high-level module and the relationships that exist between those modules. When the
high-level modules have been documented to the extent possible, it will be necessary to proceed
bottom up.

In the absence of all other quality requirements, one that will be universally found is some level
of organization which enabled the system to be buildable. The development team needed to
organize themselves to enable them to work together. Unless the number of modules was so small
that every member knew every module, they needed to be organized into a structure that enabled
each member to work independently while still contributing to the overall design. The most
common way this is done is in a functionally hierarchical structure where the modules are leaves
in a tree of some height. Each major branch represents some abstraction of the leaves or other
branches attached to it.
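
A hierarchy of this kind can be recovered mechanically when module names reflect it. The sketch
below assumes slash-separated module paths; the paths and both functions are hypothetical and are
meant only to show the tree of branches and leaves described above.

```python
def build_tree(module_paths):
    """Build a nested dict in which each path component is a branch or a leaf."""
    tree = {}
    for path in module_paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def height(tree):
    """Number of levels below this node; a leaf (empty dict) has height 0."""
    return 0 if not tree else 1 + max(height(child) for child in tree.values())


# Hypothetical module paths taken from a source tree.
paths = ["crawler/fetch", "crawler/queue", "report/html/table", "report/pdf"]
tree = build_tree(paths)
print(tree)
print(height(tree))
```

Here crawler and report are the major branches, and the modules fetch, queue, table and pdf are the
leaves.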

In the O’Brien work, the need was only to reconstruct the static decomposition of the system
under study. In this project the need is to reconstruct both the static and dynamic architecture.
However the overall approach is still appropriate.


Generate Views

This activity is separated from Analyze since the creation of the formal views requires two skills
that are not needed in Analyze: visual communication and synthesis. This will work in concert
with Analyze since, in performing the synthesis of the view, questions will arise. If the information
cannot be found in the document repository then it may trigger further analysis or even more
research for evidence. It is possible that there will be gaps which must then be noted in the final
version of the view.

Each view is introduced with a primary presentation. For most module and
component-and-connector views this is a semi-formal diagrammatic presentation. Semi-formal means that every
pictorial element on the diagram has a specific meaning. For the module diagrams UML works as
well as any other notation. However for component-and-connector views UML offers some
challenges as a semi-formal notation. This methodology does not advocate any specific method of
depicting the component-and-connector views. In the associated case study, the reader can find a
discussion of how this was resolved in that case.

A necessary skill for the person responsible for the creation of the primary presentation diagrams,
besides an understanding of the semi-formal notation being used, is the ability to lay out the
visual elements in a way that emphasizes the underlying organization of the architecture pattern.


Present

During planning, the form of the presentation can be negotiated. However the recommended
going-in position should be SEI’s ATAM. It is a well articulated methodology of its own. For an
understanding of the ATAM the reader is directed to the Background section of this paper
(Chapter 3, Literature Review for Architecture Evaluation). But before the ATAM is initiated with the
stakeholders, the team should review the intended presentation to ensure it meets their internal
quality standards. Better to cancel a planned presentation than alienate the stakeholders with a
wasted meeting.

The primary reason for this presentation is to determine if the architecture is sufficient for this
project. As mentioned in the section on planning, the recovery of an architecture is purpose
driven. It must support the needs of multiple stakeholders and they are the final arbiters of its success.
If the architecture fails to satisfy some stakeholder, the shortcoming must be noted and an
understanding of what can be done to correct it must be discussed at the meeting.

If the decision is made that the architecture as presented is sufficient, the team must produce the
final version and distribute it.


The Document Repository

While a document repository is not part of the methodology, the methodology cannot function
without one. The ultimate document will have many authors and include many exhibits which will
be independently produced.





Chapter 5

THE EGOVMON CASE STUDY


A Brief Overview of Website Accessibility

The basic paradigm of website accessibility is the concept of barriers to accessibility. The
standards set by the World Wide Web Consortium (W3C) in the Web Content Accessibility Guidelines
(WCAG), and as articulated in US law, list the barriers that are to be avoided in the creation of
websites, but no quantitative method exists to unambiguously assess compliance with these
guidelines since many require human judgment. The most often cited example is the descriptive text
(alt-text) for an image. The guidelines recommend that this text provide a suitable description of the
image to allow someone without access to the image to make use of the page, to the extent
possible without it. The W3C has worked over the years to create the Unified Web Evaluation
Methodology (UWEM) which has the objective of quantifying accessibility where possible as
well as providing a recommended methodology for the evaluation of websites by different
organizations.

Many organizations have an interest in accessibility for a variety of reasons. However
governments in the US and abroad are intensely interested in assessing their own service to the public
for both humanitarian and legal reasons. In 2004 a project was co-funded by the European
Commission (EIAO publishable final activity report project no 004526
www.eiao.net/publications/EIAO_publishable-activity-report-FINAL-2008-09-04) to create a
system which would apply UWEM to European websites to assess differences across national
and industry segments. This effort was dubbed European Internet Accessibility Observatory or
EIAO. The researchers on that project presented their work and went on to a new project called
eGovMon.

eGovMon seeks to expand the attributes of the website assessment to include impact,
transparency, and efficiency as well as accessibility. But while the attributes to be assessed are expanded,
their focus is now on the information and services provided by municipalities in Norway. They
began with the EIAO system and addressed some of the most immediate issues facing that system
and then began their work on the eGovMon system in late 2009/early 2010. More information
about the eGovMon and its EIAO predecessor can be found in the partial eGovMon architecture
recovery in Appendix A. Specifically Section 2 of Appendix A will give the reader a background
in that project.


Architecture Recovery of the eGovMon System

As documented in Chapter 4, the methodology calls for an initial planning phase followed by
iterations of Gather Evidence, Analyze, Generate Views, and Present. A part of that planning is to
understand the link between the architecture recovery effort and whatever motivated it. The intent
of the project which motivated this architecture recovery project is to use the eGovMon system to
measure the accessibility of university websites in California. Therefore the purpose of this
architecture recovery is to prepare for a re-engineering effort of the existing eGovMon system for this
purpose. What comes from this recovery must allow the developers to reason about the qualities
that this system will exhibit, determine if they meet the needs of the re-engineered system and
guide the specification for changes to the system.

There are many different aspects of the system under study. There is the static structure of the
package that is used for installation, the dynamic behavior of the install process and the static
structures left on the object machine when it concludes, the dynamic behavior of the system as it
creates the run-time structures to provide the services and of course the run-time objects and their
behavior as they interact with the users’ requests. In the course of this project all of these required
some analysis. However the only views that will be developed will be the static structure of the
elements from which the servers are started and the dynamic structures and behaviors which
support the services. The SARM recommends determining the decision making process for
stakeholder acceptance. Since there were no other stakeholders for this project, that was not an issue.


SARM Planning - Tool Selection

A key decision in this project was to not attempt to use any form of tool support for the lexical
analysis. Why forego tool support? Cost for one, both financial and time. Understand for Java
(www.scitools.com) by itself costs $995 for a single use license and $1995 for a “floating”
license. While many of the tools are free, there is still a cost of acquisition in the form of research
time to find the appropriate package, integration time, learning time etc.

Another reason is the limitations of the tools. The very best tools still suffer from the inherent
limitations of static analysis. All authors agree that tools can provide leverage but the output must
be carefully used lest inappropriate conclusions be drawn from faulty data.

As with any architectural analysis, learning the full lexical complexity of the code was not needed,
although it was a sub-goal for the larger vision. There was also a healthy skepticism that any tool
can capture the more subtle semantic information embedded in the code that is needed to document
the behavior in Component-and-Connector views.

Another reason a tool was not used is training. No tool is useful without understanding how to use
it. This requires an investment in training to ensure that the results from the tool are dependable
and accurate.

A further reason is the distraction from the purpose of the project. One does not engage in a
re-engineering effort to establish infrastructure. While many organizations will allocate the budget
needed to create the workbench needed to perform this kind of analysis using the best available
tools, many other organizations will not. This has significant implications for the skills needed
and the tasks that must be undertaken.

The tools introduce an additional step. The objective is to reconstruct the architecture and an
inherent part of that architecture is the graphic presentation. While tools such as Dali/ARMIN can
provide some graphic support, most authors setting out to create figures will only use the tool for
a draft of the figure and then recreate it using a more robust graphics tool.

While most other researchers have used automated tools to assist with the lexical analysis of the
source code, this project relied almost exclusively on manual interpretation with only some use of
editors to search for specific tokens. In the O’Brien work [O’Brien 2002][O’Brien 2003-a]
[O’Brien 2003-b] they comment that it was necessary to determine which view types were to be
reconstructed before beginning the data extraction phase. Unlike the O’Brien work, this project
set out to create three viewtypes: module, C&C and Implementation for the system under study.
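
The token searches mentioned above were done in an editor. For readers who prefer a script, the
following is a rough equivalent and not what this project used; the function name, the file
suffixes and the sample file are assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path


def find_token(root, token, suffixes=(".py", ".java")):
    """Return (relative path, line number, line text) for whole-word matches."""
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits


# Demonstration on a throwaway source tree with one hypothetical file.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "pkg").mkdir()
    (Path(root) / "pkg" / "crawl.py").write_text(
        "import queue\nq = queue.Queue()\nrequeue = 1\n", encoding="utf-8")
    hits = find_token(root, "queue")

print(hits)
```

Matching on word boundaries keeps identifiers such as requeue out of the results, which reduces the
amount of manual filtering.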

One tool that was indispensable was the tool used to create the visual representation. Visual
representation of the material is important and hand-drawn diagrams would not be acceptable. The tool
chosen was Microsoft’s Visio, supplemented by UML templates from the web.

SARM Planning - Scope

As with most major systems, the creation of the production environment and the installation process
of the system is a non-trivial activity. Although this could reasonably be considered part of the
system if the intent is to make this system widely available to others, it was excluded from the scope
because the platform is likely to change and the exclusion helped to keep the scope within
reasonable bounds. Likewise that part of the system that was necessary to perform the start-up
activities of loading the servers and establishing the software environment was also excluded for
this project. This is a case study and the scope only needed to be as large as necessary to
demonstrate the methodology.


While there are tools available to assist with architecture recovery, there were downsides to using
a tool. This project was done with manual code analysis since the use of any tool risked a
distraction from the primary purpose of the project. The choice for the final form of the architecture
documentation was from the Clements book, Documenting Software Architectures [Clements
2003].

The eGovMon system consists of a collection of hardware and software components. Not all of
them will be within the scope of analysis for this project. Figure 4 shows the major components
of the complete system. The scope of analysis focused on those pieces which are currently under
development and will largely ignore those components that are hardware or effectively
commercial off-the-shelf components such as PostgreSQL, Linux, Java or Apache. There was some
consideration given to the way in which the software components can be allocated to hardware later
in the project.

Figure 4: Scope Diagram




Figure 5 from http://www.egovmon.no/en/research/figure1.html

While the eGovMon Project has many components, there are only a few that fall within the scope
of study for this master’s project. Primarily they are:



• Automated assessment
• eGovMon database

There are many documents available from this website as well. But for the purposes of inferring
the architectural description of the system under study, the documents used are limited to:


• D4.2.1 eGovMon System Design Specification [Goodwin-Olsen 2009]
• Architecture for large-scale automatic web accessibility evaluation based on the
  UWEM methodology [Ulltveit-Moe 2008]
• A proposed architecture for large scale web accessibility assessment [Snaprud 2006]
• Second version of ROBACC WAMs, D3.2.1 [Nietzio 2006]

These documents clearly describe the prior work of the European Internet Accessibility
Observatory (EIAO). The EIAO website (www.eiao.net) offers an architecture diagram, reproduced
in Figure 6.


Figure 6: The EIAO Architecture (from http://www.eiao.net/eiao)

In their work they describe Figure 6 as follows:


• Initially, an administrator populates a URL repository with web site URLs.
• The crawler gets web site URLs from the URL repository and further populates the it
  (sic) with individual web page URLs. The crawler extracts at most 6000 pages from each
  site.
• When the crawler is finished, the sampler selects 600 web pages from each site at random
  making a near random uniform sampling.
• Each of the 600 pages are (sic) evaluated to detect accessibility barriers by the WAM and
  the results are stored in an RDF database.
• The ETL extracts these results, transforms them and inserts the results for long storage in
  a data warehouse.
• When all scheduled sites have been crawler, evaluated and loaded in the data warehouse,
  the data in the data warehouse is organised to be available in the online reporting tool.
• Users can then see the results from the online reporting tool.
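
The crawl limit and the sampling step in this description can be sketched as follows. Only the two
figures, 6000 and 600, come from the quoted text; the function name, the example URLs and the
handling of sites with fewer than 600 pages are assumptions, and the code is not the EIAO
implementation.

```python
import random

MAX_CRAWLED = 6000   # the crawler extracts at most 6000 pages per site
SAMPLE_SIZE = 600    # the sampler selects 600 pages per site


def crawl_and_sample(page_urls, rng):
    """Cap the crawl, then draw a uniform random sample without replacement."""
    crawled = list(page_urls)[:MAX_CRAWLED]
    if len(crawled) <= SAMPLE_SIZE:
        return crawled                      # assumption: small sites are kept whole
    return rng.sample(crawled, SAMPLE_SIZE)


# Hypothetical site with more pages than the crawler will visit.
site = [f"http://example.org/page{i}" for i in range(10_000)]
sample = crawl_and_sample(site, random.Random(0))
print(len(sample))
```

Passing the random number generator in as a parameter makes a run repeatable, which helps when the
sampled pages need to be re-examined later.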


Another important document that helps in the reconstruction is the EIAO documented source
code, Deliverable Number: D5.2.2.1-2. The pages of this document which address the
architecture are in the appendix.

The scope of this project focused on the parts of this software system likely to be modified during
research to extend eGovMon and placed packaged functions out of scope. Clearly the OS,
PostgreSQL, Python and Java are out of scope. But other components developed by others and used
without modification are also out of scope. These include HarvestMan, Imergo, Tidy, Jython and
many others. Statements about the boundaries between the elements within scope and out of
scope are primarily found in the Architecture reconstruction chapter in the Module decomposition
views.

Many elements in eGovMon are built with extensive logic to aid in development and operational
debugging. Except for the section where the quality of modifiability is discussed, these features
will be kept out of scope. To do them justice would make an already complex system much more
difficult to understand. While the developers and maintainers of this system are primary stakeholders
for the purpose of this project, they will already possess the skills needed to understand
this variability in use once they understand the architecture as presented here.

Relaxed, the HTML validator (http://relaxed.sourceforge.net/), is out of scope. Here is what the
author, Petr Nalevka, says about it (http://nalevka.com/content/Projects/category-all.en.html):




Relaxed is my XML validation project and a part of my bachelor's thesis. It is an easy
to use HTML validation application which is special in the sense it doesn't use the official
W3C DTD's. It rather validates HTML documents using schema definitions written in
Relax NG with embedded Schematron patterns. This is an extremely expressive combination
of languages which enables validation of additional restrictions which can not be expressed
using DTD. This includes most restrictions specified in the W3C HTML 4.01 and
the W3C XHTML 1.0 recommendation and some restrictions from WAI WCAG 1.0
Guidelines.


Besides limiting the scope of this project by functionality, the scope is limited along a temporal
axis as well. Some say that “writing down system installation procedures…is not architectural.”
(Clements, et al., 2003 p. 374) A sidebar in one of the references makes astute observations of the
various “times” that exist in a system (Clements, et al., 2003 pp. 213-215). In this spirit, this project
will reduce the scope of the “times” of interest in the architectural design of this system. If
this is accepted, those parts of the eGovMon system that exist to install the correct platform for
the system fall outside the scope of analysis and architecture recovery. In addition, the eGovMon
system must perform significant logic to create the run-time environment that will provide the
services of this system. Yet many of these elements have no persistent run-time presence. For the
purposes of this project those start-up processes and modules will be largely excluded from the
analysis and presentation. While they do offer some interesting architectural issues, this is being
done to keep the scope of this project within reason.


SARM Planning - Final Form of Deliverable

SARM recommends that an early decision be made regarding the form the final deliverable will
take. This will guide the data that must be gathered, the analysis that must be done upon that
evidence, the types of exhibits that must be created and the way the material must be presented. The
form of documentation chosen follows Documenting Software Architectures by Clements et al.
[Clements 2003]. There are many alternative ways to document an architecture, and that text
[Clements 2003] itself describes other styles, but the styles chosen here will serve
to satisfy the stakeholders' initial needs.


SARM Planning - Document Repository

In this case study, it was not possible to envision in advance the document repository that was needed. This
repository grew organically over the course of the project. In the end, the document repository for
the recovery was the collected UML diagrams in addition to the source documents. Since the intended
form of the final deliverable was known from the beginning of the recovery, the various
documents that could be part of that deliverable were kept organized according to that structure.
That complete organization can be seen in Appendix A. There were also many other documents
that would not be included in the final deliverable but were important reference documents during
the recovery. In this case study, an important aid to understanding was the source code itself with
extensive notes made over time to document key insights into the overall system.



SARM Planning - Stakeholder Needs/Concerns

Any architecture recovery project is driven by some need. To maximize the chances of success
and minimize wasted effort, the team must understand those project drivers. In this case study, all
of the needs documented in the EIAO and eGovMon projects exist as drivers.
The reason this system exists is to perform research into the metrics of websites as defined by the
Unified Web Evaluation Methodology (UWEM), an ongoing project to create a unified way to
assess website accessibility, and to provide feedback to those researchers so UWEM might be
amended. In this capacity the system can be viewed from three very different perspectives. First,
it is a framework for the housing and execution of metrics under evaluation. Researchers
posit metrics and plan experiments to determine if the new metric has value over prior ones.
Second, the system is an autonomous device to apply these individual metrics to statistically
significant sets of websites. As such it must be prepared to handle the widest possible set
of exception conditions that a web crawler might encounter when attempting to crawl large portions
of the web unattended. The third perspective is that of the service this device can provide to
people who might want to subject individual websites to analysis. It is from these three perspectives
that the requirements of eGovMon are drawn.

These perspectives provide the framework for the stakeholders' needs of the eGovMon system.
To support the ATAM, it is best if these requirements are documented using the multi-part
format given by Bass in Software Architecture in Practice [Bass 2003]. The requirements are
developed into that form in the next section.


SARM Planning - Stakeholder Needs/Concerns - Need for Extensibility and Maintainability

The first perspective on eGovMon clearly shows the need for maintainability. The primary purpose
is exploration and research. A system that is difficult to modify to explore a new metric
would be undesirable. Therefore this system has a need for the system quality attribute of modifiability.
From Bass [Bass 2003 p74] the general quality attribute scenario for modifiability looks
like this:



Source: Developer/Researcher



Stimulus: A proposed change to the metric calculation



Artifact: Code



Environment: Design Time



Response: Modification is made to eGovMon with no side effects



Response Measure: the target for work-hours and elapsed time to make the change
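
The six-part scenario format listed above lends itself to a simple record type. The following Python sketch is only an illustration of how such scenarios could be captured for later analysis; the class name `QualityAttributeScenario` and the helper `summarize` are hypothetical and are not part of eGovMon or of the Bass text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAttributeScenario:
    """One six-part quality attribute scenario (hypothetical record type)."""
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str


def summarize(scenario):
    """Render a scenario as a single readable sentence."""
    return (
        f"{scenario.source} raises '{scenario.stimulus}' against "
        f"{scenario.artifact} during {scenario.environment}; expected: "
        f"{scenario.response} (measured by {scenario.response_measure})."
    )


# The general modifiability scenario, using the field values listed above.
MODIFIABILITY_GENERAL = QualityAttributeScenario(
    source="Developer/Researcher",
    stimulus="A proposed change to the metric calculation",
    artifact="Code",
    environment="Design Time",
    response="Modification is made to eGovMon with no side effects",
    response_measure="Target for work-hours and elapsed time to make the change",
)
```

Keeping every scenario in one uniform structure makes it easier to compare scenarios across quality attributes when they are later used as inputs to an evaluation.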

This general quality attribute scenario is difficult to translate into specific concrete scenarios because
the stimulus cannot be defined in advance. However, it is clear that some architectures will enable more
efficient modification of the system while others will make it more difficult. The objective is to
have an architecture which will minimize the work-hours and elapsed time for the expected set of
changes which will be made to eGovMon over the foreseeable future. Therefore the focus for
modifiability is primarily within the WAMs module and secondarily in the Sampler module because
of its tight coupling with the WAMs module.


SARM Planning - Stakeholder Needs/Concerns - Need for Availability

From the perspective of a tool running unattended performing metric analysis on a large number
of websites and from the perspective of a casual user requesting the evaluation of a single website,
there are some needs for availability. For the purpose of performing large-scale website
evaluation, there is a tradeoff between performance and availability. If large numbers of websites
can be evaluated in a brief period of time, it can justify a researcher giving the system undivided
attention during operation to immediately respond to errors. However the idea of acquiring larger
and faster machines to cause the system to fail faster is not consistent with the continual economic
pressure that is usually present in a research project. Rather, the quality that is desired is to both
increase the mean-time-to-failure and reduce the mean-time-to-repair. Mean-time-to-failure is most directly
impacted by research and analysis into known and expected exception conditions that will occur
in operation and ensuring that each one has been included in the run-time behavior. There will
also be a clear relationship between the mean-time-to-repair and the requirement for modifiability.

The general quality attribute scenario for availability is:



Source: Internal, External



Stimulus: (Fault) Omission, Crash, Timing, Response



Artifact: Process, Storage, Processor, Communication



Environment: Normal Operation, Degraded Operation







Response: Record, Notify, Disable, Continue (Normal, Degraded), Become Unavailable



Response Measure: Repair Time, Availability/Available Time, Degraded Time Interval

There are many foreseeable fault events in operation, including website unavailability and
communication failures. The statistical nature of the evaluation tolerates these conditions
up to, but not including, the complete loss of internet connectivity. However, since any given
testrun is likely to include a large number of websites, what is NOT desired is for the testrun to
cease when it encounters a fault such that manual intervention is required in order to restart the
testrun. This gives the following concrete scenario:



Source: Internal or External



Stimulus: Fault



Artifact: Process



Environment: Normal Operation



Response: Record and/or Notify and Continue Normal Operation



Response Measure: A nominal increase in the processing time for the website which
raised the fault is expected and reduced functionality for the metrics of that website is
also permissible. However handling the fault should not increase the handling time of this
website by more than 50%.
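
The concrete scenario above (record the fault, continue the testrun, stay within a 50% time overhead) can be sketched in Python as follows. This is an illustrative sketch and not eGovMon code: `run_testrun`, `within_fault_budget`, and the `evaluate` callback are hypothetical names, and catching every `Exception` is a simplifying assumption.

```python
import logging

logger = logging.getLogger("testrun")


def run_testrun(sites, evaluate):
    """Evaluate every site; record faults and continue instead of stopping.

    `evaluate` is a hypothetical callback that returns a result for one
    site or raises an exception when a fault occurs.
    """
    results, faults = {}, []
    for site in sites:
        try:
            results[site] = evaluate(site)
        except Exception as exc:  # assumption: any exception counts as a fault
            # Response: record and/or notify, then continue normal operation.
            logger.warning("fault while evaluating %s: %r", site, exc)
            faults.append((site, repr(exc)))
            results[site] = None  # reduced functionality for this site only
    return results, faults


def within_fault_budget(normal_seconds, faulted_seconds, max_overhead=0.5):
    """Check the response measure: at most 50% extra handling time."""
    return faulted_seconds <= normal_seconds * (1.0 + max_overhead)
```

The key property is that a fault in one website never requires manual intervention to restart the remaining sites in the testrun.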

In general, availability, α, can be measured as Mean-Time-To-Failure (MTTF) divided by the
sum of MTTF and Mean-Time-To-Repair (MTTR):

$$\alpha = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
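
As a worked illustration of the availability ratio just described, the short Python sketch below computes α from MTTF and MTTR. The function name `availability` and the numbers in the usage note are hypothetical and are not measurements from EIAO or eGovMon.

```python
def availability(mttf_hours, mttr_hours):
    """Return availability as MTTF / (MTTF + MTTR), both in the same unit."""
    if mttf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTTF and MTTR must be non-negative")
    total = mttf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTTF and MTTR cannot both be zero")
    return mttf_hours / total
```

With a hypothetical MTTF of 99 hours and MTTR of 1 hour, `availability(99.0, 1.0)` gives 0.99. Halving the MTTR to 0.5 hours raises the value, which is why both a longer MTTF and a shorter MTTR improve availability.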













MTTR can be improved with good documentation. This can be expressed with a poorly quantified
concrete scenario as follows:



Source: Fault



Artifact: Process



Environment: Normal Operation



Response: Fault recognized, source of fault identified, code modified and Normal Operation
resumed



Response Measure: MTTR improved over EIAO levels


SARM Planning - Stakeholder Needs/Concerns - Need for Scalability and Performance

While performance is not a major concern, at least in the early phases of the development of this
product, the system is intended to eventually scale to handle the frequent evaluation of a large number of
websites over time. Therefore some consideration must be given to the scalability of this system
even if the implementation of that scalability is deferred. The modifiability requirements already
give a functional organization to the system, and these functional boundaries can also be used as
the initial basis for parallel operation by putting different functional components onto different machines.
This will be analyzed later, but the immediate task is to provide one or more concrete
scenarios which capture this requirement.

The general scenario for performance is:



Source: One of a number of independent sources, possibly from within system







Stimulus: Periodic events arrive; sporadic events arrive; stochastic events arrive



Artifact: System



Environment: Normal mode; overload mode



Response: Processes stimuli; changes level of service



Response Measure: Latency, deadline, throughput, jitter, miss rate, data loss

As this applies to the need for scalability, one concrete scenario can be:



Source: Stakeholder or funding agency



Stimulus: Additional funds for hardware are made available to the system to allow for
greater throughput



Artifact: The eGovMon System



Environment: Normal mode



Response: More hardware is made available and throughput is improved



Response Measure: While it is not necessary to specify an exact benchmark for the marginal
improvement in throughput for the system, there must be some curve of marginal
improvement that clearly shows that the architecture can be scaled as needed; i.e., the
curve must be approximately linear and not exponential.
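
One way to check the response measure above against measurements is to compute a scaling efficiency for each hardware configuration. The Python sketch below is a hedged illustration: the function names are hypothetical, the 0.8 efficiency threshold is an arbitrary assumption (the scenario deliberately sets no exact benchmark), and the sample numbers in the usage note are invented.

```python
def scaling_efficiency(throughputs):
    """Map machine count -> efficiency relative to the smallest configuration.

    `throughputs` maps a machine count to a measured throughput
    (for example, websites evaluated per hour). An efficiency of 1.0
    means perfectly linear scaling.
    """
    base_n = min(throughputs)
    base_rate = throughputs[base_n] / base_n  # throughput per machine at baseline
    return {n: (t / n) / base_rate for n, t in sorted(throughputs.items())}


def scales_approximately_linearly(throughputs, min_efficiency=0.8):
    """True if every configuration keeps at least `min_efficiency`.

    The default threshold is an assumption, not a project requirement.
    """
    return all(e >= min_efficiency for e in scaling_efficiency(throughputs).values())
```

For invented measurements of 100, 190, and 360 websites per hour on 1, 2, and 4 machines, the efficiencies are 1.0, 0.95, and 0.9, so the check passes. Measurements of 100, 120, and 130 would fail, indicating that added hardware is yielding sharply diminishing returns.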


SARM Planning - Stakeholder Needs/Concerns - Need for Usability

While it must be possible for the researchers to use the system, there is currently little space
between the developers and the
research users. Therefore the primary persona for usability analysis
can be assumed to be one of the less experienced developers and someone familiar with Linux