Open Source Software Development Processes in the Apache Software Foundation


17 Nov 2013



ICS 225

Chad Ata

Veronica Gasca

John Georgas

Kelvin Lam

Michele Rousseau


1

Introduction

Even though software is an intangible artifact, it is developed and used every day. To deal with this intangibility, one can describe the software formally or in writing, which in turn provides a specification of the software. With such a specification, it becomes possible to test the behavior of the software against it. After sufficient testing (both verification and validation), the software is released. This is the generic, traditional process used to describe software development. If the software under development is of reasonable size, this generic process is intuitive and manageable. Unfortunately, in reality the scale of the software being developed is often enormous, and the development team is usually very large and distributed. The Open Source Software Development (OSSD) community is one of the most frequently cited examples of differences that the generic development process does not capture. The success of OSSD efforts such as Apache and Linux is driven not just by those who want to save money by using a free resource, but more by the quality of the software. So how does an OSSD project manage its process without the traditional software development practices that have been mandated for success?


There have not been many in-depth research projects on the Open Source Software Development (OSSD) community and its efforts. In the typical software engineering textbook, you cannot even find the term ‘Open Source’ in the index. In reality, however, more than 40,000 projects are hosted on open source development portals such as SourceForge [2]. The OSSD community is influential in the software industry. For example, IBM switched from its proprietary web server to the open source Apache HTTPD web server in June 1999 [4]. As Eric Raymond put it in his paper [1], the OSSD ‘bazaar’ style of development is significantly different from the ‘cathedral’ style of development in the traditional software industry.

There is a gap between the two communities. If the OSSD development process can be better understood, the “closed source” software industry may be able to benefit from it, and vice versa. The software industry can learn from the OSSD community how to manage a project with developers of diverse backgrounds in a distributed setting. High-quality, highly reliable products developed by the OSSD community, such as the well-known Apache web server, are rarely found in the mainstream software industry. Even large corporations such as Microsoft sometimes have difficulty producing highly reliable software (compare the Internet Information Server (IIS) with Apache HTTPD). It is also possible that the tools used by the OSSD community would be beneficial in an industrial software development setting. How the OSSD community identifies and manages the needs and features of its software is another important aspect to research.

Under these circumstances, it is desirable to have a better understanding of the OSSD effort. To learn from the OSSD experience, one needs to understand how a feature is added in the form of a specification, then developed, and eventually tested and released to the public. The focus of this research study is the Apache web server (HTTPD). The goal is to investigate in detail how the OSSD community, in particular the Apache group (both the ASF and the HTTPD developer community), interacts throughout the development process. We investigate the roles that exist within the project, the tools used in the project, the artifacts created throughout its development process, and the overall release process.

In the next section, we briefly describe the Apache HTTPD project and compare this Open Source Software Development project with a traditional software development project. Then we explain the software production architecture of the HTTPD project in detail, discussing the agents, tools, and artifacts involved throughout the development process. Each stage within the process is also explained in detail. Next, we attempt to illustrate this development process formally. Finally, we conclude the paper with further discussion and the findings from our research.


2

Overview of Apache


The Apache group was formed in February 1995 by eight core founders. Their initial goal was to extend the web server created by Rob McCool into stable, bug-free, feature-rich software. The founders coordinated with each other through private email, applying their own individual “patches” to the source code. After extensive beta testing, the Apache web server was released in December 1995. Four years later, the Apache group formed the Apache Software Foundation (ASF) to provide logistical support (such as handling donations and contributions from others) and to address other business-oriented needs of the project (such as licensing issues). Since then, many new OSSD projects (e.g. Jakarta, XML) have started under the leadership of the ASF.


3

Problem Domain Characterization

Apache is a well-known web server. It currently holds more than 50% of the market share, as the May 2002 Netcraft survey [3] has shown, which demonstrates the large-scale usage of the software. The Apache 2.0 web server is the result of a successful development process that one can study from the initial feature specification to the final deployment to general public users. In August 2000, the Apache group decided to restructure and rewrite the entire Apache web server (i.e. the ‘requirement’). Their original aim was to release the new server by the end of 2000. In reality, Apache 2.0 did not reach its first beta testing stage until April 2001 (i.e. the ‘testing’). In fact, it was not until November 2001 that companies that support and distribute the Apache web server considered Apache 2.0 ready to use (i.e. the ‘initial release’). The ultimate stable public release did not become available until the beginning of April 2002 (i.e. the ‘ultimate release’), roughly 15 months behind the original goal. This is the same typical problem that traditional software development faces: missing a deadline.

This research study aims to understand the problem stated above. We try to understand how the OSSD community communicates internally. After understanding the communication mechanism, we try to find out how the community elicits requirements for the software under development. During development, we look at the communication channels developers use to ask for help and to coordinate testing. After testing, we investigate the process by which tested source code becomes a releasable product for the general public.


4

Process Modeling and Visualization

4.1

Agent Roles

All Apache Software Foundation projects use a philosophy of meritocracy to define the hierarchy of their agents. Meritocracy is based on the notion that work earns rights. All code is reviewed by many eyes. For example, developers can gain write access to CVS only by proving their skills and commitment to the committers on the project. They achieve this by contributing code of a quality that the committers consider good. Committers then vote to bring a developer to committer status. Votes on patches and other issues are, in general, binding only for committers. Thus, the higher the rank, the more power and influence one has. Below is an outline of the agents and their roles as described on the ASF website. Keep in mind that agents can contribute to any lower-ranking duty, but agents of a lower rank cannot participate in the duties of higher-ranking agents.
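
The ranking rule above can be sketched as an ordered enumeration. This is a minimal illustration, not ASF tooling: the `Rank` names and the strictly linear ordering of the project-level roles (user, developer, committer, PMC member) are assumptions made for the example.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Project-level agent ranks, lowest to highest (hypothetical linear ordering)."""
    USER = 1
    DEVELOPER = 2
    COMMITTER = 3
    PMC_MEMBER = 4


def can_perform(agent: Rank, duty: Rank) -> bool:
    """An agent may take on any duty at or below its own rank, but none above it."""
    return agent >= duty


# A committer can do a developer's work (e.g. submit a patch) ...
print(can_perform(Rank.COMMITTER, Rank.DEVELOPER))  # True
# ... but a developer cannot do a committer's work (e.g. commit to CVS).
print(can_perform(Rank.DEVELOPER, Rank.COMMITTER))  # False
```

Using `IntEnum` lets the ordering of ranks be compared directly, which is all the "lower rank cannot do higher duties" rule requires.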

4.1.1

Users

Apache considers its development to be user-centered. Users contribute in three basic ways. First, they submit bug reports through the website using Bugzilla; they are the ultimate testers of the final version of the code. Second, they contribute suggestions for new features by submitting a Bugzilla report marked as an enhancement. This is one way of understanding what users want to see in the next version, and it is significant in determining the long-term goals of the project. Finally, users support each other through the mailing lists.

4.1.2

Developers

The developer's role is to contribute either code or documentation to the project in the form of patches. Developers cannot commit changes to CVS. Instead, they submit ‘diff’ files to various channels to solicit committers' attention and to advocate that committers commit the changes. Developers have limited voting power: they may vote on patches, but their vote is not binding unless they authored the patch. Developers can also contribute by taking part in alpha or beta testing.
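
The voting power described here reduces to a small predicate. The sketch below is hypothetical (the function, parameter, and role names are ours); it encodes only the rules stated in this section and in section 4.1: committers' votes on patches are binding, and a developer's vote is binding only on a patch that developer authored. PMC members are treated as committers, and users are assumed not to cast binding votes.

```python
BINDING_ROLES = {"committer", "pmc_member"}  # hypothetical role labels


def vote_is_binding(role: str, voter: str, patch_author: str) -> bool:
    """Return True if this voter's +1/-1 on a patch counts toward the decision."""
    if role in BINDING_ROLES:
        return True
    # A developer's vote binds only on the developer's own patch.
    return role == "developer" and voter == patch_author


print(vote_is_binding("committer", "alice", "bob"))  # True
print(vote_is_binding("developer", "bob", "bob"))    # True
print(vote_is_binding("developer", "carol", "bob"))  # False
```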

4.1.3

Committers

Committers develop and commit code or documentation. With their write access to CVS, they can commit their own work as well as patches from developers. Committers vote to accept or reject developers' patches, and they are responsible for overseeing the development efforts of developers. They determine which developers become committers by recommending them; after a unanimous vote, a developer advances to committer status. It is the responsibility of committers to ensure that code integrated into CVS is ‘good’ code: they review what goes into CVS and ensure the integrity of the software. Committers can become part of the Project Management Committee through self-promotion and long-term commitment.
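
As a sketch of the promotion rule, the hypothetical function below advances a developer only when every committer who voted cast +1. The text does not state whether abstentions are allowed or what quorum applies; the sketch assumes at least one vote is required and that any vote other than +1 blocks promotion.

```python
def promote_to_committer(votes: dict[str, int]) -> bool:
    """votes maps each voting committer's name to +1 or -1 (hypothetical format).

    Promotion requires a unanimous vote: at least one vote, and all of them +1.
    """
    return bool(votes) and all(v == +1 for v in votes.values())


print(promote_to_committer({"alice": +1, "bob": +1}))  # True
print(promote_to_committer({"alice": +1, "bob": -1}))  # False
print(promote_to_committer({}))                        # False
```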

4.1.4

Project Management Committee (PMC)

Members of the PMC are self-selected committers. They are responsible for the long-term direction of the project. Although the Board of Directors ultimately has the final decision-making power over every project, it delegates this responsibility to the PMC of each project. There is a single PMC for every project. The PMC shapes what will go into the next release of a project: although the Release Manager has the ultimate say in what goes into the final release, the PMC can make suggestions.

4.1.5

Release Manager (RM)

The release manager's main role is to schedule the release of the project. The RM is a self-selected committer. The RM decides when each testing phase is done and when the general availability (GA) release will be made public. This individual has the ultimate authority over what makes it into the release.

4.1.6

Foundation Members

Foundation Members have demonstrated long-term commitment through the amount of work they have contributed to Apache projects. Members are not project-specific but belong to the Apache Software Foundation as a whole. Members are responsible for guiding the foundation; one of their most critical responsibilities is electing the Board of Directors. Foundation Members are invited by other members and voted into membership.

4.1.7

Board of Directors and Officers


The Board of Directors and Officers of the Apache Software Foundation are responsible for the business affairs of the foundation. The officers are elected by the Board of Directors to oversee the daily operations of the foundation. Although the Board of Directors is officially responsible for the projects, it delegates most of the decision making to the Project Management Committees.

4.2


Tools and Network Infrastructure

As with all large software development projects, tools are needed to support the process. This is especially true for open source development, which must accommodate a maximally distributed community. For the Apache HTTPD community, the tools are essential to its existence. The community also provides guidelines (which can be considered tools as well) that establish some standardization and a shared understanding of expectations and process. Although there are not many tools, they are powerful and effective in handling the large number of community members and their wide distribution. Each of these tools is detailed on the Apache website.


4.3

Communication

The entire community coordinates mainly by communicating through mailing lists, as described in the project guideline of HTTPD. Members also rely on information posted on the project web portal. Another form of communication can be found in the CVS “STATUS” file, where the vote on each issue is recorded for future reference. Lastly, face-to-face communication, probably the most natural form, is rare in the OSSD community. For Apache in particular, there is an annual conference where all the developers gather to discuss the project.

4.4

Source Repository and Configuration Management

The Concurrent Versions System (CVS) is the source repository used by all Apache Software Foundation projects. Each project has its own tree of source code and documentation within CVS. Each developer can follow the guidelines provided on the project web portal to set up their own access and synchronize their local copy with the most current version in CVS. Developers can obtain the code for their platform of interest from the binary and source distributions, and they can also retrieve the source code of previous versions through CVS. The Release Manager also uses CVS extensively throughout the release process to minimize interference between the release and the ongoing development effort.

4.5

Development

The community does not recommend any specific development tool for this project. Because Apache HTTPD is a cross-platform product, each developer can pick the development tool that works best on their particular platform to maximize productivity. However, the project community does provide several kinds of development guidelines. For example, the style guideline is strictly enforced on all source code committed to CVS. For an individual developer's effort to pay off, the developer needs to follow the patch guideline to maximize the chance of having his or her patch committed to CVS.

4.6

Debugging

Apache HTTPD is a large and complex software project, and locating a problem is not an easy task. To work on a particular Problem Report (PR), or to find problems within his or her own patch, a developer has to debug the software. The GNU debugger (gdb) is the tool the Apache HTTPD community recommends for tracing problems within the software. A detailed guideline is available to assist developers in resolving problems throughout the development process.

4.7

Bug Tracking and Feature Acquisition

Bugzilla is the central tool used by the Apache HTTPD community, and a guideline is available to help the community use it appropriately. The entire community depends heavily on this tool for the success of the project. Users use it to report bugs, and they also submit suggestions for new features and enhancements through it. Developers look through the submitted problem reports and decide which ones they are interested in. Committers, as well as the Project Management Committee, use the tool to track the general interests of the community and, based on that, decide the direction of the project.

4.8

Release

The release manager is responsible for releasing the Apache HTTPD software. There are clear, detailed guidelines for the Release Manager to follow in order to produce a successful release. There is also an obsolete document that explains to release managers the steps needed for the release process; it has been replaced by an automated build script stored in CVS, which eases the effort of releasing this complicated software.


5


Artifacts


As in any software development process, many artifacts are involved in developing an Apache HTTPD release.

5.1

Problem Reports

Inputs: Bug reports and new feature requests submitted by Users

Outputs: Problem Report numbers (PR#) in the Bugzilla database, which are inputs for the Developers

Agents: Users/Developers


Problem reports are derived from the Bugzilla database. Users submit all bug reports and new feature requests to the bug database. Users are provided with a set of guidelines to follow before entering a new request. Basically, they are asked to first download the latest patch to ensure that their issue has not already been resolved. Next, they should check Bugzilla to see whether their issue has already been submitted.
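
The two checks users are asked to perform before filing can be expressed as a short decision function. This is a hypothetical sketch, not part of Bugzilla: it assumes the user has already retested against the latest code (the `reproduces_on_latest` flag) and uses a naive case-insensitive comparison of report summaries as a stand-in for searching Bugzilla for duplicates.

```python
def should_file_report(summary: str, reproduces_on_latest: bool,
                       existing_summaries: list[str]) -> bool:
    """Decide whether a new Bugzilla report is warranted (illustrative only)."""
    # Step 1: if the latest code no longer shows the problem, it is already fixed.
    if not reproduces_on_latest:
        return False
    # Step 2: skip filing if an equivalent report already exists.
    wanted = summary.strip().lower()
    return all(s.strip().lower() != wanted for s in existing_summaries)


# Hypothetical report summaries, invented for the example.
open_reports = ["Segfault in mod_proxy on HEAD request"]
print(should_file_report("segfault in mod_proxy on HEAD request", True, open_reports))
print(should_file_report("Crash when config file is empty", True, open_reports))
```

A real duplicate search is fuzzier than exact matching; the point is only the order of the two checks.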

5.2

Patches

Inputs: Problem reports

Outputs: Patches (diff-files) from Developers to Committers

Agents: Developers


Patches are output from developers and input into the communication channels (e.g. a mailing list or Bugzilla). Patches are the main medium through which developers communicate with each other. Since all code and documentation is submitted as patches, this is probably the most significant artifact of all. After submission, every patch awaits the voting process for acceptance.
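
Since a patch is simply a diff-file, its shape can be illustrated with Python's standard `difflib` module, whose `unified_diff` function produces the same unified format as `diff -u`. The file path and the one-line change below are invented for the example; the project's patch guideline, not this sketch, governs what a real submission must contain.

```python
import difflib

# Hypothetical before/after contents of a single source file.
original = ["int max_clients = 150;\n"]
modified = ["int max_clients = 256;\n"]

# unified_diff yields the lines of a unified diff, ready to be mailed or attached.
patch = "".join(difflib.unified_diff(
    original, modified,
    fromfile="a/server/config.c",  # hypothetical path
    tofile="b/server/config.c",
))
print(patch)
```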

5.3

Release Patches


Inputs: Committers consensus

Outputs: The release patch is made available to the general public

Agents: Users/Developers/Committers


Released patches are patches that have been minimally reviewed and then committed to CVS by a committer. Released patches are made available to the public on the official web distribution. A committer can revoke a patch if, after reviewing it in detail, he or she finds problems showing that it should not have been committed. Keep in mind, however, that under the current process a patch can be committed without review from other committers. This is consistent with Apache's current commit-then-review policy for patches.


5.4

Proposed Features


Inputs: Release patches, Bugzilla enhancement reports

Outputs: Project Roadmap

Agents: Project Management Committee


Proposed features are output from the Project Management Committee and an input to the release manager. The PMC develops a list of proposed features based on its members' personal judgment, feature enhancements requested in Bugzilla, and enhancements already coded as patches (which are submitted to the developer-accessible portion of the website). This list is referred to as the project “roadmap” and is then voted upon by the PMC. The “roadmap” is then reviewed by the community and turned into the “status” file, which contains more elaborate requirements and the outcome of the committers' votes.


5.5

Proposed Requirements


Inputs: Proposed features

Outputs: Status file from PMC to Developers

Agents: PMC/Release Manager/Developers


Proposed requirements are output from the PMC's vote on the proposed features. They are input to the developers, who will start coding these new features, and to the release manager, who will decide which of the new features will be included in the release. The proposed requirements include not just the requirements but also the result of each PMC member's vote on each requirement. This file is placed on the developer-accessible website and is referred to as the “status” file. Developers access this file and decide which features they want to implement.


5.6

Patches for New Release


Inputs: Proposed Requirements

Outputs: Patches from developers to the release manager

Agents: Developers/Release Manager


Developers contribute the source code for new features as patches, which are sent to the community and undergo the review-then-commit process. From there, the release manager determines which of these new features will go into the current release.


5.7

Alpha Build


Inputs: Patches for New Release and Status file

Outputs: Bug fixes from developers

Agents: Developers/Committers/Release Manager


The release manager creates an alpha build from the submitted new-feature patches. This alpha build is placed on the web for developers to test. Developers and committers test the code, fix bugs, and submit those fixes as patches. The output of alpha testing is given to the release manager to create a beta build.


5.8

Beta Build


Inputs: Alpha build and Bug Fixes from Alpha Testing

Outputs: Bug fixes from developers

Agents: Developers/Committers/Release Manager


The release manager decides when it is time to create a beta build. The inputs to the beta build are the alpha build and the bug-fix patches submitted by developers and committers. The output is posted on the developers' website for further testing.

The developers test the beta build and produce more bug fixes in the form of patches.


5.9

General Availability (GA) Build


Inputs: Beta Build and Bug Fixes from Beta Testing

Outputs: Final build made available to the public

Agents: Release Manager


The inputs to the GA build are the beta build and the bug-fix patches submitted by developers and committers. The release manager determines when beta testing is complete and creates the GA build. The Apache guidelines then call for running the GA build on the Apache website for 48 to 72 hours to determine whether it is stable. This is only a guideline, however, and the release manager can release the GA build to the public whenever he or she deems it ready.
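
The stage ordering and the 48 to 72 hour guideline can be summarized in a small sketch. All names here are hypothetical; the code models only what the text states: a build moves from alpha to beta to GA only when the release manager declares the current testing phase done, the GA build is expected to run on the Apache website for at least 48 hours before public release, and the release manager may override that guideline.

```python
from enum import Enum


class Stage(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    GA = "ga"


NEXT = {Stage.ALPHA: Stage.BETA, Stage.BETA: Stage.GA}
MIN_SOAK_HOURS = 48  # lower end of the 48-72 hour guideline


def advance(stage: Stage, rm_approves: bool) -> Stage:
    """Move to the next stage only when the release manager declares testing done."""
    if rm_approves and stage in NEXT:
        return NEXT[stage]
    return stage


def ready_for_public(stage: Stage, hours_on_site: float, rm_override: bool = False) -> bool:
    """A GA build goes public after the soak period, or earlier at the RM's discretion."""
    return stage is Stage.GA and (hours_on_site >= MIN_SOAK_HOURS or rm_override)


stage = advance(advance(Stage.ALPHA, True), True)  # alpha -> beta -> GA
print(stage.value, ready_for_public(stage, 24), ready_for_public(stage, 72))
```

The `rm_override` flag reflects that the soak period is a recommendation rather than a hard gate.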


6

Processes


Figure 1: The overall Apache HTTPD release process. The green portion depicts the patch development process, while the blue portion depicts the release process.

Apache works as a meritocracy, so developers must prove their skills before any of their code is committed. All code is reviewed by many eyes. This follows Linus' Law [1], which states that the more eyes look at the code, the more likely it is that faults will be discovered and repaired. Committers have shown that they are competent developers who understand how to write good code, and most of them have a good understanding of software process and best practices in developing software. It can therefore be concluded that the traditional training these software engineers have received, whether formally or through experience, is not disregarded when developing open source software but has become part of their work routine.

6.1

The Patch Development Process


Previously released patches can contain new features, not just bug fixes. They become part of a new release, so understanding how these patches are developed is essential before delving into the release process.

Anybody in the community can submit a new feature request via Bugzilla by marking the submission as an “enhancement”. Developers scan Bugzilla and decide which of these items they would like to implement. Once a developer has committed to coding the new enhancement, he or she interacts with the submitter to sort out any details. This is similar to discussing a requirement with a customer; the “customers” of Apache HTTPD are its users. The developer then posts the patch as a diff-file to the mailing list dedicated to new patches (i.e. new-httpd), or submits the diff-file to the Bugzilla database. From there, a committer decides whether or not to commit the patch into CVS.

In the past, committers followed a review-then-commit (RTC) policy. Because of the large number of patches currently submitted, however, committers now follow a commit-then-review (CTR) policy for patches. The CTR policy is subject to the lazy consensus rule: committers commit the patch and then see whether any other committer disputes it by sending in a -1 vote (a veto). A single veto is enough to cause a patch to be revoked. This differs from the majority consensus rule, under which an issue passes when there are at least three +1 votes and more +1 votes than -1 votes. A veto cannot be cancelled by others; it can only be withdrawn by its originator, and it must be withdrawn before the patch can be released. Developers may vote as well, but only the developer who authored the patch has a binding vote in the voting process.
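
The two decision rules in this section can be contrasted directly in code. The sketch below is hypothetical: it assumes that only binding votes are passed in, that votes are the integers +1, 0, or -1, and that each committer is identified by name. Under lazy consensus, a committed patch stands unless a veto is outstanding, and a veto counts until its originator withdraws it. Under the majority rule (which section 6.2 applies to new features), an issue passes with at least three +1 votes and more +1 than -1 votes.

```python
from dataclasses import dataclass, field


@dataclass
class CommittedPatch:
    """A patch committed under commit-then-review, subject to lazy consensus."""
    vetoes: set[str] = field(default_factory=set)  # committers with an active veto

    def veto(self, committer: str) -> None:
        self.vetoes.add(committer)

    def withdraw_veto(self, committer: str) -> None:
        # Each committer can remove only their own veto; nobody can cancel another's.
        self.vetoes.discard(committer)

    def stands(self) -> bool:
        # A single outstanding veto is enough to revoke the patch.
        return not self.vetoes


def majority_passes(votes: list[int]) -> bool:
    """Majority rule: at least three +1 votes and more +1 than -1 votes."""
    plus, minus = votes.count(+1), votes.count(-1)
    return plus >= 3 and plus > minus


patch = CommittedPatch()
patch.veto("carol")
patch.withdraw_veto("alice")  # alice cannot lift carol's veto
print(patch.stands())         # False
patch.withdraw_veto("carol")  # only carol can
print(patch.stands())         # True
print(majority_passes([+1, +1, -1]))      # False: fewer than three +1 votes
print(majority_passes([+1, +1, +1, -1]))  # True
```

Keying vetoes by committer name is what makes "only the originator can withdraw" fall out naturally.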

6.2

The Release Process

Patches that have been committed and not revoked become part of the proposed requirements for the new release. The release manager can also look at Bugzilla for other enhancement requests to determine the requirements for the new release. Communication with the Project Management Committee, as well as coordination with the entire community, is also crucial to the success of the release process.

Requirements are elicited in a variety of ways. First, as mentioned above, potential features can come from enhancement reports submitted to Bugzilla. Second, features can take the form of patches that are already available (already committed into CVS). Finally, features can be requested by the Project Management Committee. The Project Management Committee maintains a “ROADMAP” file, which sets the overall direction for the preliminary requirements of HTTPD. These requirements are then voted upon by the community. Each item in the “ROADMAP” file is then elaborated, and the items, together with the results of these votes, are recorded in the “STATUS” file. The status file represents a history of the development effort for the release. Unlike patches, new features are not subject to veto; a majority vote means approval for a new feature. Developers and committers both access and review the status file and decide from it which features they want to implement. The new features are then submitted as patches (diff-files); the same applies to bug fixes (referenced by PR#). From there, the Release Manager decides which of these patches and new features will be part of the new release. Regardless of the Project Management Committee's recommendation, the final decision on what will be released is made by the release manager.

Once the final decision on what goes into the release has been made, the code is built into an executable ready for alpha testing. Developers and committers have access to the alpha release distribution for testing purposes. This release is provided on the developers' website and announced through the developers' mailing list, to which users do not have access. As developers and committers find bugs, they fix them and submit the fixes as patches. When the release manager and the community are satisfied with alpha testing, the code, with all fixes applied, moves into beta testing. Again, only developers and committers have access to the beta release. They test and fix bugs until the release manager decides that the code is ready to become a general availability (GA) release. Before this happens, it is recommended that the binary distribution be tested on the Apache Software Foundation website for 48 to 72 hours. This is only a suggested guideline, and again the release manager has ultimate control over when the new release is ready and available. After all these testing steps, the code is finally ready for GA, and the new release is announced to the public.

6.3

Rich Picture for Apache HTTPD Project


Figure 2: Rich picture for the Apache HTTPD project. The link to the entire document is here.

6.4

Use Cases for Apache HTTPD Project


Use case per process (relation)

Use cases (UCs) provide process sequencing (control flow), tool invocations, resource inputs and outputs along the way, preconditions, postconditions (goals/outcomes), and anticipated breakdown and recovery situations.

6.5

Formal Model of the Release Process for the Apache HTTPD Project


The graph of the model that the Protégé tool generates is very large and complex; it is best viewed on its own and is available separately. A partial view from within the Protégé tool is presented in Figure 3.



Figure 3: Partial view of the Apache HTTPD development process


7

Jakarta Introduction and Overview


To gain a more general understanding of open source development within the Apache group, we decided to analyze the Jakarta project in addition to the httpd project. Jakarta is one of the largest projects in the Apache Software Foundation.


The purpose of the Jakarta project is to produce and maintain open source products built on the Java platform. The project comprises several server-side Java subprojects, grouped into the following three categories:




- Libraries, tools, and APIs. Includes build tools, repositories, Java and JSP libraries, APIs for file manipulation, regular expression packages, etc.

- Frameworks and Engines. Frameworks for unit testing and web application development, text search engines, and template engines for source code generation.

- Server Applications. This includes Tomcat, the official reference implementation of JSP technologies, WebDAV-aware CM systems, and email/news/messaging servers, among others.


We analyzed the software process followed in two of these subprojects to gain insight into Jakarta software product lifecycles. The two subprojects selected, Tomcat and Lucene, belong to the Server Applications and Engines categories, respectively. We chose these projects because of their success and visibility within the Jakarta project.


Tomcat is the servlet container used in the official reference implementation for Java Servlets (http://java.sun.com/products/servlet/index.html) and Java Server Pages (http://java.sun.com/products/jsp/). Tomcat is commonly used in combination with the httpd server to support Java Server Pages development and usage. Tomcat is available for commercial use under the ASL license in both binary and source versions.


Lucene is a full-featured, Java-based text search engine optimized for high performance. Lucene became part of the Jakarta project in September 2001. This subproject features incremental and batch indexing and searching, indexing control, stop-word processing, content tagging, stemming, and querying. Lucene is available for commercial use under the ASL license.


8

Jakarta Problem Domain Characterization


The Jakarta project, especially the Tomcat subproject, has been very successful. Over the years, the developers and committers on this project have learned valuable lessons from personal experience as well as from the older httpd project. These lessons have allowed team members to restructure and further refine their open source software development processes to make them more efficient. An example is a problem that arose with the Tomcat 3.0 release plan. At the time, a group of developers who happened to know each other would create the release plan offline, that is, without making the process public, until they came to a decision. This drew a great many complaints from other Tomcat contributors (see http://www.mail-archive.com/general@jakarta.apache.org/msg02778.html for details) and led PMC member Ted Husted to prohibit offline committer votes.


Resolving problems of this kind has helped Jakarta flourish into a great example of open source software development. Analyzing this project therefore provides us with valuable open source techniques, which may allow us to form a general open source software development meta-model.


9

Jakarta Process Modeling and Visualization


Using the information we collected from the Jakarta, Tomcat, and Lucene websites and mailing lists, we have been able to abstract a general software life cycle process for a typical Jakarta subproject. Several agent roles have been identified and are discussed in the following section. Section 9.2 discusses the tools and network infrastructure used by typical Jakarta subprojects, followed by the Jakarta artifacts in section 9.3. The process description and hierarchy are then described in section 9.4.


9.1

Agent Roles


The members of the Jakarta project fall into a subset of the httpd agent categories. These agents include the following:

Users: see section 4.1.1
Developers: see section 4.1.2
Committers: see section 4.1.3
PMC members: see section 4.1.4


9.2

Tools and Network Infrastructure

The tools and network infrastructure in the Jakarta project are the same as those in the httpd project. Please see section 4.2 for more information.


9.3

Artifacts

Artifacts of the Jakarta projects are the process inputs and outputs. For artifact descriptions, please see the appropriate subsection in section 9.4.


9.4

Process Description and Hierarchy


The following subsections describe the software lifecycle process of a typical Jakarta project, which is depicted in figure 2. The subsections represent process enactments, which appear as rectangular boxes in the diagram. Since open source software lifecycles are being examined here, the following subsections cover all three open source software development (OSSD) process layers. In particular, OSSD articulation processes are reflected in Jakarta voting procedures, which are discussed in section 9.4.4.1. OSSD community development is also portrayed by the extensive emphasis on communication, which is chiefly described in section 9.4.12. The bottom-most OSSD process layer, the software development process, is encompassed by the entire process description.



Figure 4 - Partial view of the Jakarta process life cycle model from within Protégé. For the full view see here.

Figure 5 - Jakarta software development life cycle process rich picture. See here for the hyperlinked version.
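To make the artifact flow between the following subsections concrete, the Input and Output entries of sections 9.4.1 through 9.4.11 can be encoded as data and linked wherever one step's output is another step's input. The sketch below is a hypothetical Python encoding, not part of any Jakarta tooling; artifact names are simplified (for example, only the approved outcome of voting is modeled) so that they can be matched by name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def step(name: str, inputs: list[str], outputs: list[str]) -> Step:
    return Step(name, frozenset(inputs), frozenset(outputs))


# Input/Output entries from sections 9.4.1-9.4.11, with simplified artifact names.
STEPS = [
    step("Download", ["Latest Build"], ["Latest Build"]),
    step("Application Usage", ["Latest Build"], ["Bugs/Feature Requests"]),
    step("Feature Requesting and Bug Reporting", ["Bugs/Feature Requests"], ["Requirements"]),
    step("New Feature Proposal", ["Requirements"], ["Proposal"]),
    step("Voting", ["Proposal"], ["Approved Proposal"]),
    step("Checkout", ["Approved Proposal", "Requirements"], ["Latest Code"]),
    step("Design and Code Major Change", ["Latest Code"], ["Latest Code"]),
    step("Design and Code Minor Change/Bug Fix", ["Latest Code"], ["Latest Code"]),
    step("Commit", ["Latest Code"], ["Nightly Build and Source Snapshots"]),
    step("Review", ["Latest Code"], ["Nightly Build and Source Snapshots"]),
    step("Build Planning", ["Nightly Build and Source Snapshots"], ["Build Plan"]),
    step("Build Voting", ["Build Plan"],
         ["Release Build and Source", "Milestone Build and Source", "Demo Build"]),
]


def successors(steps: list[Step]) -> dict[str, list[str]]:
    """Map each step to the steps that consume at least one of its outputs."""
    return {s.name: [t.name for t in steps if s.outputs & t.inputs] for s in steps}
```

With this encoding, `successors(STEPS)["Voting"]` is `["Checkout"]`. Because names are matched literally, the Build Voting outputs do not feed back into Download's "Latest Build" input; closing that loop would require treating a released build as the latest build, which the subsections leave implicit.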

9.4.1

Download


Input: Latest Build

Output: Latest Build

Agents: Users/Developers/Committers


As shown in the Jakarta process diagram (see figure 2), the Jakarta software life cycle has no specific starting point. Nevertheless, we chose to begin our process description with the download procedure, since it is at the root of all open source software development projects.


Downloading application source or binary code is the first step towards application usage or development. As a prerequisite for the download procedure, source and binary code are made available via the Jakarta website. For example, Tomcat and Lucene binaries may be downloaded at http://jakarta.apache.org/site/binindex.html and source code is available at http://jakarta.apache.org/site/sourceindex.html.


Downloads are performed by all users interested in the Jakarta project. Many users are likely to download the binaries of an application and use it as is, without making any modifications to it. These users can choose among four types of binary builds:

Release builds: see section 9.4.11
Milestone builds: see section 9.4.11
Nightly builds: see section 9.4.8
Demo builds: see section 9.4.11


For more information regarding these builds, see http://jakarta.apache.org/site/binindex.html.


Other users download the source code so that they can “hack” the Jakarta application and integrate it into their own software products. These users can choose among three types of source code drops:

Release source drops: This code is “as good as it gets” according to the Jakarta website and is intended for high-quality products. These releases are reviewed to ensure Servlet and JavaServer Page compatibility.

Milestone source drops: Milestone code is not intended for commercial products because, although much of the functionality is acceptable, many bugs still exist. The advantage of milestone source drops is that they allow their users to explore and use future features of the product.

Automated nightly snapshots: These code “snapshots” are taken automatically from CVS every day, so the code is very unstable. Despite this, Jakarta project developers may need this latest code.


More information regarding Jakarta source downloads can be found at http://jakarta.apache.org/site/sourceindex.html.


9.4.2

Application Usage


Input: Latest Build

Output: Bugs/Feature Requests

Agents: Users/Developers/Committers


Once application source or binary code has been acquired through the download process, users begin to use the application. As described above, these users may either use the application as-is, or they may modify it and possibly integrate it into their own software product.


Over time, users will begin to notice, and potentially be irritated by, bugs and other problems in the software. As problems are encountered, these users communicate primarily through the user mailing list appropriate to the Jakarta product they are using (see http://jakarta.apache.org/site/mail2.html). They use this mailing list to resolve their problems by asking other, more experienced, users for assistance. Other resources are also available, such as the Tomcat FAQs at www.jguru.com, as well as online books, articles, and even debuggers (see http://jakarta.apache.org/tomcat/resources.html).


Problems that are not resolved through the use of these resources are eventually perceived by the users as bugs or software deficiencies. This leads to the following section, which discusses the feature requests and bug reports that users can make when unsatisfied with some particular aspect of a Jakarta application.


9.4.3

Feature Requesting and Bug Reporting


Input: Bugs/Feature Requests

Output: Requirements

Agents: Users/Developers/Committers


As active users of the Jakarta project come across bugs in their Jakarta application, they often report them using mailing lists, IRC chat, and primarily Bugzilla (http://nagoya.apache.org/bugzilla/enter_bug.cgi). In Bugzilla, users must first choose which application they are using before submitting their report.


Users of some of the Jakarta applications, such as Tomcat, are encouraged to follow bug-reporting guidelines. In particular, for Tomcat, users should include the following information (http://jakarta.apache.org/tomcat/bugreport.html):

Tomcat version
Tomcat component (the component which has the bug)
Hardware platform and operating system
JVM and Web server version
Configuration files
Log files and stack traces
Examples which demonstrate the problem
Bug fix patch if available


Users may also use Bugzilla to request new features for a Jakarta application. This is performed by submitting a bug report like any other and setting its severity to “enhancement” (http://jakarta.apache.org/site/bugs.html).
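The Tomcat guidelines amount to a checklist, so a submission form or script could flag missing information before a report is filed. The Python sketch below is hypothetical: the field keys are invented to mirror the list above and are not Bugzilla's actual field names. A report whose severity is "enhancement" is treated as a feature request, and the patch is left optional because the guidelines ask for it only if available.

```python
from __future__ import annotations

# Hypothetical keys mirroring the Tomcat bug-report guidelines; these are not
# Bugzilla's real field names. The fix patch is optional ("if available"),
# so it is deliberately absent from this tuple.
REQUIRED_FIELDS = (
    "tomcat_version",
    "component",
    "platform_and_os",
    "jvm_and_web_server_version",
    "configuration_files",
    "logs_and_stack_traces",
    "example",
)


def missing_fields(report: dict[str, str]) -> list[str]:
    """Return the guideline fields that are absent or blank in a report."""
    return [f for f in REQUIRED_FIELDS if not report.get(f, "").strip()]


def is_feature_request(report: dict[str, str]) -> bool:
    """A feature request is an ordinary report whose severity is 'enhancement'."""
    return report.get("severity", "").strip().lower() == "enhancement"
```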


9.4.4

New Feature Proposal


Input: Requirements

Output: Proposal

Agents: Developers/Committers


From time to time, developers and committers, perhaps under the influence of feature requests, draft a new feature proposal (see http://www.mail-archive.com/tomcat-dev@jakarta.apache.org/msg26507.html for an example of such a proposal in the Tomcat project). If their idea came from a feature request, they assign themselves to that bug in Bugzilla so that others know who is working on that particular request. Next, they submit the new feature proposal to the developer mailing list for approval. Approval or denial of the proposal depends on developer and committer votes. Jakarta voting procedures are explained in the following section.


9.4.4.1

Voting


Input: Proposal

Output: Approved/Disapproved Proposal

Agents: Developers/Committers


Unlike some other projects, Jakarta is not controlled by a single dictator. Instead, Jakarta projects are based on a “minimum threshold meritocracy,” with project decisions being made by a particular group (http://jakarta.apache.org/site/decisions.html). This group includes the contributors that have committer status in the project. The only exception to this rule occurs when the issue being voted on concerns changing source code that was created by a developer; in such a case, the primary author of that code is allowed to cast a binding vote. In addition, all other contributors are encouraged to express their opinions about a voting issue via the developer mailing list, even though their votes do not count.
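The rule for whose votes count can be expressed as a filter over the votes cast on an issue. The sketch below is a hypothetical Python illustration; the `Vote` type, the `committers` set, and the `code_author` parameter are names introduced here, not part of any Jakarta tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int  # +1, 0, or -1


def binding_votes(votes: list[Vote], committers: set[str],
                  code_author: Optional[str] = None) -> list[Vote]:
    """Keep only the votes that count under the rule described above.

    Committers always vote bindingly. When the issue concerns changing code
    written by a developer, that code's primary author (code_author) also casts
    a binding vote. Everyone else's vote is advisory and is dropped here.
    """
    eligible = set(committers)
    if code_author is not None:
        eligible.add(code_author)
    return [v for v in votes if v.voter in eligible]
```

For example, with committers `{"alice"}` and code author `"bob"`, a vote from `"carol"` is discarded while the votes from `"alice"` and `"bob"` are kept.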


Voting in a Jakarta project is performed primarily via the developer mailing list (see http://www.mail-archive.com/tomcat-dev@jakarta.apache.org/msg27296.html for an example vote). These votes are based on a three-point system, as described below:


+1: “Yes,” “agree,” or “the action should be performed.” For some issues this vote can only be binding if the voter has already tested the action on their own system(s).

0: “Abstain” or “no opinion.” Although these are neutral votes, too many of them could lead to negative results.

-1: “No.” On issues that require voter consensus, this vote acts as a veto. All vetoes are expected to be accompanied by an explanation; otherwise they are deemed void.

Another type of vote, which is not included above, is the non-binding, informal vote that occurs in email and chat conversations.

The different types of votes described above are cast on different types of issues. These issues are grouped into six categories, which are described as follows:




Long-term plans: These plans are simple announcements made by developers working on a particular component of a Jakarta project. Binding votes are not made on these, but committers and developers are encouraged to express their opinions regarding these plans so that problems can be addressed as quickly as possible.

Short-term plans: Short-term plans are also not directly voted upon. These are announcements intended to keep developers and committers updated on who is working on which part of the project.

Release plans: see section 9.4.10

Release testing: New releases must be tested before release to the general public. Release tests require majority consensus for approval.

Showstoppers: Showstoppers are issues that must be resolved before the next public build release. These issues are considered quite important and are kept in a special file, named STATUS, which is packaged with the build. This is done in order to ensure that the problem is fixed before the release.

Product changes: Project code and documentation changes are also voted upon. These changes are also kept in the STATUS file.
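The voting rules above can be sketched as a small decision function. This is a hypothetical Python illustration under stated assumptions, not Jakarta tooling: votes are (value, explanation) pairs of binding votes only; a consensus issue passes when there is at least one +1 and no explained -1 (the "at least one +1" requirement is an assumption); and "majority consensus" is read as more +1 than -1 votes, which may be simpler than the rule the projects actually applied.

```python
from __future__ import annotations

from typing import Optional

CONSENSUS = "consensus"  # a single explained -1 acts as a veto
MAJORITY = "majority"    # e.g. release testing


def outcome(votes: list[tuple[int, Optional[str]]], rule: str) -> bool:
    """Decide an issue from binding (value, explanation) votes."""
    if rule == CONSENSUS:
        # Vetoes without an explanation are deemed void, so they are ignored.
        vetoes = [reason for value, reason in votes if value == -1 and reason]
        approvals = [value for value, _ in votes if value == 1]
        # Assumption: passing also needs at least one +1, not just no vetoes.
        return not vetoes and bool(approvals)
    if rule == MAJORITY:
        # Assumed reading of "majority": more +1 than -1; abstentions are neutral.
        values = [value for value, _ in votes]
        return values.count(1) > values.count(-1)
    raise ValueError(f"unknown rule: {rule}")
```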


9.4.5

Checkout


Input: Approved Proposal/Requirements

Output: Latest Code

Agents: Developers/Committers


Once a developer has decided what to work on, whether it be a bug fix or a new feature, he or she retrieves the latest version of the source code. This is performed using CVS, WinCVS, ViewCVS, CVSup, and Rsync (http://jakarta.apache.org/site/cvsindex.html). Using these tools, developers and committers may access the data repository in two ways: anonymously or via login access.


All users are given permission to access the data repository anonymously. When logged in anonymously, users may only check out source code. In order to attain full access, the user must be a committer with a login account on the Apache development server.


9.4.6

Design and Code Major Change


Input: Latest Code

Output: Latest Code

Agents: Developers/Committers


One central part of the process is the design and implementation of the product. Implementation is often the route by which developers get started: a person can propose a new piece of code, or a patch to be included in the code base, and that person then becomes a developer. Developers often get involved in the project because they want additional features in the product, and they volunteer to make the change or contribute to the implementation.


When developing the software, developers and committers can make two kinds of code changes: major changes and minor changes. Minor changes encompass simple patches to fix bugs, or small changes to the code that affect only a small part of the functionality. On the other hand, major changes are new features or large-scale changes that can affect the semantics of an existing API function, the program size, or data formats. In general, a major change is one that can affect a major area of the program.


Design is largely done by the contributor of the code when the feature is small, or when just one person is developing a large feature to be submitted later to the project. But when a major piece of functionality is to be developed, committers and developers exchange ideas and comments on how to go about implementing it. After a decision is made, developers can start working on the items that have been assigned to a particular release version of the product.


In the Jakarta project, developers and committers make code changes using different tools (such as text editors, IDEs, etc.). Each developer or committer will either contribute a piece of code (such as a feature, for instance, Spanish support in Lucene), or will volunteer to fix a bug or help implement a feature request made by somebody else.


All code changes should be successfully compiled and tested before being submitted for review/commit.


Each project's repository contains a file called STATUS that keeps track of the agenda and plans corresponding to that repository. Committers use this file to inform others of the changes being made. When submitting patches or code changes to CVS, the person who checked in the patch should send a message to the person who contributed the patch, as well as to the mailing list, stating that the patch has been submitted, in order to avoid source code/patch conflicts.


Developers and committers should follow project conventions when working with the source code (see http://jakarta.apache.org/site/source.html and Code Conventions for the Java Programming Language for more details).


After changes have been made, source code is submitted to CVS for storage. This is called the “latest code,” and it can be used to produce nightly builds. Latest code at this point is very unstable, since nobody other than the developer who contributed the change has tested it.


9.4.7

Design and Code Minor Change/Bug Fix


Input: Latest Code

Output: Latest Code

Agents: Developers/Committers


Simple bug fixes can be committed first and reviewed afterward. With this kind of process, there is a high level of confidence in the change made by the committer, and it is an acceptable practice since minor changes should not affect major functionality of the product. Developers and committers implementing minor bug fixes follow the same rules as those implementing major code changes. Basically, after a minor change has been made, the code will eventually be reviewed and included in the main code base. The person implementing the change informs the developers’ list about the updated code.


9.4.8

Commit


Input: Latest Code

Output: Nightly Build and Source Snapshots

Agents: Committers


When committing code changes, committers should try to commit related changes as a group, or as close together as possible. It is very important that the current source code be ready for compilation at all times. Thus, committers have to be careful when committing major changes, and they must indicate any risks or expected problems when committing the code.


Committers can use the STATUS file in the repository to summarize the code changes submitted since the last release.


Every night, a new build is created that includes the latest code for the day. These nightly builds are very unstable, but they can be used for further testing. Nightly builds are meant for developers helping to develop the product.


Once committers have decided that a new build can be considered for final release, a committer will make the build and send a message to all committers indicating that no changes are to be made to the repository for a certain period of time (after which the build can be submitted as a release candidate). This is known as a “code freeze.”


Code freeze periods have to be short; otherwise, changes start being submitted again and the code freeze ends. This usually happens when changes are still being made to add new features. An alternative is to create a branch when a release is "feature complete" and then apply bug fixes in that branch until the release is ready, at which point the changes can be merged into the mainline. (This alternative was suggested for the Lucene project, but it seems that to this day they are still using code freeze as their main approach to this problem; see http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=115310)



9.4.9

Review


Input: Latest Code

Output: Nightly Build and Source Snapshots

Agents: Committers


Once a developer or committer has submitted code changes, committers are informed of this through the mailing list. All major code changes have to go through a review process. In this process, developers and committers communicate to verify that the code has been successfully updated to address the bug, to implement the feature request, or to cover the area of functionality that has changed.


Sometimes the actual source code or documentation will be attached to the body of the message being sent, along with comments about the code. Another option is to review what has been submitted to CVS (for instance, when a committer makes a minor change to the code). After the code/documentation has been reviewed, the code can be committed, to be included in a nightly build.


It is important to note that code must receive no rejections from any of the committers in order to be approved as part of the code base.


9.4.10

Build Planning


Input: Nightly Build and Source Snapshots

Output: Build Plan

Agents: Committers


Committers develop release plans, upon which they vote in order to determine what will be contained in a build release. Committers gather the information contained in the “status” and “to do” files for a particular project, and then plan what features will be included in a particular release of the product, and what builds can be considered as candidates for milestone or release builds.


Several methodologies have been proposed within the different Jakarta projects in order to determine which build planning process should be followed. An example of this can be found at http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgId=115564


The proposed release staging process is as follows:

Stage 1 (Design) - Determine and design new features for the next release.

Stage 2 (Development) - Work on new features.

Stage 3 (Alpha) - All new features exist, but there are bugs. The build may fail some unit tests. Feature freeze (difficult in an open source environment).

Stage 4 (Beta) - No show-stopping bugs, and all unit testing completed. Request outside developers to start working with the release. Fix bugs.

Stage 5 (Release Candidate) - All known bugs have been fixed and the product is presumed stable. A wider audience tries the release. If no bugs are found in a 5-day period (suggested), the release goes final (gold master). Source code is frozen unless bugs are found.

Stage 6 (Gold Master) - The release is final.
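The staging process can be read as a simple state machine. The hypothetical Python sketch below encodes only what the list states explicitly, namely the stage order and the suggested 5-day bug-free period before a release candidate goes gold; the exit criteria of the earlier stages (feature freeze, completed unit testing) are assumed to be checked elsewhere.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of the release staging process proposed on the Lucene list."""
    DESIGN = 1
    DEVELOPMENT = 2
    ALPHA = 3
    BETA = 4
    RELEASE_CANDIDATE = 5
    GOLD_MASTER = 6


# Suggested bug-free period (in days) before a release candidate goes gold.
RC_QUIET_DAYS = 5


def next_stage(stage: Stage, days_without_bugs: int = 0) -> Stage:
    """Advance one stage; a release candidate only goes gold after a quiet period."""
    if stage is Stage.GOLD_MASTER:
        return stage  # the release is final
    if stage is Stage.RELEASE_CANDIDATE and days_without_bugs < RC_QUIET_DAYS:
        return stage  # keep testing the candidate
    return Stage(stage + 1)
```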


Build planning results in a build plan. A build plan can include:

Schedule for release candidates.
Schedule for release, considering other product schedules (for instance, Tomcat and Apache).
Frequency of release candidates.
Milestone builds.
Naming conventions for releases.
Features to be included with a release.
Platforms to be supported.
Documentation (FAQs, installation guides, release notes, etc.).


9.4.11

Build Voting


Input: Build Plan

Output: Release Build and Source, Milestone Build and Source, Demo Build

Agents: Committers


The voting process is fairly similar across the different Jakarta projects. As outlined previously (see section 9.4.4.1), committers vote on which builds can be considered a final release. There are a few different kinds of builds that can be decided upon. The categories for builds (other than nightly builds) are:


Release Build.

Release builds are the top-quality builds and are considered to be the "final" builds. A build is not considered good for release unless a considerable amount of testing has been performed on it. Users, developers and committers use it for some time after it is originally released, and if no problems are found, committers vote to make it a final release.


Milestone Build.

These are somewhat stable, but not as good as a release build. They are buggy; however, they can be used by users who want to take advantage of new features. Developers and committers can use milestone builds to track the progress of the project.


Demo Builds.

These are made to demonstrate the products.


9.4.12

Announcement


Agents: Developers/Committers


In addition to the sub-processes described above, announcement is a background process that continuously occurs in parallel with every other process. This communication occurs primarily on the mailing lists. Jakarta contributors can choose among four types of lists (http://jakarta.apache.org/site/communication.html):



Announcement lists: This type of list has very low mail traffic and is intended for announcing very important information, such as final release builds, to all people involved in the Jakarta project.

User lists: This type of list is intended for Jakarta software users to discuss configuration and operating questions with one another. These lists often carry heavy mail traffic.

Developer lists: Developer lists are intended for developers and committers to discuss development issues. Some announcements that occur here, which may not be found in the other mailing lists, include project proposals (see http://www.mail-archive.com/tomcat-dev@jakarta.apache.org/msg26507.html for an example).

Commit lists: These lists receive all of the automatic CVS code commit messages of their respective project. Committers are required to subscribe to the commit list of their Jakarta subproject so that they can remain aware of the changes that are made to the repository.


When announcements are made to these lists they must follow certain conventions. For
example, when a new patch is developed an email is sent by the patch’s author to the
developer and/or
user mailing lists. The subject headline of such an email is labeled
“[PATCH].” Likewise, proposal and general announcement emails are labeled
“[PROPOSAL]” and “[ANNOUNCEMENT]” respectively.
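Because the subject-line labels are a fixed convention, a mail filter could sort list traffic by them. The following Python sketch is hypothetical; it recognizes only the three labels named above, tolerates leading "Re:" prefixes, and treats any other subject as general traffic.

```python
import re

# Subject-line labels named in the text; any other message is "general".
KNOWN_TAGS = {"PATCH", "PROPOSAL", "ANNOUNCEMENT"}

# A leading label such as "[PATCH]", optionally preceded by "Re:" prefixes.
TAG_RE = re.compile(r"^\s*(?:re:\s*)*\[([a-z]+)\]", re.IGNORECASE)


def subject_tag(subject: str) -> str:
    """Return the upper-cased label of a list message, or 'general'."""
    match = TAG_RE.match(subject)
    if match and match.group(1).upper() in KNOWN_TAGS:
        return match.group(1).upper()
    return "general"
```

For instance, `subject_tag("Re: [proposal] new build plan")` returns `"PROPOSAL"`, while a subject without a recognized label returns `"general"`.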


Announcements are not only made via email. They are also communicated via IRC chats, website forums and FAQs (for examples, see http://jakarta.apache.org/site/faqs.html).



10

Open Source Software Process Modeling with Protégé


10.1

Introduction


Software development is a challenging task, not only because of the complexity of the software artifacts being developed but also because of the complexity of the process that defines the activities needed to produce these artifacts. This process involves many different artifacts being produced, various users involved in the process, a variety of tools used to support software development and communication, and constraints imposed on these entities for various purposes. In software engineering research, there have been many attempts to capture the necessary information in effective abstractions in order to make the development of software an easier task. Software lifecycle models such as the traditional waterfall model or more recent techniques such as extreme programming are attempts to create some sort of order in the chaos of the software development process. Some technologies make these attempts in a general and highly abstracted manner, such as the waterfall model [Roy70], or in the ad hoc and flexible manner of the extreme programming model [Bec99]. Others, like process programming, attempt to formalize the process so that it can be manipulated using the advanced programming techniques familiar to software developers from the realm of source code creation and management [Ost87]. One of the most important challenges involved in these attempts is striking a balance between expressiveness and succinctness. Processes must be defined precisely enough to be useful, but need to be sufficiently abstracted so as to promote understanding. Formal approaches to software process descriptions have several advantages, such as enabling automated analyses of software process models, easy interchange of process descriptions due to the common format, and the generation of multiple visualizations from a single formal description. To leverage these advantages, process formalization using the Protégé system is described.


10.2

Formal Approaches


Formal approaches to the problem of process definition have both advantages and disadvantages. One of the most important disadvantages is that understanding a formal process description is an intellectually challenging task. Being able to understand formalisms by looking past the unintelligible collection of formal syntax requires significant familiarity with formal techniques. As a result, these techniques are not accessible to a large number of users. Additionally, defining any activity or artifact in a formal way requires a greater degree of effort, mainly dedicated to adhering to the rigor of the model guiding the formalism; formal methods are not well-suited to casual use for a non-challenging software development activity.


Nevertheless, formal approaches are not without significant advantages. A formally defined model is uniquely suited to analytical scrutiny. Automated tools can be created that examine the formal description of a software development process, searching for any number of possible inconsistencies such as problematic activity steps. Additionally, automated efficiency improvements can be applied to such models, as places in the process model where these are possible can be algorithmically located. Also, a formal description that adheres to an accepted and well-defined set of constraints is organization- and developer-independent. The interchange of formal descriptions and the use of tools created by third parties become easy tasks; the lack of ambiguity and the common format enable easy information exchange. Finally, multiple visualization capabilities can be leveraged once a formal description exists. It is easy to conceive of different ways to visualize the same model, each emphasizing different aspects. A formal description that can be used as an input to a particular visualization generator significantly reduces the effort to create these multiple visualizations. Therefore, a formal model of software development processes can be a powerful artifact that enables many different activities and analyses that are impossible with an informal description of the same model.


10.3

Protégé


Protégé is a tool developed at Stanford University that allows for the creation and manipulation of ontologies [Pro02]. Ontologies are specifications of conceptualizations encoding knowledge about the structure of a specific domain. An easily accessible example for the domain of software development is that of a class hierarchy from object-oriented programming techniques. This class definition encompasses the knowledge about the domain the software system operates within; this definition is precise and unambiguous. The Protégé tool allows for the creation of the ontology model, the general structure that all knowledge bases dealing with the ontology's domain must adhere to. Additionally, the tool allows for the instantiation of specific occurrences of the ontology to capture information about a particular situation. For example, an ontology can be created capturing the information essential to a vineyard, such as the different types of wine that are available and the quantities of each. In the ontology, the structure for this information is created in an abstract way with no instance-specific information. In the instantiation of this ontology, information is precisely defined according to the ontology's model; for example, specific wines with precise quantities would be defined. In the object-oriented programming analogy, this corresponds to defining instances of classes.
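The ontology/instance distinction maps directly onto the object-oriented analogy used above. In this hypothetical Python sketch, the class plays the role of the vineyard ontology (the structure every knowledge base must follow) and the list of objects plays the role of an instantiation; the wine names and quantities are made-up values.

```python
from dataclasses import dataclass


# "Ontology": the abstract structure, with no instance-specific information.
@dataclass
class Wine:
    name: str
    wine_type: str
    quantity: int  # number of bottles available

    def __post_init__(self) -> None:
        # A constraint every instance must satisfy, as an ontology would impose.
        if self.quantity < 0:
            raise ValueError("quantity cannot be negative")


# "Instantiation": specific wines with precise (made-up) quantities.
cellar = [
    Wine("House Red", "Merlot", 120),
    Wine("Estate White", "Chardonnay", 45),
]
```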


The capabilities of Protégé can be used for the definition of process models. An ontology can be created that defines a process meta-model, encapsulating the general structure of the actions involved in the software development process, the users that perform these actions, the artifacts that are produced by these actions, and the tools that make the operation of the process possible. The meta-model would be an abstract entity that contains the general information underlying all software development process descriptions; the ontology defines the commonalities between processes.


The Protégé system was designed to be extensible and, in line with the open source development methodology, allows its users to develop extensions to the main tool. To fully leverage the advantages of a formal process description, certain available extensions must be used.


The Ontoviz plug-in is an extension to the Protégé tool that implements automated graph layout capabilities. The extension uses the Graphviz package developed at AT&T to automatically graph the entities defined in the main Protégé ontology, both for the abstract ontology and for any instantiations. Because the graph layout is done automatically, process designers can focus on the creation of the process model rather than its visualization. A variety of options for graph customization are provided by the Ontoviz extension, further emphasizing the multiple-visualization advantage of formalization. For example, the tool allows for the expansion of certain aspects of the process model and the suppression of others in the final graph.


In addition to Ontoviz, the XML Tab extension is very useful in the processing of open source software development processes. This plug-in allows for the importing and exporting of both ontologies and instantiations in XML format. These capabilities support the formalization advantage of interchange by saving process models in a non-proprietary format. Models could be shared easily, with a single XML file representing a process. Additionally, the XML format is well supported in a variety of other tools; therefore, third-party tools could be modified to perform various analyses on the software development process models, in addition to a host of other tasks intrinsic to the XML file format.
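To illustrate why a single XML file eases interchange, the sketch below serializes a tiny process model and reads it back with Python's standard `xml.etree.ElementTree` module, as a third-party analysis tool might. The element and attribute names (`processModel`, `action`, `agent`) are hypothetical; they are not the format that the XML Tab plug-in actually produces.

```python
import xml.etree.ElementTree as ET


def to_xml(model_name: str, steps: list) -> str:
    """Serialize (step name, agent) pairs under a hypothetical schema."""
    root = ET.Element("processModel", name=model_name)
    for step_name, agent in steps:
        ET.SubElement(root, "action", name=step_name, agent=agent)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> tuple:
    """Read the same hypothetical format back into (name, steps)."""
    root = ET.fromstring(text)
    steps = [(a.get("name"), a.get("agent")) for a in root.findall("action")]
    return root.get("name"), steps
```

A round trip such as `from_xml(to_xml("Jakarta", [("Download", "Users")]))` returns the original name and steps, which is the property that makes a shared file format useful to independent tools.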


10.4

Meta-Model Definition


The meta-model is the driving entity of any software development process model; the foundations for all models are contained within the meta-model. The first step toward using Protégé for formal software development process descriptions is the definition of a meta-model that can be used to instantiate specific instances. This paper defines a meta-model that can be used to represent software development processes, both those having to do with open source and traditionally developed software. The overall design has been heavily influenced by the Process Markup Language (PML) presented in [NS01]. The application domain of the Protégé tool, the limitations that the available graphical options impose on the model, and the desire to decentralize some of the information contained within individual action constructs prompted some changes, which are presented in the following discussion of the meta-model. For example, the notion of a next field has been used to indicate the proper sequence of actions.



Figure 6 - Meta-model view from within the Protégé tool


The basic design of the software development process meta-model presented here consists of certain high-level elements that are abstractions of the basic entities of software development, as well as some constructs meant to illustrate the different types of logical control flow that the process can follow. Each of these elements is composed of a number of attributes whose values distinguish one element from another of the same type. A listing and discussion of each of these high-level elements follows.

Process Model

- name (required)
- url
- flow scenario (required)


The Process Model element is the top-level element that represents the overall process construct. The name field defines the name of the entire process, while the url field is a possible link to documentation. The flow scenario field is a link to an instantiation of a Control Flow construct and represents the main logical flow of the process. Both the name and the flow scenario fields are required; values must be entered for these fields.
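The rule that required fields must have values can be checked mechanically. In this Python sketch the dataclasses mirror the slot names of the Process Model element, while the Python types, the stripped-down `ControlFlow` class, and the `missing_required` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFlow:
    name: str
    url: Optional[str] = None


@dataclass
class ProcessModel:
    # Slot names follow the meta-model; the Python types are assumptions.
    name: Optional[str] = None
    flow_scenario: Optional[ControlFlow] = None
    url: Optional[str] = None

    def missing_required(self):
        """Return the required slots (name, flow scenario) left without a value."""
        required = ("name", "flow_scenario")
        return [slot for slot in required if getattr(self, slot) is None]


draft = ProcessModel(name="Apache HTTPD")
print(draft.missing_required())  # ['flow_scenario']
```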

Agent

- name (required)
- url
- acts on (required)


The Agent element is an abstraction of the actors that participate in the execution of a certain process step. The name field defines the name of the actor being represented, and the url field is a hyperlink to possible documentation. The acts on field is a link to an instantiation of an Action element, which defines the action that the particular agent participates in. The motivation for including the acts on field within the Agent construct was twofold. First, the Ontoviz plug-in imposes a set of display limitations; by establishing the relationship between actors and actions in this manner, the graph generated by Ontoviz was easily understandable without significant display customization. Second, and perhaps more importantly, there was a desire to distribute some of the information encapsulated in the Action construct to other element types in order to make the Action construct more compact.
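Keeping the acts on link on the Agent does not lose any information: the agents participating in an action can always be recovered by inverting the links. A minimal Python sketch, with hypothetical class names, helper name, and example instances:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    name: str


@dataclass
class Agent:
    name: str
    # The actor-to-action link lives on the Agent, so Action stays compact.
    acts_on: List[Action] = field(default_factory=list)


def agents_by_action(agents):
    """Invert the acts-on links: map each action name to its agents' names."""
    index = defaultdict(list)
    for agent in agents:
        for action in agent.acts_on:
            index[action.name].append(agent.name)
    return dict(index)


review = Action("Review patch")
vote = Action("Vote on release")
agents = [
    Agent("Committer", [review, vote]),
    Agent("PMC member", [vote]),
]
print(agents_by_action(agents))
# {'Review patch': ['Committer'], 'Vote on release': ['Committer', 'PMC member']}
```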

Resource

- name (required)
- url
- required by


The Resource construct is an element that defines a particular resource that is either produced or required by a certain action. The name field defines the name of the resource being defined, while the url field defines a possible link to further documentation. The required by field is a link to an instantiation of an Action element and establishes that the resource being represented is required by that action; production of a resource is recorded on the other side of the relationship, through the provides field of the Action element. Similar to the case of the Agent acts on field, the required by field was included in this construct for both graphical understandability and information distribution.
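Combining the required by links on Resource with the provides links on Action allows a simple consistency check: a resource that some action requires but no action provides is either an external input or a gap in the model, so it is worth flagging for review rather than treating as an error. A Python sketch with hypothetical names and example data:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    name: str
    required_by: List[str] = field(default_factory=list)  # names of Actions


@dataclass
class Action:
    name: str
    provides: List[str] = field(default_factory=list)     # names of Resources


def unproduced_inputs(resources, actions):
    """Return required resources that no action in the model provides."""
    produced = {name for action in actions for name in action.provides}
    return sorted(
        res.name for res in resources if res.required_by and res.name not in produced
    )


resources = [
    Resource("Patch", required_by=["Review patch"]),
    Resource("Release tarball", required_by=["Vote on release"]),
    Resource("Review notes"),
]
actions = [
    Action("Write patch", provides=["Patch"]),
    Action("Review patch", provides=["Review notes"]),
    Action("Vote on release"),
]
print(unproduced_inputs(resources, actions))  # ['Release tarball']
```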

Tool

- name (required)
- url
- command
- used by (required)


The Tool element describes a particular tool that is used by actors in accomplishing a task. These tools are usually executable programs, though they can also be collaboration support applications such as chat programs. The name field defines the name of the element, the url field defines a link to possible additional documentation, and the command field may define the command that starts the tool's execution. This command field will be especially useful when a process prototype generation system is applied to the formal description of the software development process being modeled. The used by field defines the actions in which the tool is used. Tools were associated with actions rather than with agents to promote readability of the graph by minimizing the number of connecting lines when the same tool is used by more than one actor in a single action: by associating the tool with the action, only one line needs to be drawn to connect the action with the tool, whereas if tools were associated with agents, there would need to be as many lines as there are tool users.
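The line-count argument can be checked with a little arithmetic. This hypothetical Python helper counts the tool edges that would be drawn for a single action under each association; the tool and agent names are invented examples.

```python
def edges_for_action(tool_users):
    """Count tool edges drawn for ONE action under each possible association.

    `tool_users` maps each tool used in the action to the set of agents
    who use it there. Returns (tool-to-action edges, tool-to-agent edges).
    """
    tool_to_action = len(tool_users)                                   # one line per tool
    tool_to_agent = sum(len(users) for users in tool_users.values())  # one line per user
    return tool_to_action, tool_to_agent


review_tools = {
    "CVS": {"Committer", "Release manager"},
    "Mailing list": {"Committer", "Release manager", "Contributor"},
}
print(edges_for_action(review_tools))  # (2, 5)
```

The saving grows with the number of actors sharing each tool, which is exactly the case the design decision targets.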

Script

- name (required)
- url
- code (required)


The Script construct is an abstraction of an automated script that performs an action without the manual contribution of an agent. The name field defines the name of the script, the url field defines a possible link to further documentation, and the code field defines the actual code that the script executes; this code may be any of several kinds of executable, ranging from a simple script to a full-fledged program.
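As an illustration of how a prototype system might execute a Script instance, the sketch below assumes that the code field holds Python source and runs it in a separate interpreter with the standard `subprocess` module. The `Script` class and its `run` method are hypothetical; real scripts could be written in any language, as the paragraph above notes.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Script:
    name: str
    code: str                  # assumed here to be Python source text
    url: Optional[str] = None

    def run(self, timeout: float = 60.0) -> str:
        """Run `code` in a separate Python interpreter and return its stdout.

        check=True raises CalledProcessError if the code exits non-zero, so a
        failing automated step cannot pass silently.
        """
        completed = subprocess.run(
            [sys.executable, "-c", self.code],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
        return completed.stdout


stamp = Script("Build stamp", "import datetime; print(datetime.date.today())")
print(stamp.run())
```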

Action

- name (required)
- url
- type (required)
- script
- provides
- next action


The Action element is the abstraction of a primitive process step that represents a particular action. The action that this construct represents is the smallest-granularity action that the process designer desires for the model being built; in choosing this granularity, the designer should carefully balance the prescriptive and advisory features of the process definition. The name field is the name of the action being represented, the url field is possible hyperlinked documentation, the script field may define which Script construct provides the executable for this action, and the provides field defines which resources are produced by the action. The type field defines whether the action is a "manual" one or an "executable" one; if it is an executable action, the script field should be defined. Finally, the next action field is a logical link to the next action in the overall sequence to which the action being defined belongs; this field was added partly to ensure that the order of execution remains unambiguous during XML interchange of models, and partly to take advantage of the graphical capabilities of Ontoviz.
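The rule that an executable action should name a script lends itself to an automated check. The following Python sketch, with illustrative class and function names and invented example actions, reports actions whose type is neither "manual" nor "executable" and executable actions that lack a script. Because the text says the script field "should" be defined, the messages are best read as warnings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALID_TYPES = ("manual", "executable")


@dataclass
class Action:
    name: str
    type: str                                          # "manual" or "executable"
    script: Optional[str] = None                       # name of a Script instance
    provides: List[str] = field(default_factory=list)  # names of Resources
    url: Optional[str] = None


def problems(action):
    """Return rule violations for the type and script fields of an action."""
    issues = []
    if action.type not in VALID_TYPES:
        issues.append(f"{action.name}: unknown type {action.type!r}")
    elif action.type == "executable" and action.script is None:
        issues.append(f"{action.name}: executable action has no script")
    return issues


checks = [
    Action("Review patch", "manual"),
    Action("Nightly build", "executable"),
    Action("Run tests", "executable", script="run-tests"),
    Action("Tag release", "automatic"),
]
for action in checks:
    for issue in problems(action):
        print(issue)
```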

Control Flow

- name (required)
- url
- next control flow


Constructs that belong to the Control Flow element type specify the logical order in which actions should be performed. The name field defines the name of the construct, and the url field is a link to possible hyperlinked documentation. The next control flow field is a link to an instantiation of a Control Flow element and defines the logical flow construct that follows the one being defined; it is similar to the next action field defined in the Action element type. The existence of this next control flow field allows Control Flow constructs to be nested, so that logically complex processes can be constructed. This element has four sub-categories that provide more details about the logical flow.
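The relationship between the Control Flow element and its four sub-categories maps naturally onto a small class hierarchy. This Python sketch (class and function names are illustrative, and the sub-categories' own slots are omitted) shows how next control flow links chain constructs together; it assumes the chain contains no cycles.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ControlFlow:
    name: str
    url: Optional[str] = None
    next_control_flow: Optional["ControlFlow"] = None


# The four sub-categories; their own slots (actions, condition) are omitted.
class Sequence(ControlFlow):
    pass


class Selection(ControlFlow):
    pass


class Branch(ControlFlow):
    pass


class Iteration(ControlFlow):
    pass


def flow_chain(first: ControlFlow) -> Iterator[str]:
    """Yield 'Kind: name' for each construct reached via next control flow.

    Assumes the chain is acyclic.
    """
    current: Optional[ControlFlow] = first
    while current is not None:
        yield f"{type(current).__name__}: {current.name}"
        current = current.next_control_flow


vote = Sequence("Release vote")
fixes = Iteration("Fix release blockers", next_control_flow=vote)
development = Branch("Parallel development", next_control_flow=fixes)
print(list(flow_chain(development)))
# ['Branch: Parallel development', 'Iteration: Fix release blockers', 'Sequence: Release vote']
```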

Sequence

- actions


The Sequence sub-construct defines a set of actions that are to be performed sequentially, and is the most common type of control flow encountered. The actions field defines the first of these actions; the subsequent actions are reached through each action's next action field.
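Since the actions field of a Sequence names only the first action, the rest of the sequence has to be recovered through next action links. The sketch below uses hypothetical names (`perform` stands in for whatever actually carries out an action) and walks that chain, stopping if a link ever repeats, because repetition belongs in an Iteration construct.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str
    next_action: Optional["Action"] = None


def run_sequence(first: Action, perform: Callable[[Action], None]) -> List[str]:
    """Perform actions from `first` along next-action links; return their names.

    The visited set stops the walk if a link ever points back to an earlier
    action, since repetition should be modelled with an Iteration instead.
    """
    done: List[str] = []
    seen = set()
    current = first
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        perform(current)
        done.append(current.name)
        current = current.next_action
    return done


announce = Action("Announce snapshot")
run_tests = Action("Run regression tests", next_action=announce)
build = Action("Nightly build", next_action=run_tests)
performed = run_sequence(build, perform=lambda action: None)
print(performed)  # ['Nightly build', 'Run regression tests', 'Announce snapshot']
```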

Selection

- actions


The Selection sub-construct defines a set of actions, only one of which is to be performed. The actions field defines what these actions are.
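The meta-model does not say how the single alternative in a Selection is chosen. This sketch assumes the choice is supplied from outside, here as a `choose` callable (a hypothetical name), and only enforces the "exactly one" rule; the decision rule in the example is invented.

```python
from typing import Callable, List


def run_selection(actions: List[str], choose: Callable[[List[str]], str]) -> str:
    """Return the single alternative of a Selection to perform, as picked by `choose`."""
    if not actions:
        raise ValueError("a Selection needs at least one alternative")
    picked = choose(actions)
    if picked not in actions:
        raise ValueError(f"{picked!r} is not one of the alternatives")
    return picked


alternatives = ["Commit change directly", "Post patch to the mailing list for review"]
# Hypothetical rule: changes that touch many files go to the mailing list first.
files_touched = 12
picked = run_selection(
    alternatives,
    choose=lambda alts: alts[1] if files_touched > 10 else alts[0],
)
print(picked)  # Post patch to the mailing list for review
```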

Branch

- actions


The Branch sub-construct defines a set of actions or control flows that can be performed concurrently. The flow of the process moves on only when all of these actions are completed. The actions field defines the set of actions to be performed.
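A minimal sketch of Branch semantics, assuming each action can be modelled as a Python callable (an illustration, not part of the meta-model): all actions are started concurrently on a thread pool from the standard `concurrent.futures` module, and the hypothetical `run_branch` function returns only after every one has finished, matching the rule that the process moves on only when all branch actions are complete.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_branch(tasks: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Run all branch actions concurrently and return once every one is done.

    Leaving the `with` block waits for all submitted work (the join point),
    and `future.result()` re-raises any exception raised by a failed action.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
    return {name: future.result() for name, future in futures.items()}


results = run_branch({
    "Build on Linux": lambda: "ok",
    "Build on Windows": lambda: "ok",
    "Update documentation": lambda: "done",
})
print(results)
```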

Iteration

- actions
- condition


The Iteration sub-construct defines an iteration over the specified actions or control flows. The actions field defines the first action of the sequence to be iterated, while the condition field defines when iteration ends. This condition field is currently defined as a string that can take the form of a natural language statement; this was considered the most versatile way to implement a conditional check that could accommodate everything from a formal condition to a completely informal and developer-dependent one.
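Because the condition is free text, something outside the model has to decide what it means. The sketch below assumes a hypothetical `interpreters` table that maps known condition strings to predicates over some process state; an unrecognised condition raises an error rather than guessing, and `max_rounds` is a safety bound added for the sketch that the meta-model itself does not define. The condition is checked before each round, so zero rounds are possible.

```python
from typing import Callable, Dict

State = Dict[str, int]


def run_iteration(body: Callable[[State], None], condition: str,
                  interpreters: Dict[str, Callable[[State], bool]],
                  state: State, max_rounds: int = 100) -> int:
    """Repeat `body` until the interpreted `condition` holds; return the round count."""
    stop = interpreters.get(condition)
    if stop is None:
        raise KeyError(f"no interpreter for condition {condition!r}")
    rounds = 0
    while rounds < max_rounds and not stop(state):
        body(state)
        rounds += 1
    return rounds


def fix_one_blocker(state: State) -> None:
    state["blockers"] -= 1


interpreters = {"until no release blockers remain": lambda s: s["blockers"] == 0}
state = {"blockers": 3}
rounds = run_iteration(fix_one_blocker, "until no release blockers remain",
                       interpreters, state)
print(rounds, state)  # 3 {'blockers': 0}
```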


10.5 Using Protégé for Process Descriptions

Using the Protégé system is not a difficult task once the meta-model has been established; instantiations of each entity that makes up the process are easy to create. The most difficult part of using the tool is identifying and properly using the different types of logical control flow. Simple sequences become less common as the software development process scales up; once the process begins to grow, more and more constructs such as branches and iterations begin to appear. It seems, then, that the most important element of an effective and understandable formalization of a software development process is the proper decomposition of the overall process into smaller sub-processes, which the Control Flow constructs are meant to define. It is on this task that the majority of time should be spent when formalizing a process, so that the maximum gains from the formal description can be had.

11 Discussion


In the following subsections we compare our findings on the Jakarta and HTTPD software process lifecycle models. We then discuss the differences observed between traditional software lifecycle models and open source software development at the Apache Software Foundation.

11.1 Jakarta versus HTTPD


As shown above, Jakarta and HTTPD follow relatively similar software lifecycle process models. Because Jakarta is a more recent Apache project, its process is more clearly reflected on its website. This is because Jakarta contributors had already learned from the mistakes that occurred in the Apache HTTPD project, resulting in more organized software development process guidelines.


11.2 Traditional Software Lifecycle Models versus Apache

Clearly, open source software development processes are quite different from traditional
software development models.


Based on our findings on the Apache projects analyzed, the most significant differences appear to be in the following areas:


Management. While traditional software enterprises maintain tight management control over software development projects, open source development projects are controlled by individuals who volunteer their time and skills to the community.


Licensing and Usage. Open source development licenses (e.g., http://www.apache.org/licenses/LICENSE) are created to maintain the open source status of a project as it is distributed, reused, and modified by different users. This means that the project's source code remains openly available to the public. On the other hand, traditional software source code is kept from public distribution by proprietary licenses, backed by copyright, to ensure that it remains proprietary to its manufacturer.


Requirements Elicitation. Requirements in OSSD are obtained from user and developer requests, whereas in traditional models software requirements are drawn from a particular department, such as marketing.


Design and Development. Design does happen in OSSD, mainly for major changes to the code base. Development also differs, since developers volunteer to work on the parts of the project they are interested in.


Testing. Since all members of the community have access to the product at any time, testing becomes everybody’s task. The more people download and test the software, the more problems will be discovered and eventually fixed. This differs from the traditional approach, where only a handful of people test the software under controlled circumstances.


Frequency of builds and product releases. Traditional software processes release only when a build is stable enough (called either “alpha” or “beta”, depending on the quality of the build). In comparison, OSSD builds happen nightly, and OSSD projects follow the rule of releasing early and often.


Team communication. Communication in OSSD happens asynchronously, as team members are located in different places and work under different schedules. Most of the time, team members do not know each other, and the only interaction between them happens through email exchange.


12 Conclusions


As demonstrated in this paper, open source software development processes blatantly defy the rules and methodologies that have been so carefully laid out by proponents of traditional software development processes. Despite this, some of the most successful software in the world, such as that developed at the Apache Software Foundation, has been developed following open source ideologies. This would seem to indicate that software development efforts can be successful even without following the traditional models. However, there is method to the open source community’s madness.

The processes that govern their software development efforts exist, even though they may not match those generally accepted by monolithic software development organizations. Yet without a dedicated effort to precisely identify and model these processes, the questions of why these processes are successful and how they may be improved cannot be answered. This paper has been an attempt to study in detail the software development efforts of the Apache Software Foundation and the very closely related Jakarta project. Examining the process that these two entities follow in developing their software yields valuable results, both in providing insight into a successful open-source development methodology and in uncovering potential areas where the existing process can be streamlined. Perhaps the clearest contribution is the identification of the model itself; processes in the open-source community are well hidden within vast mailing list archives and personal communications. Formalization of these processes adds further value to the models developed, as easy interchange of process models and automated analyses become available to the open-source community.


13 Acknowledgements




Justin Erenkrantz <jerenkrantz@apache.org> provided information that is not documented on the HTTPD project website. More information about Justin can be found at his personal website at http://www.erenkrantz.com/.



Jason Robbins <jrobboins@collab.net> provided information regarding the Tigris project. More information regarding the Tigris project can be found at http://www.tigris.org/.



Walt Scacchi <wscacchi@ics.uci.edu> provided much guidance and inspiration for studying the OSSD community and the processes they follow.



14 References

1. E. S. Raymond, The Cathedral and the Bazaar, First Monday, 3(3), 1998.

2. SourceForge, http://www.sourceforge.net

3. Netcraft Web Server Survey, http://www.netcraft.com/survey/

4. B. Behlendorf, The Apache Story, Linux Magazine, June 1999.

5. R. Fielding and G. Kaiser, The Apache HTTP Server Project, IEEE Internet Computing, 1(4):88-90, July/Aug. 1997.

6. D. Cubranic and K. S. Booth, Coordination in open-source software development, Proc. 8th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1999.

7. D. Wheeler, Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, June 2002.

8. M. Kasichainula, Presentation: IBM and Apache plan their first date, ApacheCon 2000, March 2000.

9. New version of Apache released, again, http://www.news.com/, April 8, 2002.

10. Apache 2.0 to debut Monday, partway, http://www.news.com/, November 9, 2001.

11. Delayed Apache software nears release, http://www.news.com/, April 5, 2001.

12. Apache Web software on verge of major revision, http://www.news.com/, August 8, 2000.

13. W. Scacchi, Understanding the Requirements for Developing Open Source Software Systems, to appear in IEEE Proceedings--Software, 2002.

14. C. R. Reis and R. P. M. Fortes, An Overview of the Software Engineering Process and Tools in the Mozilla Project, Proc. Workshop on Open Source Software Development, Newcastle, UK, February 2002.

15. A. Mockus and J. Herbsleb, Why not improve coordination in distributed software development by stealing good ideas from Open Source?, Proc. 2nd Workshop on Open Source Software Engineering, Orlando, FL, May 2002.

16. T. Halloran and W. Scherlis, High Quality and Open Source Software Practices, Proc. 2nd Workshop on Open Source Software Engineering, Orlando, FL, May 2002.

17. A. Brown and G. Booch, Reusing Open Source Software and Practices: The Impact of Open Source on Commercial Vendors, Proc. 7th International Conference on Software Reuse, 123-136, Austin, TX, USA, April 15-19, 2002. Appears in C. Gacek (Ed.), Software Reuse: Methods, Techniques, and Tools, LNCS 2319, Springer-Verlag, May 2002.

18. A. Monk and S. Howard, The Rich Picture: A Tool for Reasoning about Work Context, Interactions, March-April 1998.

19. S. Bendifallah and W. Scacchi, Work Structures and Shifts: An Empirical Analysis of Software Specification Teamwork, Proc. 11th Intern. Conf. Software Engineering, IEEE Computer Society Press, Pittsburgh, PA, 260-270, May 1989.

20. P. Mi and W. Scacchi, A Knowledge-Based Environment for Modeling and Simulating Software Engineering Processes, IEEE Trans. Knowledge and Data Engineering, 2(3):283-294, September 1990. Reprinted in Nikkei Artificial Intelligence, 20(1):176-191, January 1991 (in Japanese). Reprinted in Process-Centered Software Engineering Environments, P. K. Garg and M. Jazayeri (eds.), IEEE Computer Society, 119-130, 1996.

21. P. Mi, M. J. Lee, and W. Scacchi, A Knowledge-Based Software Process Library for Process-Driven Software Development, Proc. 7th Knowledge-Based Software Engineering Conf., Washington, DC, IEEE Computer Society, 122-131, September 1992.

22. P. Mi and W. Scacchi, Articulation: An Integrated Approach to the Diagnosis, Replanning, and Rescheduling of Software Process Failures, Proc. 8th Knowledge-Based Software Engineering Conference, Chicago, IL, IEEE Computer Society, 77-85, 1993.

23. W. Scacchi and P. Mi, Process Life Cycle Engineering, Intern. J. Intelligent Systems in Accounting, Finance, and Management, 6(1):83-107, 1997.

24. J. Noll and W. Scacchi, Supporting Software Development in Virtual Enterprises, Journal of Digital Information, 1(4), February 1999.

25. W. Scacchi, Understanding Software Process Redesign using Modeling, Analysis and Simulation, Software Process--Improvement and Practice, 5(2/3):183-195, 2000.

26. J. Noll and W. Scacchi, Specifying Process-Oriented Hypertext for Organizational Computing, J. Network and Computer Applications, 24(1):39-61, 2001.

27. W. Scacchi, Process Models in Software Engineering, in J. Marciniak (ed.), Encyclopedia of Software Engineering (Second Edition), 993-1005, Wiley, New York, 2002.

28. [Roy87] W. W. Royce, Managing the Development of Large Software Systems, Proc. 9th Intern. Conf. Software Engineering, IEEE Computer Society, 328-338, 1987.

29. [Bec99] K. Beck, Embracing Change with Extreme Programming, IEEE Computer, 32(10):70-77, 1999.

30. [Ost87] L. J. Osterweil, Software Processes Are Software Too, Proc. 9th International Conference on Software Engineering, 2-13, Monterey, CA, March 1987.

31. [Pro02] Protégé Project, Stanford University, 9 June 2002, http://protege.stanford.edu/

32. [Geo02] Software development process using Protégé, University of California, Irvine, 9 June 2002, http://www.ics.uci.edu/~jgeorgas/ics225/index.htm

15 Appendix

Figure 7: Formal graph of Apache

Figure 8: Formal graph of Jakarta

Figure 9: The XML representation of the process meta-model.

    <?xml version="1.0" encoding="UTF-8"?>
    <ontology>
      <Process_Model name="Process Model"
          sl1="flow scenario" vt1="Instance(Control Flow)"
          sl2="name" vt2="String"
          sl3="url" vt3="String">
        The top level definition of the overall process model.
      </Process_Model>
      <Agent
          sl1="name" vt1="String"
          sl2="acts on" vt2="Instance(Action)*"
          sl3="url" vt3="String">
        An actor that participates in a part of the process.
      </Agent>
      <Resource
          sl1="name" vt1="String"
          sl2="url" vt2="String"
          sl3="required by" vt3="Instance(Action)*">
        A resource item that is required or produced by actions.
      </Resource>
      <Tool
          sl1="command" vt1="String"
          sl2="used by" vt2="Instance(Action)*"
          sl3="name" vt3="String"
          sl4="url" vt4="String">
        A tool that is used by an agent as part of an action.
      </Tool>
      <Action
          sl1="script" vt1="Instance(Script)"
          sl2="next action" vt2="Instance(Action)"
          sl3="type" vt3="String"
          sl4="name" vt4="String"
          sl5="provides" vt5="Instance(Resource)*"
          sl6="url" vt6="String">
        A primitive step in a process.
      </Action>
      <Control_Flow name="Control Flow"
          sl1="next control flow" vt1="Instance(Control Flow)*"
          sl2="name" vt2="String"
          sl3="url" vt3="String">
        A construct specifying the order in which actions should be performed.
        <Sequence sl1="actions" vt1="Instance(Action)">
          A set of actions to be performed in order.
        </Sequence>
        <Selection sl1="actions" vt1="Instance(Action)*">
          A set of actions, only one of which should be performed.
        </Selection>
        <Branch sl1="actions" vt1="Instance(Action)*">
          A set of actions that can be performed concurrently, in any order.
        </Branch>
        <Iteration sl1="condition" vt1="String" sl2="actions" vt2="Instance(Action)">
          An iteration over the specified sequence of actions.
        </Iteration>
      </Control_Flow>
      <Script
          sl1="code" vt1="String"
          sl2="name" vt2="String"
          sl3="url" vt3="String">
        An automated script that can be executed.
      </Script>
    </ontology>
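The listing above can be read mechanically. Assuming that each slN attribute names a slot and the matching vtN attribute gives its value type (the listing suggests this pairing but does not state it), this Python sketch pairs them for every element of a trimmed copy of the listing, using the standard `xml.etree.ElementTree` and `re` modules. The `slots` function is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# A trimmed copy of two elements from the Figure 9 listing.
META_MODEL = """
<ontology>
  <Agent sl1="name" vt1="String" sl2="acts on" vt2="Instance(Action)*"
         sl3="url" vt3="String">An actor that participates in a part of the process.</Agent>
  <Control_Flow name="Control Flow" sl1="next control flow" vt1="Instance(Control Flow)*"
                sl2="name" vt2="String" sl3="url" vt3="String">
    <Iteration sl1="condition" vt1="String" sl2="actions" vt2="Instance(Action)">
      An iteration over the specified sequence of actions.</Iteration>
  </Control_Flow>
</ontology>
"""


def slots(element):
    """Pair each slN attribute (slot name) with its vtN attribute (value type)."""
    pairs = {}
    for key, slot_name in element.attrib.items():
        match = re.fullmatch(r"sl(\d+)", key)
        if match:
            pairs[slot_name] = element.get("vt" + match.group(1))
    return pairs


root = ET.fromstring(META_MODEL.strip())
for element in root.iter():
    if element is not root:
        print(element.tag, slots(element))
```

Attributes that do not follow the slN pattern, such as name on Process_Model and Control_Flow, are ignored by this reading.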