Open Source Software Development Processes in the Apache Software Foundation


ICS 225

Chad Ata

Veronica Gasca

John Georgas

Kelvin Lam

Michele Rousseau



Even though software is an intangible artifact, it is still being developed and used. To
cope with this characteristic, one can specify software either formally or in a written
description, which in turn provides a specification of the software. With that in mind,
testing the behavior of the software against a specification becomes possible. With
sufficient testing (both verification and validation), the software is then released. This
is the generic traditional process used to describe software development. If the software
under development is of a reasonable size, this generic process is quite intuitive and
manageable. Unfortunately, in reality the scale of the software being developed is enormous,
and the development team is usually very large and distributed. The Open Source Software
Development (OSSD) community is one of the frequently used examples to demonstrate
differences that are not captured in the generic development process. The success of OSSD
efforts such as Apache and Linux is motivated not just by those who want to save money by
utilizing a free resource, but more by the quality of their software. So how does an OSSD
project manage its process without utilizing the traditional software development practices
that have been mandated for success?

There have not been many in-depth research projects on the Open Source Software Development
(OSSD) community and its efforts. In the typical software engineering textbook, you cannot
even find the term ‘Open Source’ in the index. But in reality, there are more than 40,000
projects being hosted on Open Source development portals (such as [2]). The OSSD community
is influential in the software industry. For example, IBM switched from its proprietary web
server to the open source Apache HTTPD web server in June 1999 [4]. As Eric Raymond put it
in his paper [1], the OSSD ‘bazaar’ style of development is significantly different from
the ‘cathedral’ style of development in the traditional software industry.

There is a gap between the two communities. If a better understanding of the OSSD
development process can be found, the “closed source” software industry may be able to
benefit from it, and vice versa. The software industry can learn from the OSSD community
how to manage a project with developers of diverse backgrounds in a distributed setting.
High-quality, highly reliable products developed by the OSSD community, such as the
well-known Apache web server, are rarely found in the mainstream software industry.
Sometimes even big corporations, such as Microsoft, still have difficulty producing highly
reliable, quality software (compare the Internet Information Server (IIS) with Apache
HTTPD). There is also a possibility that the tools used by the OSSD community would be
beneficial in a software industry setting. The way that the OSSD community facilitates its
needs and features for the software is another important aspect to be researched.

Under these circumstances, it is desirable to have a better understanding of the OSSD
effort. In order to learn from the OSSD experience, one needs to understand how a feature
is added in the form of a specification, then developed, and eventually tested and released
to the public. The focus of this research study is the Apache web server (HTTPD). The goal
is to investigate in detail how the members of the OSSD community, in particular the Apache
group (both the ASF and the HTTPD developer community), interact with each other throughout
the development process. We investigate the roles that exist within the project, the tools
used in the project, the artifacts created throughout the development process, and the
overall release process.

In the next section, we briefly describe the Apache HTTPD project. We compare this Open
Source Software Development project with the traditional software development project. Then
we explain in detail the software production architecture of the HTTPD project. We discuss
the agents, tools, and artifacts involved throughout the development process. Each stage
within the process is also explained in detail. Next, an attempt is made to illustrate this
development process in a formal manner. Finally, we conclude the paper with some further
discussion and the findings that we made during the research.


Overview of Apache

The Apache group was formed in February of 1995 by the 8 core founders. Their initial goal
was to extend the web server created by Rob McCool into stable, bug-free, and feature-rich
software. The founders coordinated through private email, applying their own individual
“patches” to the source code. After extensive beta testing, the Apache web server was born
in December 1995. Four years later, the Apache group formed the Apache Software Foundation
(ASF) to provide logistic support (such as donations or contributions from others) and to
handle other business-oriented needs (such as any issues) of the project. Since then, many
new OSSD projects (e.g. Jakarta, XML, etc.) have started under the leadership of the ASF.


Problem Domain Characterization

Apache is a well-known web server. Currently it has more than 50% of the market share, as
the May 2002 Netcraft survey [3] has shown. This demonstrates the large-scale usage of the
software. The Apache 2.0 web server is the result of a successful development process that
one can study, from the beginning of feature specification to the ultimate end of
deployment to general public users. In August of 2000, the Apache group decided to
restructure and rewrite the entire Apache web server (i.e. the ‘requirement’). Their
original aim was to have the new server released by the end of 2000. In reality, their
effort to create Apache 2.0 didn’t reach the first beta testing stage until April of 2001
(i.e. the ‘testing’). In fact, not until November of 2001 did companies that support and
distribute the Apache web server consider Apache 2.0 ready for use (i.e. the ‘initial
release’). The ultimate stable public release did not become available until the beginning
of April 2002 (i.e. the ‘ultimate release’), which is approximately 1.5 years behind their
original goal. This is the same typical problem that a traditional software development
project faces: missing a deadline.

This research study aims at understanding the problem stated above. We try to understand
how the members of the OSSD community communicate among themselves. After understanding the
communication mechanism, we try to find out how they facilitate requirements for the
software under development. During the development, we look at the channels of
communication used for help inquiries and testing between the developers. After the testing
effort, we investigate the process by which the tested code becomes a product releasable to
the general public.


Process Modeling and Visualization


Agent Roles

All of the Apache Software Foundation projects use a philosophy of meritocracy to define
the hierarchy of their agents. Meritocracy is based on the notion that work increases
rights. All the code is reviewed by many eyes. For example, developers can only gain write
access to the CVS by proving their skills and commitment to the committers on the project.
They achieve this by contributing code of a quality that the committers view as good code.
Committers must then vote to bring a developer into committer status. Votes on patches and
the like are binding only for committers. Thus, the higher the rank, the more power and
influence one has. Below is an outline of agents and their roles as described on the ASF
website. Keep in mind that agents can contribute to any lower-ranking duty, but agents of a
lower rank cannot participate in the duties of agents that are higher ranking.
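The ranking rule above can be sketched in a few lines of Python. This is only an illustration: the `Role` enumeration, its ordering, and the `may_perform` helper are assumptions made for the example, and only four of the roles described below are modeled.

```python
from enum import IntEnum


class Role(IntEnum):
    """Agent roles ordered from lowest to highest rank (the ordering is an assumption)."""
    USER = 1
    DEVELOPER = 2
    COMMITTER = 3
    PMC_MEMBER = 4


def may_perform(agent: Role, duty_of: Role) -> bool:
    """An agent may take on duties of its own rank or of any lower rank."""
    return agent >= duty_of


# A committer can do a developer's work, but not the other way around.
print(may_perform(Role.COMMITTER, Role.DEVELOPER))  # True
print(may_perform(Role.DEVELOPER, Role.COMMITTER))  # False
```

An ordered enumeration keeps the rule to a single comparison, which mirrors the idea that rights accumulate with rank.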



Users

Apache considers its development to be user-centered. Users contribute in three basic
forums. First, they submit bug reports, through the website, using the bug database. They
are the ultimate testers of the final version of the code. Secondly, they contribute
suggestions for new features. They do this by using Bugzilla and indicating that they are
submitting an enhancement. This is one way of understanding what users want to see in the
next version, and it is significant in determining the long-term goals of the project.
Finally, users support each other through the mailing lists.



Developers

The developer's role is to contribute either code or documentation to the project in the
form of patches. Developers are not able to commit changes into the CVS. They submit ‘diff’
file results to different channels in order to solicit and encourage committers to commit
the changes into the CVS. Developers have limited voting power. They are allowed to vote on
patches, but do not have a binding vote unless they authored the patch. Developers can also
contribute by being involved in alpha or beta testing.



Committers

Committers develop and commit code or documentation. With their write access to the CVS,
they can commit their own work as well as patches from developers. Committers vote on
developers’ patches for acceptance or rejection. Committers are responsible for overseeing
the development efforts among developers. They determine which developers become
committers by recommending them. After a unanimous vote, a developer advances to committer
status. It is the responsibility of committers to ensure that code integrated into the CVS
is ‘good’ code. They are responsible for reviewing what goes into the CVS and for ensuring
the integrity of the software. Committers can become part of the Project Management
Committee through promotion and long-term commitment.



Project Management Committee (PMC)

Members of the PMC are self-selected committers. They are responsible for the long-term
direction of the project. Although the Board of Directors ultimately has the final
decision-making power on any project, it delegates this responsibility to the PMC of each
project. There is a single PMC for every project. The PMC determines what will go into the
next release of a project. Although the Release Manager has the ultimate say in what goes
into the final release, the PMC can make suggestions.


Release Manager (RM)

The release manager's main role is to schedule the release of the project. The RM is a
selected committer. The RM decides when each testing phase is done and when the general
availability (GA) release will be made public. This individual has the ultimate authority
over what makes it into the release.


Foundation Members

Foundation Members have demonstrated long-term commitment through the amount of work they
have contributed to Apache projects. Members are not project specific, but are part of the
Apache Software Foundation. Members are responsible for guiding the foundation. One of
their most critical responsibilities lies in the election of the Board of Directors.
Foundation Members are invited by other members and voted in.


Board of Directors and Officers

The Board of Directors and Officers of the Apache Software Foundation are responsible for
the business affairs of the foundation. The officers are elected by the Board of Directors
to oversee the daily operations of the foundation. Although the Board of Directors is
officially responsible for the projects, it delegates most of the decision-making process
to the Project Management Committee.


Tools and Network Infrastructure

As with all large software development projects, there is a need for tools to support the
process. This is especially true for Open Source Development, which needs to confront a
maximally distributed community. For the Apache HTTPD community the tools are essential for
its existence. The community provides guidelines (which can be considered tools as well) to
provide some standardization and understanding of expectations and process. Although there
are not many tools, these tools are powerful and effective in handling the large number of
members in the community and the distribution of those members. Each of these tools is
detailed on the Apache website.



The entire community coordinates mainly by communicating through the mailing list, as
described in the project guideline of HTTPD. Its members also rely on information posted on
the project web portal. Another form of communication can be found in the status file in
the CVS, where the vote on each issue is recorded for future reference. Lastly, face-to-face
communication, probably the most natural form, is rarely found in the OSSD community. For
Apache in particular, there is an event for all the developers to gather and to discuss the
project.



Repository and Configuration

Concurrent Versions System (CVS) is the source repository used by all the Apache Software
Foundation projects. Each project has its own branch of source code and documentation
within the CVS. Each developer can follow the instructions provided on the project web
portal to set up their own access and synchronize their local copy with the most current
version in the CVS. Developers can obtain the source code for the platform of their
interest through the CVS binary and source distribution. Developers can also obtain the
source code of previous versions through the CVS. The Release Manager also uses the CVS
extensively throughout the release process, in order to minimize the interference of the
release with the current development effort.



There is no specific development tool recommended by the community for this project.
Because the Apache HTTPD software is a cross-platform product, each individual developer
can pick the development tool that works best on their particular platform to yield the
maximum productivity. However, the project community does provide different kinds of
guidelines for the development. For example, one such guideline is strictly enforced over
all source code committed into the CVS. In order for an individual developer's effort to
pay off, the developer needs to follow the patch guideline to maximize the chance of having
his/her patch committed into the CVS.



Apache HTTPD is a large and complex software project. Trying to find a problem is not an
easy task. In order to work on a particular Problem Report (PR), or to find problems within
his/her own patch, a developer will have to debug the software. The GNU debugger (gdb) is
the recommended tool that the Apache HTTPD community uses to trace problems within the
software. There is a detailed guide to assist developers in resolving problems throughout
the development process.



Tracking and Feature Acquisition


Bugzilla is the ultimate tool used by the Apache HTTPD community. A guide is available to
help the community use this tool appropriately. This is the tool that the entire community
heavily depends on for the success of the project. Users use this tool to report the bugs
they find, and they also submit suggestions for new features and enhancements through it.
Developers look through the submitted problem reports and decide which ones they are
interested in. Committers, as well as the Project Management Committee, use this tool to
track the general interests of the community and, based on that, to decide the direction of
the project.



The release manager is responsible for the release of the Apache HTTPD software. There are
clear, detailed guidelines for the Release Manager to follow in order to have a successful
release. There is also an obsolete document that educates release managers about the steps
needed for the release process. This obsolete document has been replaced by the automated
build script stored in the CVS, which eases the effort of releasing this complicated
software.



As in any software development process there are many artifacts involved in the process
of developing the Apache HTTPD release.


Problem Reports

Inputs: Bug reports and new feature requests submitted by Users

Outputs: Problem Report numbers (PR#) in the Bugzilla database, which are input for patches

Agents: Users/Developers

Problem reports are derived from the Bugzilla database. All bug reports and new feature
requests from all users are submitted to the bug database. Users are provided with a set of
guidelines to follow before entering a new request. Basically, they are requested to first
download the latest patch to ensure that their issue has not been resolved. Next, they
should check Bugzilla to see whether their issue has already been submitted.



Patches

Inputs: Problem reports

Outputs: Patches (diff files) from Developers to Committers

Agents: Developers

Patches are output from developers and input into the communication channels (e.g. the
mailing list or Bugzilla). Patches are the main forum for developers to communicate with
each other. Since all code and documentation is submitted as patches, this is probably the
most significant artifact of all. After submission, all patches are pending until the
voting process accepts them.


Release Patches

Inputs: Committers consensus

Outputs: The release patch is made available to the general public

Agents: Users/Developers/Committers

Released patches are patches that have been minimally reviewed and then committed to the
CVS by a committer. Released patches are made available to the public on the official web
distribution. Committers can revoke a patch if, after reviewing it in detail, they find
problems showing that it should not have been committed. But keep in mind that under the
current process a patch can be committed without review from other committers. This is
consistent with Apache’s current commit-then-review policy on patches.


Proposed Features

Inputs: Release patches,

Bugzilla enhancement reports

Outputs: Project Roadmap

Agents: Project Management Committee

Proposed features are output from the Project Management Committee and an input to the
release manager. The PMC develops a list of proposed features based on its members' personal
judgment, on the feature enhancements requested in Bugzilla, and on the enhancements coded
as patches (which are submitted to the developer-accessible portion of the website). This
list is referred to as the project “ROADMAP” and is then voted upon by the PMC. The
“ROADMAP” is then reviewed by the community and turned into the status file, which contains
the more elaborated requirements and the outcome of the votes from the committers.


Proposed Requirements

Inputs: Proposed features

Outputs: Status file from PMC to Developers

Agents: PMC/Release Manager/Developers

Proposed requirements are output from the PMC voting on the proposed features. They are
input to the developers, who will start coding these new features, and to the release
manager, who will decide which of the new features will be included in the release. The
proposed requirements include not just the requirements, but also the results of the votes
that each PMC member submitted for each requirement. This file is placed on the
developer-accessible website and is referred to as the status file.
Developers access this file and determine which features they want to implement.


Patches for New Release

Inputs: Proposed Requirements

Outputs: Patches from developers to the release manager

Agents: Developers/Release Manager

Developer source code for new features is contributed as patches, which are sent to the
community and undergo the review and commit process. From there the release manager
determines which of these new features will go into the current release.


Alpha Build

Inputs: Patches for New Release and Status file

Outputs: Bug fixes from developers

Agents: Developers/Committers/Release Manager

The release manager creates an alpha build from the new feature patches submitted. This
alpha build is placed on the web for developers to test. Developers and committers test the
code, fix bugs, and submit those fixes as patches. The output of the alpha build is given
to the release manager to create a beta build.


Beta Build

Inputs: Alpha build and Bug Fixes from Alpha Testing

Outputs: Bug fixes from developers

Agents: Developers/Committers/Release Manager

The release manager decides when it is time to create a beta build. The inputs to the beta
build are the alpha build and the bug fix patches submitted by the developers and
committers. The output is posted on the developers' website for further testing.

The developers test the beta build and produce more bug fixes in the form of patches.


General Availability (GA) Build

Inputs: Beta Build and Bug Fixes from Beta Testing

Outputs: Final build made available to the public

Agents: Release Manager

The inputs to the GA build are the beta build and the bug fix patches submitted by
developers and committers. The release manager determines when beta testing is complete and
creates the GA build. The Apache guidelines are then to test the GA build on the Apache
website for 48 to 72 hours to determine whether it is stable. This, however, is just a
guideline, and the release manager can release the GA build to the public whenever he/she
deems it ready for release.



Figure: The overall Apache HTTPD release process. The green portion depicts the patch
development process, while the blue portion depicts the release process.

Apache works as a meritocracy, so developers must prove their skills before any of their
code is committed. All code is reviewed by many eyes. This follows Linus' Law [1], which
basically states that the more eyes look at the code, the more likely it is that faults
will be discovered and repaired. Committers have shown that they are competent developers
with an understanding of how to write good code. Most committers have a good understanding
of software process and best practices in developing software. Therefore it can be
concluded that the traditional training to which these software engineers have been
exposed, either formally or through experience, is not disregarded while developing open
source software, but has become a part of their work routine.


The Patch Development Process

Previously released patches can be new features as opposed to just bug fixes. They become
part of a new release, and understanding how those patches are developed is essential
before delving into the release process.

Anybody in the community can submit a new feature request via Bugzilla by marking the
submission as an “enhancement”. Developers scan Bugzilla and decide which of these items
they would like to implement. Once a developer has committed to coding the new enhancement,
he/she interacts with the submitter to sort out any details. This is similar to discussing
a requirement with a customer. The “customers” of Apache HTTPD are the users. The developer
then posts the patch in the form of a diff file to the mailing list dedicated to new
patches (i.e. new-httpd), or submits the diff file to the Bugzilla database. From there a
committer decides whether or not to commit the patch into the CVS.
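To make the notion of a diff file concrete, the following minimal sketch produces a unified diff with Python's standard `difflib` module. It is not the tooling the Apache community used; the file name `example.c`, the sample contents, and the `make_patch` helper are hypothetical.

```python
import difflib


def make_patch(original: str, modified: str, path: str) -> str:
    """Return a unified diff describing how to turn `original` into `modified`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Hypothetical before/after contents of a source file.
old = "int port = 80;\nint workers = 4;\n"
new = "int port = 8080;\nint workers = 4;\n"

patch = make_patch(old, new, "example.c")
print(patch)
```

The resulting text lists removed lines with a leading `-` and added lines with a leading `+`, which is what a reviewer reads when deciding whether to commit the change.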

In the past, committers followed a review-then-commit (RTC) policy. But due to the large
number of patches currently submitted, committers now follow a commit-then-review (CTR)
policy for patches. The CTR policy is subject to the lazy consensus rule. Committers commit
the patch and then see whether any other committer disputes it by sending in a -1 vote or a
veto vote. Only one veto is needed to cause a patch to be revoked. This is different from
the majority consensus rule, under which an issue passes when there are at least three +1
votes and more +1 votes than -1 votes. A veto cannot be cancelled; it can only be withdrawn
by its originator. The status of the veto must be changed for the patch to be released.
Developers may vote as well, but only the developer who authored the patch has a binding
vote in the voting process.
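The two voting rules can be sketched as follows. This is an illustrative model rather than the project's actual tooling: the `Vote` record, the function names, and the choice to count only binding votes under the majority rule are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical identifier for the person voting
    value: int     # +1 approve, 0 abstain, -1 object (a veto when binding)
    binding: bool  # True for committers and for the developer who authored the patch


def patch_stands(votes: Iterable[Vote]) -> bool:
    """Lazy consensus under CTR: a committed patch stands unless a binding -1 is open."""
    return not any(v.binding and v.value == -1 for v in votes)


def majority_passes(votes: Iterable[Vote]) -> bool:
    """Majority rule: at least three +1 votes and more +1 votes than -1 votes."""
    binding = [v for v in votes if v.binding]  # assumption: only binding votes count
    plus = sum(1 for v in binding if v.value == 1)
    minus = sum(1 for v in binding if v.value == -1)
    return plus >= 3 and plus > minus


votes = [
    Vote("committer_a", +1, True),
    Vote("committer_b", +1, True),
    Vote("committer_c", +1, True),
    Vote("committer_d", -1, True),
    Vote("developer_e", -1, False),  # non-binding, so it is ignored
]
print(patch_stands(votes))     # False: one binding veto is enough to revoke a patch
print(majority_passes(votes))  # True: three +1 votes against one binding -1
```

The same set of votes gives different outcomes under the two rules, which is the distinction drawn above between lazy consensus and majority consensus.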


The Release Process

Patches that have been committed and not revoked become a part of the proposed requirements
for the new release. The release manager can also look at Bugzilla for other enhancement
requests to determine the requirements for the new release. Communication with the Project
Management Committee, as well as coordination with the entire community, is also a crucial
step for the success of the release process.

Requirements are elicited in a variety of ways. First, as mentioned above, potential
features can come from enhancement reports that are submitted to Bugzilla. Second, features
can be in the form of patches already available (already committed into the CVS). Finally,
features can be requested by the Project Management Committee. The Project Management
Committee details a “ROADMAP” file, which is the overall plan for the preliminary
requirements of HTTPD. These requirements are then voted upon by the community. Each item
in the “ROADMAP” file is then detailed out, and the results of the votes are recorded; all
of this is put into the status file. The status file represents a history of the
development effort for the release. Unlike patches, new features are not subject to veto. A
majority vote means approval for new features. Developers and committers both access and
review the status file. From there they decide which features they want to implement. The
new features are then submitted as patches (diff files); the same applies to bug fixes
(with respect to PR#). From there the Release Manager decides which of these patches and
new features will be part of the new release. Whatever the Project Management Committee
recommends, the final decision on what will be released is made by the release manager.

Once the final decision as to what goes into the release is made, the code is built into an
executable ready for Alpha testing. Developers and committers have access to the Alpha
release distribution for testing purposes. This release is provided on the website and
announced through the developers’ mailing list, to which users do not have access. As
developers and committers find bugs, they fix them and submit the fixes as patches again.
When the release manager as well as the community is satisfied with the Alpha testing, the
code with all the fixes applied moves into Beta testing. Again, only developers and
committers have access to the Beta release. They test and fix bugs until the release
manager decides that the code is finally ready for a general availability (GA) release.
Prior to this, it is recommended that the binary distribution be tested on the Apache
Software Foundation website for at least 48 to 72 hours. This is only a suggested
guideline, and again the release manager has the ultimate control over when the new release
is ready and will be available. After all these testing steps, the code is finally ready
for GA, and an announcement is made to the public regarding the new release.
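The progression from Alpha through Beta to GA can be sketched as a small state machine. This is a simplified illustration under stated assumptions: the `Stage` names, the `advance` function, and the `soak_recommendation_met` helper are hypothetical, and the 48-hour default reflects only the lower bound of the recommended test period.

```python
from enum import Enum


class Stage(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    GA = "general availability"


# Each testing stage leads to exactly one next stage.
NEXT_STAGE = {Stage.ALPHA: Stage.BETA, Stage.BETA: Stage.GA}


def advance(stage: Stage, release_manager_approves: bool) -> Stage:
    """Move to the next stage only when the release manager decides testing is done."""
    if stage is Stage.GA:
        raise ValueError("GA is the final stage")
    return NEXT_STAGE[stage] if release_manager_approves else stage


def soak_recommendation_met(hours_on_asf_site: float, minimum_hours: float = 48) -> bool:
    """Advisory check only: the release manager may release regardless of the result."""
    return hours_on_asf_site >= minimum_hours


stage = Stage.ALPHA
stage = advance(stage, release_manager_approves=False)  # still in alpha testing
stage = advance(stage, release_manager_approves=True)   # moves to beta testing
print(stage)                                            # Stage.BETA
print(soak_recommendation_met(60))                      # True
```

Keeping the soak check separate from `advance` reflects that the 48 to 72 hour period is a guideline, while the stage transition itself depends only on the release manager's decision.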


Rich Picture for Apache HTTPD Project

Figure: Rich picture for the Apache HTTPD project.


Use Cases for Apache HTTPD Project

Use case per

Process (Relation)

Use cases (UC) provide process sequencing (control flow), tool invocation, resource inputs
and outputs along the way, pre-conditions, post-conditions (goals/outcomes), and
anticipated breakdown and recovery situations.


Formal Model of Release Process: Apache HTTPD Project

The graph of the model that the Protégé tool generates is a very large and complex graphic;
it is best viewed by itself and is available separately. However, a partial view within the
Protégé tool is presented here.



Partial view of the Apache HTTPD development process


Jakarta Introduction and Overview

In order to have a more general understanding of Open Source development within the Apache
group, we decided to analyze the Jakarta project in addition to the httpd project. Jakarta
is one of the largest projects in the Apache Software Foundation.

The purpose of the Jakarta project is to produce and maintain open source products created
on the Java platform. This project comprises several server-side Java subprojects, grouped
into the following three categories:

Libraries, tools, and APIs. Includes build tools, repositories, Java and JSP libraries,
APIs for file manipulation, regular expression packages, etc.

Frameworks and Engines. Frameworks for unit testing and web application
development, text search engines and template engines for source code

Server Applications. This includes Tomcat, the official Reference implementation of JSP
technologies, WebDAV aware CM systems and email/news/messaging servers, amongst others.

We analyzed the software process followed in two of these subprojects in order to gain
insight into Jakarta software product lifecycles. The two subprojects selected, Tomcat and
Lucene, belong to the Server Applications category and the Frameworks and Engines category,
respectively. We chose these projects due to their success and visibility within the
Jakarta project.

Tomcat is the servlet container used by the official reference implementation for Java
Servlets and Java Server Pages. Tomcat is commonly used in combination with the httpd
server in order to support Java Server Pages development and usage. Tomcat is available for
commercial use under the ASL license in both binary and source versions.

Lucene is a fully featured, Java-based text search engine, optimized for high performance.
Lucene became part of the Jakarta project in September of 2001. This subproject features
incremental and batch indexing and searching, and it provides indexing control, stop-word
processing, content tagging, stemming, and querying. Lucene is available for commercial use
under the ASL license.


Jakarta Problem Domain Characterization

The Jakarta project, especially the Tomcat subproject, has been very successful. Over the
years the developers and committers on this project have learned valuable lessons from
personal experience as well as from the older httpd project. This has allowed them to
restructure their open source software development processes in a more efficient way.
Lessons learned in the Jakarta project have allowed team members to further refine the
software process in order to enhance it and make it more efficient. An example of this is a
problem that presented itself with the 3.0 release plan of Tomcat. At the time, a group of
developers who happened to know each other would create the release plan offline, that is,
without making the process public until they came to a decision. This caused a great deal
of complaints from other Tomcat contributors. It led PMC member Ted Husted to prohibit
offline committer votes.

These types of problems have allowed Jakarta to flourish into a great example of open
source software development. Therefore, analyzing this project provides us with valuable
open source techniques, which may allow us to form a general open source software
development meta


Jakarta Process Modeling and Visualization

Using the information we collected from the Jakarta, Tomcat, and Lucene websites and
mailing lists, we have been able to abstract a general software life cycle process for a
typical Jakarta subproject. Several agent roles have been identified and are discussed in
the following section. The subsequent sections discuss the tools and network infrastructure
used by typical Jakarta subprojects, the Jakarta artifacts, and the process description and
hierarchy.

Agent Roles

A subset of the httpd agent categories makes up the membership of the Jakarta project.
These agents are users, developers, committers, and PMC members. For each role, see the
corresponding description in the httpd Agent Roles section.

Tools and Network Infrastructure

The tools and network infrastructure in the Jakarta project are the same as those in the
HTTPD project. Please see the HTTPD Tools and Network Infrastructure section for more
information.



Artifacts

Artifacts of the Jakarta projects are the process inputs and outputs. For artifact
descriptions please see the appropriate subsection in section


Process Description and Hierarchy

The following subsections describe the software life cycle process of a typical Jakarta
subproject, which is depicted in figure 2. The subsections represent process enactments,
which are represented as rectangular boxes in the diagram. Since open source software
life cycles are being examined here, the following subsections cover all three open source
software development (OSSD) process layers. In particular, OSSD articulation processes
are reflected in the Jakarta voting procedures, which are discussed in section
OSSD community development is also portrayed by the extensive emphasis on
communication, which is chiefly described in section
. The bottom-most OSSD
process layer, the software development process, is encompassed by the entire process



Partial view of the Jakarta process life cycle model from within Protégé. For the full view



Jakarta software development life cycle process rich picture. See

for the hyperlinked



Download

Input: Latest Build

Output: Latest Build

Agents: Users/Developers/Committers

As shown in the Jakarta process diagram (see figure 2), the Jakarta software life cycle has no
specific starting point. Despite this, we chose to begin our process
description with the download procedure, since it is at the root of all open source software
development projects.

Downloading application source or binary code is the first step towards application usage
or development. Therefore, as a prerequisite for the download procedure, source and
binary code is made available via the Jakarta website. For example, Tomcat and Lucene
binaries may be downloaded at

and source
code is available at

Downloads are performed by all users interested in the Jakarta project. More specifically,
many users are likely to download the binaries of an application and use it as is,
without needing to make any modifications to it. These users have a choice among four
types of binary builds:

Release builds

See section

Milestone builds

See section

Nightly builds

See section

Demo builds

See section

For more information regarding these builds, see

On the other hand, other users download the source code so that they can “hack” and
integrate the Jakarta application into their software product. These users have
a choice of three types of source code drops:

Release source drops

This code is “as good as it gets” according to the Jakarta
website and is intended for high quality products. These releases are reviewed to
ensure Servlet and JavaServer Pages compatibility.

Milestone source drops

Milestone code is not intended for commercial
products because, although much of the functionality is acceptable, many bugs
still exist. The advantage of milestone source drops is that they allow their users
to explore and use future features of the product.

Automated nightly source drops

These code “snapshots” are automatically taken
by CVS every day. As a result, the code is very unstable. Despite this, Jakarta
project developers may need this latest code.

More information regarding Jakarta source downloads can be found at


Application Usage

Input: Latest Build

Output: Bugs/Feature Requests

Agents: Users/Developers/Committers

Once application source or binary code has been acquired following the download
process, users then begin to use the application. As described above, these users may
either use the application as is, or they may make modifications to it and possibly
integrate it into their own software product.

Over time, users will begin to notice and potentially be irritated by bugs and other
problems in the software. As problems are encountered these users communicate
primarily through the use of the user mailing list appropriate to the Jakarta product they
are using (see
). They use this mailing list in
order to resolve their problems by asking other, more experienced, users for assistance.
Other resources, such as the Tomcat FAQs at
, as well as online books,
articles, and even debuggers (see

Problems that are not resolved through the use of these resources are eventually
perceived by the users as bugs or software deficiencies. This leads to the following
section, which discusses the feature requests and bug reports that users can make when
dissatisfied with particular aspects of a Jakarta application.


Feature Requesting and Bug Reporting

Input: Bugs/Feature Requests

Output: Requirements

Agents: Users/Developers/Committers

As active users of the Jakarta project come across bugs in their Jakarta application, they
often report them using mailing lists, IRC chat, and primarily Bugzilla.
In Bugzilla, users must first choose
which application they are using before submitting their report.

Users of some of the Jakarta applications, such as Tomcat, are encouraged to follow bug
reporting guidelines. In particular, for Tomcat, users should include the following
information (

Tomcat version

Tomcat component: the component which has the bug

Hardware platform and operating system

JVM and Web server version

Configuration files

Log files and stack traces

Examples which demonstrate the problem

Bug fix patch if available
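
The checklist above can be pictured as a simple record. The following is only a minimal sketch, not part of Bugzilla or of any Tomcat tooling: the class name `TomcatBugReport`, its field names, and the helper `missing_fields` are all hypothetical, chosen to mirror the listed items.

```python
from dataclasses import dataclass, fields


@dataclass
class TomcatBugReport:
    """Hypothetical record mirroring the requested bug report items."""
    tomcat_version: str = ""
    component: str = ""
    platform_and_os: str = ""
    jvm_and_web_server: str = ""
    configuration_files: str = ""
    logs_and_stack_traces: str = ""
    example: str = ""
    patch: str = ""  # optional: the guidelines say "if available"


def missing_fields(report):
    """List requested items left empty, ignoring the optional patch."""
    return [
        f.name
        for f in fields(report)
        if f.name != "patch" and not getattr(report, f.name).strip()
    ]
```

A reporter (or a triager) could call `missing_fields` on a draft report to see which of the requested items still need to be supplied before submitting it.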

Users may also use Bugzilla to request new features for a Jakarta application. This is
done by submitting a report like any other bug and setting its severity to
“enhancement” (


New Feature Proposal

Input: Requirements

Output: Proposal

Agents: Developers/Committers

From time to time, developers and committers, perhaps under the influence of feature
requests, draft a new feature proposal (see

for an example of such a proposal in the Tomcat
project). If their idea actually came from a feature request, then they assign themselves to
that bug in Bugzilla, so that others will know who is working on that particular
request. Next, they submit the new feature proposal to the developer mailing list for
approval. Approval or denial of the proposal depends on developer and committer
votes. Jakarta voting procedures are explained in the following section.


Voting

Input: Proposal

Output: Approved/Disapproved Proposal

Agents: Developers and Committers

Unlike other projects, Jakarta is not controlled by a single dictator. Instead, Jakarta
projects are based on a “minimum threshold meritocracy” with project decisions being
made by a particular group. This group
includes contributors that have committer status in the project. The only exception to this
rule occurs when the voting issue regards changing source code that was created by a
developer. In such a case, the primary author of that code is allowed to make a binding
vote. In addition, all other contributors are encouraged to express their opinions about a
voting issue via the developer mailing list, despite the fact that their votes do not count.

Voting in a Jakarta project is performed primarily via the developer mailing list (see

for an
example vote). These votes are based on a three-point system as described below:

+1

“Yes” or “agree” or “the action should be performed.” For some issues this
vote can only be binding if the voter has already tested the action on their own
system(s).

0

“Abstain” or “no opinion.” Although these are neutral votes, too many could
have a detrimental effect.

-1

“No.” On issues that require voter consensus, this vote acts as a veto. All
vetoes are expected to be accompanied by an explanation, otherwise they
are deemed void.

Another type of vote, which is not included above, is the non-binding and informal
vote.

The different types of votes described above are cast on issues, which
are categorized into six different categories, described as follows:

Long-term plans

These plans are simple announcements made by developers
working on a particular component of a Jakarta project. Binding votes are not
made on these, but committers and developers are encouraged to express their
opinions regarding these plans so that problems can be addressed as quickly as
possible.

Short-term plans

Short-term plans are also not directly voted upon. These are
announcements that are intended to keep developers and committers updated on
who is working on which part of the project.

Release plans

See section

Release testing

New releases must be tested before release to the general
public. Release tests require majority consensus for approval.

Showstoppers

Showstoppers are issues that must be resolved before the next
public build release. These issues are considered quite important and are kept in a
unique file, named STATUS, which is packaged with the build. This is done in
order to ensure that the problem is fixed before the release.

Product changes

Project code and documentation changes are also voted upon.
These changes are also kept in the STATUS file.
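
To make the voting rules in this section concrete, the following is a minimal sketch of how such a vote might be tallied. It is not Jakarta's actual tooling: the names `Vote` and `tally` are hypothetical, the three vote values are encoded as +1, 0, and -1, and the caller is assumed to know which votes are binding and whether the issue requires consensus or only a majority.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str
    value: int             # +1 (yes), 0 (abstain), or -1 (no)
    binding: bool          # assumption: the caller marks committer votes as binding
    explanation: str = ""


def tally(votes, requires_consensus):
    """Return True if the issue passes under the rules sketched here."""
    binding = [v for v in votes if v.binding]
    yes = sum(1 for v in binding if v.value == 1)
    no_votes = [v for v in binding if v.value == -1]
    if requires_consensus:
        # A "no" acts as a veto, but a veto without an explanation is void.
        vetoes = [v for v in no_votes if v.explanation.strip()]
        return yes > 0 and not vetoes
    # Majority approval: binding yes votes must outnumber binding no votes.
    return yes > len(no_votes)
```

Non-binding votes are simply ignored by `tally`, which matches the idea that other contributors may voice an opinion on the mailing list without affecting the outcome.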



Input: Approved Proposal/Requirements

Output: Latest Code

Agents: Developers/Committers

Once a developer has decided what to work on, whether it be a bug fix or a new feature,
he or she then retrieves the latest version of the source code. This is performed using
CVS, WinCVS, ViewCVS, CVSup, and Rsync. Using these tools, developers and
committers may access the data repository in two ways: anonymously or via login access.

All users are given permission to access the data repository anonymously. When logged
in anonymously, users may only check out source code. In order to attain full access, the
user must be a committer with a login account on the Apache development


Design and Code Major Change

Input: Latest Code

Output: Latest Code

Agents: Developers/Committers

One central part of the process is the design and implementation of the product.
Implementation is often the route by which developers get started. A person
can propose a new piece of code, or a patch to be included in the code base. This person
then becomes a developer. Developers often get involved in the project because they
want additional features in the product, and they volunteer to make the
change or contribute to the implementation.

When developing the software, there are two kinds of code changes that developers and
committers can make: major changes and minor changes. Minor changes encompass
simple patches to fix bugs, or minor changes to the code that affect only a small part of the
functionality. On the other hand, major changes are new features or large-scale changes that
can affect the semantics of an existing API function, program size, or data formats. In
general, a major change is one that can affect a major area of the program.

Design is largely done by the contributor of the code when the feature is small or when
just one person is developing a large feature to be submitted later to the project. But
when a major piece of functionality is to be developed, committers and developers will
exchange ideas and comments on how to go about implementing it. After a
decision is made, developers can start working on the items that have been assigned to a
particular release version of the product.

In the Jakarta project, developers and committers make code changes using different
tools (such as text editors, IDEs, etc.). Each developer/committer will either contribute
a piece of code (such as a feature, for instance, Spanish support in Lucene), or will
volunteer to fix a bug or help implement a feature request made by somebody else.

All code changes should be successfully compiled and tested before being submitted for

Each of the project repositories contains a file called STATUS that keeps track of the
agenda and plans corresponding to that repository. Committers use this file to inform
others of the changes being made. When submitting patches/code changes to CVS, the
person who checked in the patch should send a message to the person who contributed
the patch, as well as to the mailing list, to state that the patch has been submitted
and to avoid source code/patch conflicts.

Developers and committers should follow project conventions when working with the
source code (see Code Conventions for the Java Programming Language for more details).

After changes have been made, source code is submitted to CVS for storage. This is
called the “latest code” and it can be used to produce nightly builds. Latest code at this
point is very unstable, since nobody other than the developer who contributed the
change has tested it.


Design and Code Minor Change/Bug Fix

Input: Latest Code

Output: Latest Code

Agents: Developers/Committers

Simple fixes to bugs can be committed and then reviewed. With this kind of process,
there is a high level of confidence in the change made by the committer. This is an
acceptable practice since minor changes shouldn’t affect major functionality of the
product. Developers and committers implementing minor bug fixes follow the same rules
as those implementing major code changes. Basically, after a minor change has been
made, the code will eventually be reviewed and included in the main code base. The
person implementing the change informs the developers’ list about the updated code.



Input: Latest Code

Output: Nightly Build and Source Snapshots

Agents: Committers

When committing code changes, committers should try to commit related changes as a
group, or as close together as possible. It is very important that the current source code
be ready for compilation at all times. Thus, committers have to be careful when
committing major changes, and they must indicate any risks or expected problems when
committing the code.

Committers can use the STATUS file in the repository to summarize the code changes
submitted since the last release.

Every night, a new build is created that includes the latest code for the day. These
nightly builds are very unstable, but they can be used for further testing. Nightly builds
are meant for developers helping to develop the product.

Once committers have decided that a new build can be considered for final release, a
committer will make the build and send a message to all committers to indicate that no
changes are to be made to the repository for a certain period of time (after this, a build can
be submitted as a release candidate). This is known as a “code freeze”.

Code freeze periods have to be short; otherwise, changes start being submitted again and
the code freeze stops. This usually happens when changes are still being made to add new
features. An alternative is to create a branch when a release is
"complete" and then apply bug fixes in that branch until the release is ready. At this point,
changes can be merged into the mainline. (This alternative was suggested for the Lucene
project, but it seems that to this day they are still using code freeze as the main
approach to this problem, see



Input: Latest Code

Output: Nightly Build and Source Snapshots

Agents: Committers

Once a developer or committer has submitted code changes, committers are informed of
this through the mailing list. All major code changes have to go through a review
process. In this process, developers and committers communicate to verify that the code
has been successfully updated to address the bug or to implement the feature request or
area of functionality that has changed.

Sometimes the actual source code or documentation will be attached to the body of the
message being sent, along with comments about the code. Another option is to review
what has been submitted to CVS (for instance, when a committer makes a minor change
to the code). After the code/documentation has been reviewed, the code can be
committed, to be included in a nightly build.

It is important to note that code must receive no rejections from any of the
committers in order to be approved as part of the code base.


Build Planning

Input: Nightly Build and Source Snapshots

Output: Build Plan

Agents: Committers

Committers develop release plans upon which they vote in order to determine what will
be contained in a build release. Committers gather the information contained in the
“status” and “to do” files for a particular project, and then plan what features will be
included in a particular release of the product, and what builds can be considered as
candidates for milestone or release builds.

Several methodologies have been proposed within the different Jakarta projects in order
to determine what build planning process should be followed. An example of
this can be found at

The proposed release staging process is as follows:

Stage 1 (Design)

Determine and design new features for the next release.

Stage 2 (Development)

Work on new features.

Stage 3 (alpha)

All new features exist, but there are bugs. May fail some unit tests.
Feature freeze (difficult in an open source environment).

Stage 4 (beta)

No show-stopping bugs and all unit testing completed. Request outside
developers to start working with the release. Fix bugs.

Stage 5 (release candidate)

All known bugs have been fixed and the product is
presumed stable. A wider audience tries the release. If no bugs are found in a 5
period (suggested), the release goes final gold master. Source code freeze unless bugs
Stage 6 (Gold Master)


The release is final.
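
The proposed staging process reads naturally as a small state machine. The sketch below is one possible encoding, under stated assumptions: the names `Stage` and `next_stage` are hypothetical, and the only entry condition modeled is the one the text gives for beta and later stages (no show-stopping bugs, unit testing completed).

```python
from enum import IntEnum


class Stage(IntEnum):
    DESIGN = 1
    DEVELOPMENT = 2
    ALPHA = 3
    BETA = 4
    RELEASE_CANDIDATE = 5
    GOLD_MASTER = 6


def next_stage(stage, open_showstoppers=0, unit_tests_pass=True):
    """Advance one stage when the sketched entry conditions hold."""
    if stage is Stage.GOLD_MASTER:
        return stage  # the release is final
    candidate = Stage(stage + 1)
    # From beta onward: no show-stopping bugs and unit testing completed.
    # Alpha is exempt, since it "may fail some unit tests".
    if candidate >= Stage.BETA and (open_showstoppers > 0 or not unit_tests_pass):
        return stage
    return candidate
```

A release that still has an open showstopper therefore stays in alpha no matter how often `next_stage` is called, which is the behavior the staging description implies.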

Build planning results in a build plan. A build plan can include:

Schedule for release candidates.

Schedule for release, considering other product schedules. (For instance, Tomcat
and Apache)

Frequency of release candidates.

Milestone Builds.

Naming conventions for releases.

Features to be included with a release.

Platforms to be supported.

Documentation (FAQs, Installation guides, Release notes, etc).


Build Voting

Input: Build Plan

Output: Release Build and Source

Milestone Build and Source

Demo Build

Agents: Committers

The voting process is fairly similar across the different Jakarta projects. As outlined
previously (see section
), committers will vote on what builds can be considered a
final release. There are a few different kinds of builds that can be decided upon. The
categories for builds (other than nightly) are:

Release Build.

Release Builds are the top quality builds. These are considered to be the
"final" builds. A build is not considered good for release, unless
a considerable amount of
testing has been performed on it. Users, developers and committers will use it for some
time after it was originally released, and if no problems are found, then committers will
vote to make it a final release.

Milestone Build.

These are somewhat stable, but not as good as a release build. They are
buggy; however, they can be used as a patch by users who want to take advantage of new
features. As for developers and committers, these milestone builds can be used to track the
progress of the project.

Demo Builds.

These are done to show demonstrations of the products.



Agents: Developers/Committers

In addition to the subprocesses described above, announcement is a background process
that continuously occurs in parallel with any other process. This communication occurs
primarily on the mailing lists. Jakarta contributors have a choice among four types of
lists (

Announcement lists

This type of list is quite low in mail traffic, and is intended
for announcing very important information, such as final release builds, to all
people involved in the Jakarta project.

User lists

This type of list is intended for Jakarta software users to discuss
configuration and operating questions with one another. These mailing lists often
carry high mail traffic.

Developer lists

Developer lists are intended for developers and committers to
discuss development issues. Some announcements that occur here, which may not
be found in the other mailing lists, include project proposals (see

for an example).

Commit Lists

These lists receive all of the automatic CVS code commit
messages of their respective project. Committers are required to be subscribed to
the commit list of their Jakarta subproject so that they can remain aware of the
changes that are made to the repository.

When announcements are made to these lists, they must follow certain conventions. For
example, when a new patch is developed, an email is sent by the patch’s author to the
developer and/or user mailing lists. The subject line of such an email is labeled
“[PATCH].” Likewise, proposal and general announcement emails are labeled
“[PROPOSAL]” and “[ANNOUNCEMENT]” respectively.
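
A mail filter or archive tool could rely on these subject labels. The following is a minimal sketch under stated assumptions: the helper name `classify_subject` is hypothetical, only the three labels named in the text are recognized, and the label is assumed to appear at the very start of the subject line.

```python
import re

# Only the labels mentioned in the text; real lists may use others.
KNOWN_TAGS = {"PATCH", "PROPOSAL", "ANNOUNCEMENT"}
_TAG_RE = re.compile(r"^\s*\[([A-Za-z]+)\]\s*(.*)$")


def classify_subject(subject):
    """Return (tag, remainder); tag is None when no known label leads the subject."""
    match = _TAG_RE.match(subject)
    if match and match.group(1).upper() in KNOWN_TAGS:
        return match.group(1).upper(), match.group(2)
    return None, subject.strip()
```

Subjects without a recognized leading label are returned unchanged (apart from surrounding whitespace), so replies and ordinary discussion threads pass through untouched.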

Announcements are not made only via email. They are also communicated via IRC
chats, website forums and FAQs (for examples, see


Open Source Software Process Modeling with Protégé



Software development is a challenging task, not only because of the complexity of the
software artifacts being developed but also because of the complexity of the process that
defines the activities needed to produce these artifacts. This process
involves many different artifacts being produced, various users involved in the
process, a variety of tools that are used to support software development and
communication, and constraints imposed on these entities for various purposes. In
software engineering research, there have been many attempts to capture the necessary
information in effective abstractions in order to make the development of software an
easier task. Software life cycle models, such as the traditional waterfall model or more
recent techniques such as extreme programming, are attempts to create some sort of order
in the chaos of the software development process. Some technologies make these
attempts in a general and highly abstracted manner, such as the waterfall model [Roy70],
or in the ad hoc and flexible manner of the extreme programming model [Bec99].
Others, like process programming, attempt to formalize processes so that they can be
manipulated using the advanced programming techniques familiar to software developers
from the realm of source code creation and management [Ost87]. One of the most
important challenges involved in these attempts is striking a balance between
expressiveness and succinctness. Processes must be defined precisely enough to be
useful, but need to be sufficiently abstracted so as to promote understanding. Formal
approaches to software process descriptions have several advantages, such as enabling
automated analyses of software process models, easy interchange of process
descriptions due to the common format, and generation of multiple visualizations from a
single formal description. To leverage these advantages, process formalization using the
Protégé system is described.


Formal Approaches

Formal approaches to the problem of process definition have both advantages and
disadvantages. One of the most important disadvantages is the fact that it is an
intellectually challenging task to understand a formal process description. Being able to
understand formalisms by looking past the unintelligible collection of formal syntax
requires a significant familiarity with formal techniques. As a result, these techniques are
not accessible to a large number of users. Additionally, defining any activity or artifact
in a formal way requires a greater degree of effort, mainly dedicated to adhering to the
rigor of the model guiding the formalism; formal methods are not well-suited to
casual use for a non-challenging software development activity.

Nevertheless, formal approaches are not without significant advantages. A formally
defined model is uniquely suited to analytical scrutiny. Automated tools can be created
that examine the formal description of a software development process, searching for any
number of possible inconsistencies such as problematic activity steps. Additionally,
automated efficiency improvements can be applied to such models, as places in the
process model where these are possible can be algorithmically located. Also, a formal
description that adheres to an accepted and well-defined set of constraints is one that is
organization and developer independent. The interchange of formal descriptions and the
use of tools created by third parties become an easy task; the lack of ambiguity and the
common format enable easy information exchange. Finally, multiple visualization
capabilities can be leveraged once a formal description exists. It is an easy task to
conceive of different ways to visualize the same model emphasizing different aspects. A
formal description that can be used as an input to a particular visualization generator
significantly reduces the effort needed to create these multiple visualizations. Therefore, a
formal model of software development processes can be a powerful artifact that can
enable many different activities and analyses that are impossible with an informal
description of the same model.
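
As an illustration of the kind of automated analysis mentioned above, the following sketch checks a process description for one simple inconsistency: an action that requires an input no other action produces. This is only an assumed example of such an analysis, not a tool used in this study; the function name `unproduced_inputs` and the dictionary layout are hypothetical, while the action and artifact names are taken from the Jakarta process description earlier in this paper.

```python
def unproduced_inputs(actions):
    """Map each action name to the inputs that no action in the model outputs."""
    produced = {out for spec in actions.values() for out in spec["outputs"]}
    problems = {}
    for name, spec in actions.items():
        missing = sorted(set(spec["inputs"]) - produced)
        if missing:
            problems[name] = missing
    return problems


# A deliberately partial model: nothing here produces "Latest Build".
model = {
    "Application Usage": {
        "inputs": ["Latest Build"],
        "outputs": ["Bugs/Feature Requests"],
    },
    "Feature Requesting and Bug Reporting": {
        "inputs": ["Bugs/Feature Requests"],
        "outputs": ["Requirements"],
    },
    "New Feature Proposal": {
        "inputs": ["Requirements"],
        "outputs": ["Proposal"],
    },
}

report = unproduced_inputs(model)
```

Because the model omits every action that yields a build, the check flags "Application Usage" as depending on an artifact with no producer; adding the missing actions would make the report empty.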



Protégé is a tool developed at Stanford University that allows for the creation and
manipulation of ontologies [Pro02]. Ontologies are specifications of conceptualizations
encoding knowledge about the structure of a specific domain. An easily accessible
example for the domain of software development is that of a class hierarchy from
object-oriented programming techniques. This class definition encompasses the knowledge
about the domain the software system operates within; this definition is precise and
unambiguous. The Protégé tool allows for the creation of the ontology model, the
general structure that all knowledge bases dealing with the ontology's domain must
adhere to. Additionally, the tool allows for the instantiation of specific occurrences of the
ontology to capture information about a particular situation. For example, an ontology
can be created capturing the information essential to a vineyard, such as the different
types of wine that are available and the quantities of each. In the ontology, the structure
for this information is created in an abstract way with no instance-specific information.
In the instantiation of this ontology, information is precisely defined according to the
ontology's model; for example, specific wines with precise quantities would be defined.
Using the object-oriented programming example, instances of classes can be defined.
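
The distinction between an ontology and its instantiation can be sketched with the object-oriented analogy from the paragraph above. This is only an illustration in Python, not Protégé's own representation; the class `Wine`, its fields, and the sample values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    """Ontology level: the structure every wine record must follow."""
    name: str
    variety: str
    bottles_in_stock: int

    def __post_init__(self):
        # A constraint that holds for all instances, independent of any one wine.
        if self.bottles_in_stock < 0:
            raise ValueError("quantity cannot be negative")


# Instantiation level: specific wines with precise quantities (made-up values).
cellar = [
    Wine("House Red", "Merlot", 120),
    Wine("House White", "Chardonnay", 80),
]
total = sum(w.bottles_in_stock for w in cellar)
```

The class carries no instance-specific information, while each entry in `cellar` fills in concrete values that must satisfy the structure and constraint the class defines.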

The capabilities of Protégé can be used for the definition of process models. An ontology
can be created that defines a process meta-model, encapsulating the general structure of
the actions involved in the software generation process, the users that perform these
actions, the artifacts that are produced by these actions, and the tools that make the
operation of the process possible. The meta-model would be an abstract entity that
contains the general information that underlies all software development process
descriptions; the ontology defines the commonalities between processes.

The Protégé system was designed to be extensible and, in line with the open source
development methodology, allows its users to develop extensions to the main tool. To
fully leverage the advantages of a formal process description, there are certain available
extensions that must be used.

The Ontoviz plug-in is an extension to the Protégé tool that implements automated graph
layout capabilities. The extension uses the Graphviz package developed at AT&T to
automatically graph the entities defined in the main Protégé ontology, both for the abstract
ontology as well as any instantiations. Because the graph layout is done automatically,
process designers can focus on the creation of the process model rather than its
visualization. A variety of options for graph customizations are provided by the Ontoviz
extension, further emphasizing the multiple visualization advantage of formalization. For
example, the tool allows for the expansion of certain aspects of the process model and the

of others in the final graph.

In addition to Ontoviz, the XML Tab extension is one that is very useful in the processing
of open source software development processes. This plug-in allows for the importing
and exporting of both ontologies and instantiations in XML format. These capabilities
support the formalization advantage of interchange by saving process models in a
non-proprietary format. Sharing of models could be done easily, with only a single XML file
representing processes. Additionally, the XML format is well supported in a variety of
other tools; therefore, third party tools could be modified to perform various analyses on
the software development process models in addition to a host of other tasks intrinsic to
the XML file format.
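
The interchange advantage can be illustrated with a short round trip through XML. The sketch below uses Python's standard `xml.etree.ElementTree` module; the element and attribute names (`process`, `action`, `name`, `next`) are invented for this example and are not the schema produced by the XML Tab plug-in.

```python
import xml.etree.ElementTree as ET


def process_to_xml(name, actions):
    """Serialize a tiny process model; element names are invented for this sketch."""
    root = ET.Element("process", {"name": name})
    for action_name, next_action in actions:
        attrs = {"name": action_name}
        if next_action is not None:
            attrs["next"] = next_action
        ET.SubElement(root, "action", attrs)
    return ET.tostring(root, encoding="unicode")


def action_names(xml_text):
    """What a third-party tool might do: read the actions back out."""
    return [node.get("name") for node in ET.fromstring(xml_text).iter("action")]


xml_text = process_to_xml(
    "Jakarta",
    [("Download", "Application Usage"), ("Application Usage", None)],
)
```

Any tool that can parse XML can recover the same action list from `xml_text`, which is the property that makes a non-proprietary format attractive for sharing process models.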


Meta-Model Definition

The meta-model is the driving entity of any software development process model; the
foundations for all models are contained within the meta-model. The first step toward
using Protégé for formal software development process descriptions is the definition of a
meta-model that can be used to instantiate specific instances. This paper defines a
meta-model that can be used to represent software development processes, both those having to
do with open source as well as traditionally developed software. The overall design has
been heavily influenced by the Process Markup Language (PML) presented in [NS01].
The application domain of the Protégé tool, with the limitations that the available graphical
options impose on the model, and the desire to decentralize some of the information
contained within individual action constructs prompted some changes that are presented
in the following discussion of the meta-model. For example, the notion of a

has been used to indicate the proper sequence of actions.



Meta-model view from within the Protégé tool

The basic design of the software development process meta-model that is presented here
consists of certain high-level elements that are abstractions of the basic entities of
software development, as well as some constructs meant to illustrate the different types of
logical control flow that the process can follow. Each of these elements is composed
of a number of attributes that specify the values that distinguish one element of the same
type from another. A listing and discussion of each of these high-level elements follows.

Process Model




flow scenario


Process Model

element is the top-level element that represents the overall process
construct. The

field defines the name of the entire process, while the

field is a
possible link to documentation. The
flow scenario

field is a link to an instantiation of a
Control Flow

construct, and represents the main logical flow of the process. Both the
and the
flow scenario

fields are required; values must be entered for these fields.





acts on



element is an abstraction of the actors that participate in the execution of
a certain process step. The

field defines the name of the actor being represented, and

field is a hyperlink to possible documentation. The
acts on

field is a link to an
instantiation of an

element, which defines the action that the particular agent
participates in. Motivation for the design decision of including the
acts on
field within

construct was twofold. First, there is a set of display limitations imposed by
the Ontoviz plug-in; by establishing the relationship between actors and actions in such a
manner, the graph generated by Ontoviz was easily understandable without significant
display customization. Second, and perhaps more importantly, there was a desire to
distribute some of the information encapsulated in the

construct to other element
types in order to make the

construct more compact.





required by


construct is an element that defines a particular resource that is either
produced or required by a certain action. The
field defines the name of the
resource being defined, while the

field defines a possible link to further
documentation. The
required by

field is a link to an instantiation of an

and establishes that the resource being represented is required by the defined action.
Similar to the case of the
Agent acts on

field, the
required by

field was included in this
construct for both graphical understandability and information distribution.






Tool

The Tool element describes a particular tool that is used by actors in accomplishing a task. These tools are most often executable programs, though they can also be collaboration support applications such as chat programs. Its name field defines the name of the element, its documentation field defines a link to possible additional documentation, and a further field may define the command to begin the tool's execution. This command field will be especially useful when applying a process prototype generation system to the formal description of the software development process being modeled. The used by field defines which actions these tools are used in. The decision to associate tools with actions was made to promote readability of the graph by minimizing the number of connecting lines in the case when the same tool was used by more than one actor in a single action; by associating the tool with the action, only one line needs to be drawn to connect the action with the tool. If tools were associated with agents, there would need to be as many lines as there are tool users.
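
The line-count argument can be made concrete with a small sketch. The function names and the example figures below are illustrative assumptions and are not part of the original model.

```python
from typing import List


def lines_tool_to_action(num_actions_using_tool: int) -> int:
    """Tool linked to actions: one line per action that uses the tool."""
    return num_actions_using_tool


def lines_tool_to_agents(users_per_action: List[int]) -> int:
    """Tool linked to agents: one line per tool user in each action."""
    return sum(users_per_action)


# Hypothetical case: a single action in which three agents use the same tool.
users = [3]
by_action = lines_tool_to_action(len(users))  # one line from tool to action
by_agent = lines_tool_to_agents(users)        # one line per tool user
```

In this invented case the action-based association draws a single line where the agent-based one would draw three, which is the readability gain the paragraph describes.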








Script

The Script construct is an abstraction of an automated script that performs an action without the manual contribution of an agent. Its name field defines the name of the script, its documentation field defines a possible link to further documentation, and a further field defines the actual code that the script executes; this code may be any of many different kinds of executable program, ranging from a script to a full-fledged program.









Action

The Action element is the abstraction of a primitive process step that represents a particular action; the action that this construct represents will be the smallest granularity action that the process designer desires for the model being built and should carefully balance the prescriptive and advisory features of the process definition. Its name field is the name of the action being represented, its documentation field is possible hyperlinked documentation, a further field may define which Script construct presents the executable of this action, and another field defines what resources are produced by the action. One more field defines whether the action is a "manual" one or an "executable" one; if it is an executable action, the field naming the Script construct should be defined. Finally, the next action field is a logical link to the next action in the overall sequence that the action being defined takes part in; this field was added partly to ensure that during XML interchange of models the order of their execution remained unambiguous, and partly to take advantage of the graphical capabilities of Ontoviz.
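
A minimal sketch of how a next action link keeps execution order unambiguous is given below. The Action class, its attribute names, and the example step names are hypothetical and only mirror the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """Hypothetical primitive process step with a link to its successor."""
    name: str
    next_action: Optional["Action"] = None


def execution_order(first: Action) -> List[str]:
    """Follow next_action links to recover the order of a sequence."""
    order: List[str] = []
    seen = set()
    current: Optional[Action] = first
    while current is not None:
        if id(current) in seen:
            # A loop in the links would make the order ambiguous.
            raise ValueError("cycle in next_action links")
        seen.add(id(current))
        order.append(current.name)
        current = current.next_action
    return order


# Invented three-step sequence, built from the last step backwards.
commit = Action("commit patch")
review = Action("review patch", next_action=commit)
submit = Action("submit patch", next_action=review)
order = execution_order(submit)
```

Because each action names its own successor, the order survives any serialization that does not preserve the position of elements, which is the concern raised for XML interchange.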

Control Flow

Constructs that belong to the Control Flow element type specify the logical order in which actions should be performed. The name field defines the name of the construct, and the documentation field is a link to possible hyperlinked documentation. The next control flow field is a link to an instantiation of a Control Flow element and defines the logical flow construct that follows the one being defined; it is similar to the next action field that was defined in the Action element type. The existence of this next control flow field allows Control Flow constructs to be nested, so that logically complex processes can be constructed. This element has four sub-categories that provide more details about the logical flow.



The Sequence construct defines a set of actions that are to be performed sequentially, and is the most common type of control flow encountered. One field defines the first of these actions.

The Selection construct defines a set of actions, only one of which is to be performed. One field defines what these actions are.

The Branch construct defines a set of actions or control flows that can be performed concurrently. The flow of the process only moves on if all of the actions are completed. One field defines the set of actions to be performed.

The Iteration construct defines an iteration over the specified actions or control flows. One field defines the first action of the sequence to be iterated, while another field defines when iteration ends. This latter field is currently defined as a string that can take the form of a natural language statement; this was considered to be the most versatile way to implement a conditional check that would accommodate everything from a formal condition to a completely informal and developer-dependent one.
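
The four sub-categories can be sketched as small Python classes together with a function that walks a nested flow. This is a simplification under stated assumptions: the class and attribute names are invented, steps are held in lists rather than linked through first-action fields, concurrency in a Branch is approximated by listing each branch once, and an Iteration body is assumed to run at least once before its natural-language end condition is checked by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Step = Union[str, "Sequence", "Selection", "Branch", "Iteration"]


@dataclass
class Sequence:
    steps: List[Step]    # performed in order


@dataclass
class Selection:
    options: List[Step]  # only one of these is performed


@dataclass
class Branch:
    steps: List[Step]    # may run concurrently; all must complete


@dataclass
class Iteration:
    body: List[Step]     # the sequence that is repeated
    until: str           # natural-language end condition


def trace(flow: Step, choose: Callable[[List[Step]], Step],
          done: Callable[[str], bool]) -> List[str]:
    """Return one possible linear trace of action names through a flow."""
    if isinstance(flow, str):
        return [flow]
    if isinstance(flow, (Sequence, Branch)):
        # A Branch is flattened here; real execution could interleave its steps.
        return [a for s in flow.steps for a in trace(s, choose, done)]
    if isinstance(flow, Selection):
        return trace(choose(flow.options), choose, done)
    if isinstance(flow, Iteration):
        out: List[str] = []
        while True:
            for s in flow.body:
                out.extend(trace(s, choose, done))
            if done(flow.until):  # a person or a rule answers the condition
                return out
    raise TypeError(f"unknown flow element: {flow!r}")


# Invented example: the end condition is answered "no" once, then "yes".
answers = iter([False, True])
flow = Sequence([
    "write patch",
    Iteration(["submit patch", "review patch"], until="patch is accepted"),
    Selection(["commit to trunk", "commit to branch"]),
])
result = trace(flow, choose=lambda opts: opts[0], done=lambda cond: next(answers))
```

Passing the end condition to a callback reflects the choice described above: because the condition is free text, something outside the model has to decide whether it holds.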


Using Protégé for Process Descriptions

Using the Protégé system is not a difficult task once the meta-model has been established; instantiations of each entity that comprise the process are easy to create. The most difficult part of using the tool is the identification and proper use of the different types of logical control flows. Simple sequences are less common as the software development process scales up; once the process begins to grow, more and more constructs such as branches and iterations begin to appear. It seems, then, that the most important element of an effective and understandable formalization of a software development process is the proper decomposition of the overall process into smaller sub-processes, which the Control Flow constructs are meant to define. It is on this task that the majority of time should be spent when formalizing a process so that the maximum gains from the formal description can be had.



In the following subsection we compare our findings on the Jakarta and HTTPD software process lifecycle models. Then we discuss the differences observed between traditional software lifecycle models and open source software development at the Apache Software Foundation.


Jakarta versus HTTPD

As shown above, Jakarta and HTTPD follow relatively similar software lifecycle process models. Because Jakarta is a more recent Apache project, its process is more clearly reflected in its website: Jakarta contributors had already learned from the mistakes made in the Apache project, resulting in more organized software development process guidelines.


Traditional Software Lifecycle Models versus Apache

Clearly, open source software development processes are quite different from traditional software development models. Based on our findings on the Apache projects analyzed, it appears that the most significant differences are found in the following areas:

While traditional software enterprises maintain tight managerial control over software development projects, open source development projects are controlled by individuals who volunteer their time and skills to the community.

Licensing and Usage. Open source development licenses are created to maintain the open source status of a project as it is distributed, reused, and modified by different users. This means that the source code is required to remain openly available to the public. Traditional software source code, on the other hand, is protected from public distribution by legal protections such as copyrights, in order to ensure that it is kept proprietary to its owner.

Requirements Elicitation. Requirements in OSSD are obtained from user and developer requests, whereas in traditional models software requirements are drawn from a particular department, such as marketing.

Design and Development. Design does happen in OSSD, mainly for major changes to the code base. Development is also different, since developers volunteer to work on the parts of the project they are interested in.

Testing. Since all members of the community have access to the product at any time, testing becomes everybody’s task. The more people download and test the software, the more problems will be discovered and eventually fixed. This differs from the traditional approach, where only a handful of people will test the software, under controlled conditions.

Frequency of builds and product releases: Traditional software processes only release when a build is stable enough (called either “alpha” or “beta”, depending on the quality of the build). In comparison, OSSD builds happen nightly, following the rule of releasing early and often.

Team communication: Communication in OSSD happens asynchronously, as team members are located in different places and work on different schedules. Most of the time, team members do not know each other and the only interaction between them happens through email exchange.



As demonstrated in this paper, open source software development processes defy the rules and methodologies that have been so carefully laid out by proponents of traditional software development processes. Despite this, some of the most successful software in the world, such as that developed at the Apache Software Foundation, has been developed following open source ideologies. This would seem to indicate that software development efforts can be successful even without following the traditional models. However, there is method to the open source community’s madness. Processes that govern their software development efforts exist, even though they may not match those generally accepted by monolithic software development organizations. Without a dedicated effort to precisely identify and model these processes, the questions of why these processes are successful and how they may be improved cannot be answered. This paper has been an attempt to study in detail the software development efforts of the Apache Foundation and the very closely related Jakarta project. The examination of the process that these two entities follow in developing their software yields valuable results, both in providing insight into a successful open source development methodology and in uncovering potential areas of improvement where the existing process can be streamlined. Perhaps the clearest contribution is the identification of the model itself; processes in the open source community are well hidden within vast mailing list archives and personal communications. Formalization of these processes provides further added value to the models developed, as the realm of easy interchange of process models and automated analyses becomes available to the open source community.



Justin Erenkrantz provided information that is not documented in the HTTPD project website. More information about Justin can be found at his personal website.

Jason Robbins provided information regarding the Tigris project. More information regarding the Tigris project can be found on the project's website.

Walt Scacchi provided much guidance and inspiration for studying the OSSD community and the processes they follow.




E. S. Raymond, The Cathedral and the Bazaar, First Monday, 3(3), 1998.

Netcraft Web Server Survey.

B. Behlendorf, The Apache Story, Linux Magazine, June 1999.

R. Fielding and G. Kaiser, The Apache HTTP Server Project, IEEE Internet Computing, 1(4):88-90, July/Aug. 1997.

D. Cubranic and K.S. Booth, Coordination in open-source software development, Proc. 8th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 1999.

D. Wheeler, Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, June 2002.

M. Kasichainula, Presentation: IBM and Apache plan their first date, March 2000.


New version of Apache released, April 8, 2002.

Apache 2.0 to debut Monday, November 9.

Delayed Apache software nears release, April 5, 2001.

Apache Web software on verge of major revision, August 8, 2000.

W. Scacchi, Understanding the Requirements for Developing Open Source Software Systems, to appear in IEEE Proceedings, 2002.

C.R. Reis and R.P.M. Fortes, An Overview of the Software Engineering Process and Tools in the Mozilla Project, Proc. Workshop on Open Source Software, Newcastle, UK, February 2002.

A. Mockus and J. Herbsleb, Why not improve coordination in distributed software development by stealing good ideas from Open Source?, Proc. 2nd Workshop on Open Source Software Engineering, Orlando, FL, May 2002.


T. Halloran and W. Scherlis, High Quality and Open Source Software Practices, Proc. 2nd Workshop on Open Source Software Engineering, Orlando, FL, May 2002.

A. Brown and G. Booch, Reusing Open Source Software and Practices: The Impact of Open Source on Commercial Vendors, Proc. 7th International Conference on Software Reuse, 123-136, Austin, TX, USA, April 15-19, 2002. Appears in C. Gacek (Ed.), Software Reuse: Methods, Techniques, and Tools, LNCS 2319, Springer-Verlag, May 2002.

A. Monk and S. Howard, The Rich Picture: A Tool for Reasoning about Work, March-April 1998.

S. Bendifallah and W. Scacchi, Work Structures and Shifts: An Empirical Analysis of Software Specification Teamwork, Proc. 11th. Intern. Conf. Software Engineering, IEEE Computer Society Press, Pittsburgh, PA, 260-270, May 1989.

P. Mi and W. Scacchi, A Knowledge-Based Environment for Modeling and Simulating Software Engineering Processes, IEEE Trans. Data and Knowledge Engineering, 2(3):283-294, September 1990. Reprinted in Nikkei Artificial Intelligence, 20(1):176-191, January 1991 (in Japanese). Reprinted in Process-Centered Software Engineering Environments, P.K. Garg and M. Jazayeri (eds.), IEEE Computer Society, 119-130, 1996.


P. Mi, M.J. Lee, and W. Scacchi, A Knowledge-Based Software Process Library for Process-Driven Software Development, Proc. 7th. Knowledge-Based Software Engineering Conf., Washington, DC, IEEE Computer Society, 122, September 1992.

P. Mi and W. Scacchi, Articulation: An Integrated Approach to the Diagnosis, Replanning, and Rescheduling of Software Process Failures, Proc. 8th. Knowledge-Based Software Engineering Conference, Chicago, IL, IEEE Computer Society, 77-85, 1993.

W. Scacchi and P. Mi, Process Life Cycle Engineering, Intern. J. Intelligent Systems in Accounting, Finance, and Management, 6(1):83-107, 1997.

J. Noll and W. Scacchi, Supporting Software Development in Virtual Enterprises, Journal of Digital Information, 1(4), February 1999.

W. Scacchi, Understanding Software Process Redesign using Modeling, Analysis and Simulation, Software Process Improvement and Practice, 5(2/3):183.

J. Noll and W. Scacchi, Specifying Process-Oriented Hypertext for Organizational Computing, J. Network and Computer Applications, 61, 2001.

W. Scacchi, Process Models in Software Engineering, in J. Marciniak (ed.), Encyclopedia of Software Engineering (Second Edition), 993-1005, Wiley, New York, 2002.


[Roy87] Royce, W. W., Managing the Development of Large Software Systems, Proc. 9th. Intern. Conf. Software Engineering, IEEE Computer Society, 1987.

[Bec99] Beck, K. Embracing Change with Extreme Programming. Computer, 32(10), p. 70-77, 1999.

[Ost87] Leon J. Osterweil. Software Processes Are Software Too. In Proceedings of the 9th International Conference on Software Engineering, pp. 2-13, Monterey, CA, March 1987.

[Pro02] Protégé Project. Stanford University. 9 June 2002.

[Geo02] Software development process using Protégé. University of California, Irvine. 9 June 2002.



Figure: Formal graph of Apache

Figure: Formal graph of Jakarta

Figure 9: The XML representation of the process meta-model

Process Model (flow scenario: Instance(Control Flow)). The top level definition of the overall process.

Agent (acts on). An actor that participates in a part of the process.

Resource (required by). A resource item that is required or produced by actions.

Tool (used by). A tool that is used by an agent as part of an action.

Action (next action). A primitive step in a process.

Control Flow (next control flow: Instance(Control Flow)*). A construct specifying the order in which actions should be performed.

Sequence. A set of actions to be performed in order.

Selection. A set of actions, only one of which should be performed.

Branch. A set of actions that can be performed concurrently, in any order.

Iteration. An iteration over the specified sequence of actions.

Script. An automated script that can be executed.